Getting started with the Keras functional API

The Keras functional API is the way to define complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.

This guide assumes that you are already familiar with the Sequential model.

Let's start with something simple.

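First example: a densely-connected network

The Sequential model is probably a better choice for implementing such a network, but starting with something very simple helps build understanding.

A layer instance is callable (on a tensor), and it returns a tensor.
Input tensors and output tensors can then be used to define a Model.
Such a model can be trained just like a Keras Sequential model.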


from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
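# `data` and `labels` are placeholders for your own arrays:
# data of shape (num_samples, 784), labels one-hot encoded with shape (num_samples, 10)
model.fit(data, labels)  # starts training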

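All models are callable, just like layers

With the functional API, it is easy to reuse trained models: you can treat any model as if it were a layer by calling it on a tensor. Note that by calling a model you reuse not just its architecture but also its weights.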

x = Input(shape=(784,))
# This works, and returns the 10-way softmax we defined above.
y = model(x)

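This makes it quick to build models that process sequences of inputs, for example. You can turn an image classification model into a video classification model by changing just one line.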

from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)
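To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of what "applying the model to every timestep" means. It does not use Keras: `toy_model` and `time_distributed` are hypothetical stand-ins introduced only for illustration, and the arithmetic inside `toy_model` is arbitrary rather than a trained network.

```python
import math

def toy_model(frame):
    """Hypothetical stand-in for the 784 -> 10 softmax model above.

    It folds the 784 inputs into 10 buckets and applies a softmax.
    The arithmetic is illustrative only; there are no trained weights.
    """
    logits = [sum(frame[i::10]) for i in range(10)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract peak for stability
    total = sum(exps)
    return [e / total for e in exps]

def time_distributed(fn, sequence):
    """Apply the same function (the same 'weights') to every timestep."""
    return [fn(step) for step in sequence]

# One sample: 20 timesteps, each a 784-dimensional vector.
sequence = [[((t + i) % 7) / 7.0 for i in range(784)] for t in range(20)]
processed = time_distributed(toy_model, sequence)

print(len(processed), len(processed[0]))  # 20 10
```

The point of the sketch is only that the per-timestep function is shared, so a (20, 784) input becomes a (20, 10) output.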

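Multi-input and multi-output models

Models with multiple inputs and outputs are a good use case for the functional API. The functional API makes it easy to manipulate a large number of intertwined data streams.

Consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model is the sequence of words in the headline, but to improve the results the model also takes auxiliary inputs, such as the time the headline was posted. The model is supervised through two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.

Here is what our model looks like.

Let's implement it with the functional API.

The main input receives the headline as a sequence of integers, where each integer encodes a word (a headline cannot be processed as raw text, so it is converted to numbers first). The integers are between 1 and 10,000 (a vocabulary of 10,000 words), and each sequence is 100 words long.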

import keras  # needed for keras.layers.concatenate below
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

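# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
# Note: Embedding accepts indices in [0, input_dim), so with input_dim=10000
# the largest valid word index is 9999.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)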

# A LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
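The integer-encoded headline that `main_input` expects can be prepared along the following lines. This is a minimal sketch, not part of the original example: the toy `word_index`, the reserved ids `PAD_ID` and `UNKNOWN_ID`, the whitespace tokenization, and the right-padding are all assumptions made for illustration.

```python
SEQUENCE_LENGTH = 100   # length expected by main_input
VOCAB_SIZE = 10000      # matches input_dim of the Embedding layer
PAD_ID = 0              # assumption: 0 is reserved for padding
UNKNOWN_ID = 1          # assumption: 1 is reserved for out-of-vocabulary words

# Hypothetical toy vocabulary; a real one would be built from the corpus.
word_index = {"keras": 2, "releases": 3, "new": 4, "functional": 5, "api": 6}

def encode_headline(headline, word_index, length=SEQUENCE_LENGTH):
    """Map a headline to a fixed-length list of integer word ids."""
    ids = [word_index.get(word, UNKNOWN_ID) for word in headline.lower().split()]
    ids = ids[:length]                            # truncate long headlines
    return ids + [PAD_ID] * (length - len(ids))   # right-pad short ones

encoded = encode_headline("Keras releases new functional API today", word_index)
assert all(0 <= i < VOCAB_SIZE for i in encoded)  # ids must fit the embedding

print(encoded[:8])   # [2, 3, 4, 5, 6, 1, 0, 0]
print(len(encoded))  # 100
```

A batch of such lists, stacked into an integer array of shape (num_samples, 100), is the kind of input the headline branch consumes.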

Here we insert the auxiliary loss, which allows the LSTM and Embedding layers to be trained smoothly even though the main loss will be much higher in the model.

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

At this point, we feed the auxiliary input data into the model by concatenating it with the LSTM output:

auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])

# We stack a deep densely-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

This defines a model with two inputs and two outputs:

model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

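We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different loss_weights or loss for each output, you can use a list or a dictionary. Here we pass a single loss as the loss argument, so the same loss is used on all outputs.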

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
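The quantity being minimized is the weighted sum of the per-output losses. The following pure-Python sketch works through that arithmetic for `loss_weights=[1., 0.2]`; the targets and predictions are made up for illustration, and `binary_crossentropy` here is a hand-written helper, not the Keras function of the same name.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of scalar predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Made-up targets and predictions, for illustration only.
labels = [1, 0, 1, 1]
main_pred = [0.9, 0.2, 0.7, 0.6]
aux_pred = [0.6, 0.4, 0.5, 0.5]

# Same order as outputs=[main_output, auxiliary_output].
loss_weights = [1.0, 0.2]
main_loss = binary_crossentropy(labels, main_pred)
aux_loss = binary_crossentropy(labels, aux_pred)
total_loss = loss_weights[0] * main_loss + loss_weights[1] * aux_loss

print(f"main={main_loss:.4f} aux={aux_loss:.4f} total={total_loss:.4f}")
```

With a weight of 0.2, the auxiliary output still contributes gradient to the LSTM and Embedding layers, but the main output dominates the objective.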

We can train the model by passing it lists of input arrays and target arrays:

model.fit([headline_data, additional_data], [labels, labels],
          epochs=50, batch_size=32)

Since our inputs and outputs are named (we passed them a "name" argument), we could also have compiled the model via:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)

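Shared layers

Shared layers are another good use case for the functional API. Let's take a look at them.

Consider a dataset of tweets. We want to build a model that can tell whether two tweets were written by the same person (this lets us compare users by the similarity of their tweets, for instance).

One way to achieve this is to build a model that encodes the two tweets into two vectors, concatenates the vectors, and then adds a logistic regression on top, outputting the probability that the two tweets share the same author. The model is then trained on positive tweet pairs and negative tweet pairs.

Because the problem is symmetric, the mechanism that encodes the first tweet (weights and all) should be reused to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.

Let's build this with the functional API. We take as input for a tweet a binary matrix of shape (140, 256), that is, a sequence of 140 vectors of size 256, where each dimension of the 256-dimensional vector encodes the presence or absence of a character (out of an alphabet of 256 frequent characters).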

import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))
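One way to produce a (140, 256) binary matrix for a tweet is sketched below in plain Python. The original text does not say which 256 characters make up the alphabet, so this sketch assumes the 256 byte values of the Latin-1 encoding; `encode_tweet` is a hypothetical helper, not a Keras function.

```python
MAX_LENGTH = 140     # timesteps, one per character position
ALPHABET_SIZE = 256  # size of each per-character vector

def encode_tweet(text):
    """Return a MAX_LENGTH x ALPHABET_SIZE binary matrix (list of lists).

    Assumption: the 'alphabet' is the 256 possible byte values of the
    Latin-1 encoding; characters outside it are replaced by '?'.
    """
    raw = text.encode("latin-1", errors="replace")[:MAX_LENGTH]
    matrix = [[0] * ALPHABET_SIZE for _ in range(MAX_LENGTH)]
    for position, byte in enumerate(raw):
        matrix[position][byte] = 1  # mark which character is present here
    return matrix  # positions past the end of the tweet stay all-zero

tweet = encode_tweet("Hello, Keras!")
print(len(tweet), len(tweet[0]))  # 140 256
print(tweet[0].index(1))          # 72, the byte value of "H"
```

Stacking such matrices for many tweets gives arrays of shape (num_samples, 140, 256), which is the layout that `tweet_a` and `tweet_b` expect.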

To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want:

# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)

# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)

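Let's pause to take a look at how to read the shared layer's output or output shape.

The concept of layer "node"

Whenever you call a layer on some input, you create a new tensor (the output of the layer), and you add a "node" to the layer, linking the input tensor to the output tensor. When you call the same layer multiple times, that layer owns multiple nodes indexed as 0, 1, 2...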
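The bookkeeping described above can be pictured with a small toy class. This is a conceptual sketch only: `ToyLayer` is a hypothetical stand-in written for this illustration and does not reflect how Keras stores nodes internally. Tensors are represented by plain strings.

```python
class ToyLayer:
    """Toy stand-in for a layer; NOT the Keras implementation."""

    def __init__(self, name):
        self.name = name
        self.nodes = []  # one entry per call, in call order

    def __call__(self, input_name):
        node_index = len(self.nodes)
        output_name = f"{self.name}_out_{node_index}"
        # A node links the input tensor of this call to its output tensor.
        self.nodes.append((input_name, output_name))
        return output_name

shared = ToyLayer("shared_lstm")
encoded_a = shared("tweet_a")  # first call  -> node 0
encoded_b = shared("tweet_b")  # second call -> node 1

for index, (source, target) in enumerate(shared.nodes):
    print(index, source, "->", target)
# 0 tweet_a -> shared_lstm_out_0
# 1 tweet_b -> shared_lstm_out_1
```

Because one layer instance can have several nodes, asking for "the output" of a shared layer is ambiguous unless you say which node you mean.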