DeepMC_4 / Decoder

Introduction

This post looks at F) Decoder, the part connected to E) Attention Mechanism.

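Because the Decoder's output is used as an input to the Attention module, I initially spent a long time working out how to implement the Decoder LSTM.

The problem is that a standard nn.LSTM consumes the whole input sequence in a single call. It does return an output for every time step, but only after the entire sequence has been fed in. That leaves no way to compute the attention context for step i from the decoder state of step i-1 in between steps.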
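To make the difference concrete, here is a small sketch with arbitrary sizes chosen only for illustration. nn.LSTM returns all step outputs from one call. nn.LSTMCell, called in a Python loop, computes the same result one step at a time, which leaves room to run attention between steps. The weight copy exists only to show that the two are numerically equivalent.

```python
import torch
from torch import nn

torch.manual_seed(0)

# arbitrary illustration sizes
batch, seq_len, n_in, n_hidden = 4, 12, 8, 20

# nn.LSTM: one call consumes the whole sequence and returns every step's output at once
lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
x = torch.randn(batch, seq_len, n_in)
out_seq, (h_n, c_n) = lstm(x)  # out_seq: (batch, seq_len, n_hidden)

# nn.LSTMCell: one call per time step, so other code can run between steps
cell = nn.LSTMCell(n_in, n_hidden)
with torch.no_grad():
    # copy the single-layer nn.LSTM weights so both modules compute the same function
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

h = torch.zeros(batch, n_hidden)
c = torch.zeros(batch, n_hidden)
step_outputs = []
for t in range(seq_len):
    # in a decoder, this is where attention would use h (= s_{i-1}) to build the next input
    h, c = cell(x[:, t, :], (h, c))
    step_outputs.append(h)
out_loop = torch.stack(step_outputs, dim=1)  # (batch, seq_len, n_hidden)

print(torch.allclose(out_seq, out_loop, atol=1e-5))
```

The loop version exposes h after every step, so the next input can depend on it. An attention-driven decoder needs exactly that.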

Method & Material

E) Attention Mechanism is a 2-level attention made of a Position Based Content Attention Layer and a Scale Guided Attention Layer. At every time step these output the context vectors c and c′, respectively.

The Decoder then takes three inputs: the output of the previous time step (m_{i-1}), the LSTM hidden state of the previous time step (s_{i-1}), and the context vector of the current time step (c_i).
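A compact way to read one decoder step follows how the Decoder class in the Results section wires it. m_{i-1}, c_i and c′_i are concatenated into the LSTMCell input, while s_{i-1} enters as the recurrent hidden state. The symbols x_i, cell_i, d_c, d_{c′}, W_1, W_2, b_1 and b_2 are my own notation, not the paper's. In the code, d_c + d_{c′} = cnn_output_size + 2 * num_encoder_hidden and H′ = 20.

```latex
\begin{aligned}
x_i &= \left[\, m_{i-1};\ c_i;\ c'_i \,\right] \in \mathbb{R}^{1 + d_c + d_{c'}} \\
(s_i,\ \mathrm{cell}_i) &= \mathrm{LSTMCell}\big(x_i,\ (s_{i-1},\ \mathrm{cell}_{i-1})\big), \qquad s_i \in \mathbb{R}^{H'} \\
m_i &= W_2\,\mathrm{ReLU}(W_1 s_i + b_1) + b_2, \qquad W_1 \in \mathbb{R}^{50 \times H'},\ W_2 \in \mathbb{R}^{1 \times 50}
\end{aligned}
```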

For the details, I quote the original paper:

The LSTM decoder (described in Section 6.6) parallels the encoder by associating each output $m_i$, $1 \le i \le T'$, to a hidden state vector $s_i$ that is directly used to predict the output:

$$m_i = G(m_{i-1}, s_{i-1}, c_i),$$

with $s_i \in \mathbb{R}^{H'}$, $H'$ is the dimension of the decoder hidden layer, $c_i$ is usually referred to as a context and corresponds to the output of the memory model. For DeepMC function $G$ corresponds to an LSTM with a context integration.

As the figure and the equation show, the key point is that the Decoder LSTM's hidden state s_i and its output m_i are fed into the attention layer as inputs.

To implement this, I used LSTMCell so that s_i and the context vector are computed one time step at a time.
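Putting this together, a decoding loop might look like the sketch below. It is not DeepMC's attention. `toy_attention` and its `query_proj` layer are hypothetical: they form a single dot-product attention that builds one context vector from s_{i-1} only, standing in for the two-level attention (c_i, c′_i) from the previous post. The zero initial values for s_0, the cell state and m_0 are also assumptions. Only T = 18 and T′ = 12 come from this post; the other sizes are made up.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# T = 18 and T' = 12 follow the post; batch and feature sizes are made up
batch, T, T_dec = 4, 18, 12
enc_dim, dec_hidden = 16, 20

encoder_states = torch.randn(batch, T, enc_dim)  # stand-in for encoder outputs h_1..h_T

cell = nn.LSTMCell(1 + enc_dim, dec_hidden)      # input = concat(m_{i-1}, c_i)
fc = nn.Sequential(nn.Linear(dec_hidden, 50), nn.ReLU(), nn.Linear(50, 1))
query_proj = nn.Linear(dec_hidden, enc_dim)      # hypothetical: maps s_{i-1} into encoder space

def toy_attention(s_prev):
    """Hypothetical dot-product attention, NOT DeepMC's two-level attention."""
    query = query_proj(s_prev).unsqueeze(2)                            # (batch, enc_dim, 1)
    scores = torch.bmm(encoder_states, query).squeeze(2)               # (batch, T)
    weights = F.softmax(scores, dim=1)                                 # weights over j = 1..T
    return torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)  # c_i: (batch, enc_dim)

s = torch.zeros(batch, dec_hidden)           # s_0 (assumed zero)
cell_state = torch.zeros(batch, dec_hidden)  # initial cell state (assumed zero)
m_prev = torch.zeros(batch, 1)               # m_0 (assumed zero)
outputs = []
for i in range(T_dec):
    c_i = toy_attention(s)                      # context computed from s_{i-1}
    x_i = torch.cat([m_prev, c_i], dim=1)       # (batch, 1 + enc_dim)
    s, cell_state = cell(x_i, (s, cell_state))  # one decoder step -> s_i
    m_prev = fc(s)                              # m_i, fed back at step i+1
    outputs.append(m_prev)

prediction = torch.stack(outputs, dim=1)        # (batch, T', 1)
print(prediction.shape)
```

The essential pattern is that each iteration reads the previous state before calling the cell. A single nn.LSTM call cannot express that ordering.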

The LSTM is followed by 2 FC layers. The supplementary material describes the structure as follows:

The decoder described in Section 6.6 uses a 20 node LSTM layer with the ReLU activation function. Additionally, the decoder also uses ReLU activation for the first dense layer and a linear activation function for the second dense layer. The first dense layer has 50 nodes for each of the time series steps and the second dense layer has 1 node for each of the time series steps.
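The supplementary asks for an LSTM with ReLU activation. nn.LSTMCell cannot express this, because its candidate and output activations are fixed to tanh. To follow the paper more literally, a minimal custom cell could look like the sketch below. It assumes Keras-style semantics, where the layer's `activation` replaces tanh on the cell candidate and on the cell output while the gates keep the sigmoid. The class name `ReLULSTMCell` and the input size 17 are hypothetical.

```python
import torch
from torch import nn

class ReLULSTMCell(nn.Module):
    """Hypothetical LSTM cell with ReLU in place of tanh (gates stay sigmoid)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # one projection each for input and hidden state, producing all 4 gates at once
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h_prev, c_prev = state
        gates = self.ih(x) + self.hh(h_prev)
        i, f, g, o = gates.chunk(4, dim=1)  # input, forget, candidate, output
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.relu(g)                   # tanh -> ReLU on the candidate
        c = f * c_prev + i * g
        h = o * torch.relu(c)               # tanh -> ReLU on the cell output
        return h, c

# usage with 20 hidden nodes as in the supplementary; input size 17 is hypothetical
cell = ReLULSTMCell(input_size=17, hidden_size=20)
h0 = torch.zeros(4, 20)
c0 = torch.zeros(4, 20)
h1, c1 = cell(torch.randn(4, 17), (h0, c0))
```

One trade-off is that, without tanh bounding, the cell output is unbounded. ReLU LSTMs can therefore be less numerically stable than the default.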

The LSTM layer has 20 hidden nodes, the first FC layer has 50 output nodes, and the second FC layer has 1 output node.

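In other words, the first FC layer has a 50x20 weight matrix and the second FC layer has a 1x50 weight matrix, each with a bias. In this implementation the same two layers are applied at every decoder time step.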
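These shapes can be checked directly in PyTorch, which stores nn.Linear weights as (out_features, in_features). The sketch also shows that applying the FC stack to a whole (batch, T′, 20) tensor is the same as applying it step by step with shared weights. The input tensor `s_all` is random and only illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

# the two dense layers from the supplementary: 20 -> 50 (ReLU) -> 1 (linear)
fc = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 1))

# PyTorch stores nn.Linear weights as (out_features, in_features)
print(fc[0].weight.shape, fc[0].bias.shape)  # first dense layer
print(fc[2].weight.shape, fc[2].bias.shape)  # second dense layer

# total parameters: 50*20 + 50 + 1*50 + 1 = 1101
n_params = sum(p.numel() for p in fc.parameters())

# nn.Linear acts on the last dimension, so the same weights are applied at every time step
s_all = torch.randn(4, 12, 20)  # illustrative (batch, T', decoder hidden)
m_all = fc(s_all)               # (batch, T', 1)
per_step = torch.stack([fc(s_all[:, t]) for t in range(12)], dim=1)
```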

Results

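At first I couldn't understand this idea and spent a long time figuring out how to implement it. As I kept reading the paper, though, I realized the layer structure was simpler than I had thought, and I was able to write the code fairly quickly.

The code itself isn't difficult, but some parts could be confusing, so I added plenty of comments explaining the variables.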

from torch import nn

class Decoder(nn.Module):
    def __init__(self, num_encoder_hidden: int, num_decoder_hidden: int, cnn_output_size: int):
        
        super().__init__()
        
        # Hyperparameters
        self.num_encoder_hidden = num_encoder_hidden
        self.num_decoder_hidden = num_decoder_hidden
        self.cnn_output_size = cnn_output_size
        
        # m_i = G(m_{i-1}, s_{i-1}, c_i)
        # m_{i-1}         / (1)
        # s_{i-1}         / (num_decoder_hidden), passed as the LSTMCell hidden state, not as input
        # c_i + c_prime_i / (cnn_output_size + num_encoder_hidden * 2)
        # LSTMCell input  = concat(m_{i-1}, c_i, c_prime_i)
        # Note: nn.LSTMCell always uses tanh internally, so the ReLU LSTM activation
        # described in the supplementary is not reproduced here.
        self.Decoder = nn.LSTMCell(1 + self.cnn_output_size + self.num_encoder_hidden * 2, self.num_decoder_hidden)

        # FC layers mapping the hidden state to the output
        # 2 layers with 50 and 1 output units (ReLU, then linear), shared across time steps
        self.FC_layer = nn.Sequential(
            nn.Linear(self.num_decoder_hidden, 50),
            nn.ReLU(),
            nn.Linear(50, 1)
        )
    
    def forward(self, decoder_input, s_i, cell_state):
        # 1 <= j <= T  : T is the encoder (LSTM stack) sequence length, here T = 18
        # 1 <= i <= T' : T' is the decoder sequence length, here T' = 12
    
        # decoder_input / concat(m_{i-1}, c_i, c_prime_i) along the last dimension
        #               / (batch size, 1 + cnn_output_size + 2 * num_encoder_hidden)
        #               / c_i and c_prime_i come out of attention as (batch size, 1, ...)
        #               / and must be squeezed to 2-D before concatenation
        # s_i (on entry) / s_{i-1}, hidden state of Decoder LSTM / (batch size, decoder hidden size)
        # cell_state     / cell state of Decoder LSTM / (batch size, decoder hidden size)
        s_i, cell_state = self.Decoder(decoder_input, (s_i, cell_state))

        # m_i / output of decoder / (batch size, 1)
        m_i = self.FC_layer(s_i)
        
        return m_i, (s_i, cell_state)

Conclusion & Discussion

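With the Decoder, the implementation of the model itself is complete.

Microsoft, which published the paper, did not release its code, and that could have made things hard. Thanks to that, though, I got to implement attention myself and to build a model this large from scratch for the first time. In many ways it was a project I grew a lot from.

I am writing up the various problems and trial-and-error from building the trainer and training the model in separate blog posts.

The full code for the DeepMC model is on GitHub:

https://github.com/wlsdml1114/DeepMC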

Other posts in the 'AI > Toy Projects' category

DeepMC_3 / 2 Levels Attention Mechanism, 2022.06.16
DeepMC_2 / Multi-Scale Deep Learning, 2022.06.15
DeepMC_1 / Wavelet Packet Decomposition (WPD), 2022.05.23
Multi-GPU Deep Learning_2 (using DDP, MaskRCNN), 2022.05.17
Multi-GPU Deep Learning_1 (using DDP), 2022.05.12

은긔 노트
