Problem
While implementing the DeepMC model, I ran into a situation where I needed to declare a list of layers.
The input is decomposed into multiple scales with WPD (seven scales in this experiment), and each scale has to be fed through a CNN stack with an identical structure.
So I declared the CNN stacks like this:
self.CNNstacks = [CNNstack(self.num_encoder_feature)
                  for _ in range(self.num_of_CNN_stacks)]
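For context, this declaration lives in the model's __init__ and builds one identical stack per WPD scale. The following is only a minimal sketch of that setup, with a hypothetical Conv1d-based CNNstack and made-up feature/stack counts standing in for the real DeepMC code:

import torch.nn as nn

class CNNstack(nn.Module):
    # Hypothetical per-scale encoder; placeholder for the real CNNstack.
    def __init__(self, num_features):
        super().__init__()
        self.sequence = nn.Sequential(
            nn.Conv1d(num_features, 16, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, WPD):
        return self.sequence(WPD)

class DeepMC(nn.Module):
    def __init__(self, num_encoder_feature=8, num_of_CNN_stacks=7):
        super().__init__()
        self.num_encoder_feature = num_encoder_feature
        self.num_of_CNN_stacks = num_of_CNN_stacks
        # One stack per WPD scale, kept in a plain Python list.
        # (forward, which concatenates X and U per scale and applies each stack, is omitted here)
        self.CNNstacks = [CNNstack(self.num_encoder_feature)
                          for _ in range(self.num_of_CNN_stacks)]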
Then, when I ran the model's fit, it threw RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same.
The full traceback is as follows:
File "trainer.py", line 81, in <module>
trainer.fit(deepmc, datamodule=dl)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
self._run(model)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
self.dispatch()
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
self.accelerator.start_training(self)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
return self.run_train()
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 844, in run_train
self.run_sanity_check(self.lightning_module)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in run_sanity_check
self.run_evaluation()
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 967, in run_evaluation
output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
output = self.trainer.accelerator.validation_step(args)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
return self.training_type_plugin.validation_step(*args)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in validation_step
return self.lightning_module.validation_step(*args, **kwargs)
File "/home/ubuntu/jini1114/DeepMC/net/deepmc.py", line 150, in validation_step
y_hat = self([X,U])
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/jini1114/DeepMC/net/deepmc.py", line 101, in forward
CNNs = [
File "/home/ubuntu/jini1114/DeepMC/net/deepmc.py", line 102, in <listcomp>
self.CNNstacks[i](torch.cat((X[:,self.X_levels[i],:,:],U[:,self.U_levels[i],:,:]),1)) for i in range(self.num_of_CNN_stacks)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/jini1114/DeepMC/net/encoder.py", line 68, in forward
return self.sequence(WPD)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 263, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 259, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
As the traceback shows, the error occurred during the forward pass of a CNN stack.
The input type was cuda, but the weight type was not, which is what caused the problem.
So I tried everything I could think of: building the CNN stacks with .cuda() attached ahead of time and deepcopying them, calling .cuda() inside the list comprehension, and so on.
On top of that, layers declared this way were not recognized in the model summary.
CNNstack and Scaled_guided_attention were clearly declared using a layer list, yet their parameters were not being picked up.
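The root cause is generic: sub-modules stored in a plain Python list are never registered on the parent nn.Module, so they are invisible to parameters(), to the summary, and to .cuda()/.to(device). A tiny self-contained repro (unrelated to DeepMC, just nn.Linear layers) shows both symptoms:

import torch
import torch.nn as nn

class PlainListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stored in a plain list: PyTorch never sees these as children.
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

model = PlainListModel()
print(sum(p.numel() for p in model.parameters()))  # 0 -> nothing registered
if torch.cuda.is_available():
    model.cuda()                              # moves nothing, since no children are registered
    print(model.layers[0].weight.is_cuda)     # False -> weights stay on the CPU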
Solution
The answer turned out to be surprisingly simple.
Declaring the layers as a list is not a problem in itself, but one extra step is needed:
the layer list has to be wrapped in torch.nn.ModuleList.
self.CNNstacks = [CNNstack(self.num_encoder_feature)
                  for _ in range(self.num_of_CNN_stacks)]
self.CNNstacks = torch.nn.ModuleList(self.CNNstacks)
With this change, the error goes away and the parameters are recognized correctly.
CNNstack and Scaled_guided_attention, whose parameters were missing before, now show up in the summary.
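To double-check the fix, the same toy nn.Linear example as above can be rerun with the list wrapped in nn.ModuleList; this is only an illustrative sketch, not the actual DeepMC model, but both symptoms disappear:

import torch
import torch.nn as nn

class ModuleListModel(nn.Module):
    def __init__(self):
        super().__init__()
        layers = [nn.Linear(4, 4) for _ in range(3)]
        self.layers = nn.ModuleList(layers)  # registers every sub-layer as a child

model = ModuleListModel()
print(sum(p.numel() for p in model.parameters()))  # 60 = 3 * (4*4 + 4)
if torch.cuda.is_available():
    model = model.cuda()                   # now moves the sub-layer weights too
    x = torch.randn(2, 4, device="cuda")
    print(model.layers[0](x).is_cuda)      # True -> no more device mismatch

nn.ModuleDict works the same way when the sub-layers are keyed by name instead of index.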