Fixing "RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same" in a model built from a list of layers

    Problem

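    While implementing the DeepMC model, I needed to declare a list of layers.

    The input is split into multiple scales using WPD (seven scales in my experiment), and a CNN stack with the same structure has to be trained for each scale.

    So I declared the CNN stacks as follows.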

    self.CNNstacks = [CNNstack(self.num_encoder_feature) 
                      for _ in range(self.num_of_CNN_stacks)]

    When I then ran fit on the model, it failed with RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same.

    The full traceback is as follows.

      File "trainer.py", line 81, in <module>
        trainer.fit(deepmc, datamodule=dl)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
        self._run(model)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
        self.dispatch()
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
        self.accelerator.start_training(self)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
        self.training_type_plugin.start_training(trainer)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
        self._results = trainer.run_stage()
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
        return self.run_train()
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 844, in run_train
        self.run_sanity_check(self.lightning_module)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in run_sanity_check
        self.run_evaluation()
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 967, in run_evaluation
        output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
        output = self.trainer.accelerator.validation_step(args)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
        return self.training_type_plugin.validation_step(*args)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in validation_step
        return self.lightning_module.validation_step(*args, **kwargs)
      File "/home/ubuntu/jini1114/DeepMC/net/deepmc.py", line 150, in validation_step
        y_hat = self([X,U])
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/jini1114/DeepMC/net/deepmc.py", line 101, in forward
        CNNs = [
      File "/home/ubuntu/jini1114/DeepMC/net/deepmc.py", line 102, in <listcomp>
        self.CNNstacks[i](torch.cat((X[:,self.X_levels[i],:,:],U[:,self.U_levels[i],:,:]),1)) for i in range(self.num_of_CNN_stacks)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/jini1114/DeepMC/net/encoder.py", line 68, in forward
        return self.sequence(WPD)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
        input = module(input)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 263, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 259, in _conv_forward
        return F.conv1d(input, weight, bias, self.stride,
    RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

    As the traceback shows, the error was raised during the forward pass of a CNN stack.

    The input was a CUDA tensor, but the weights were not on CUDA (torch.FloatTensor is a CPU tensor type).
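A mismatch like this can be confirmed before calling forward by comparing the device of the input with the devices of a module's parameters. The following is a minimal sketch, not code from the DeepMC project: `describe_devices` is a hypothetical helper, and the `Conv1d` shapes are arbitrary placeholders. It runs on CPU, so here both sides report `cpu`; in the failing case above the input would report `cuda:0` while the weights report `cpu`.

```python
import torch
from torch import nn


def describe_devices(module: nn.Module, x: torch.Tensor) -> dict:
    """Return the input's device and the set of devices holding the module's parameters."""
    return {
        "input": x.device,
        "weights": {p.device for p in module.parameters()},
    }


# Placeholder layer and input; the shapes are illustrative only.
conv = nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3)
x = torch.randn(2, 4, 16)

report = describe_devices(conv, x)
print(report)
```

If the two entries disagree, the layer was not moved together with the rest of the model.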

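    So I tried building a CNN stack with .cuda() applied up front and deep-copying it, calling .cuda() inside the list comprehension, and everything else I could think of.

    On top of that, layers declared this way were not recognized in the model summary.

    Both CNNstack and Scaled_guided_attention were declared with a layer list, yet neither showed up with any parameters.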
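The missing parameters can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not the DeepMC code: `PlainListModel` is a hypothetical class, and `nn.Conv1d(4, 8, kernel_size=3)` replaces the real CNNstack. It shows that layers kept in a plain Python list are invisible to `parameters()` and `children()`, which is what a model summary relies on.

```python
import torch
from torch import nn


class PlainListModel(nn.Module):
    """Hypothetical model that stores its layers in a plain Python list."""

    def __init__(self, num_stacks: int = 7):
        super().__init__()
        # A plain list attribute: nn.Module does not register these as submodules.
        self.stacks = [nn.Conv1d(4, 8, kernel_size=3) for _ in range(num_stacks)]


model = PlainListModel()

# Both counts are zero even though seven Conv1d layers exist in the list.
num_params = sum(p.numel() for p in model.parameters())
num_children = len(list(model.children()))
print(num_params, num_children)
```

Because the layers are not registered, an optimizer built from `model.parameters()` would not update them either.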

     

    Solution

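    The answer turned out to be surprisingly simple.

    Building the layers in a list was not a problem in itself, but one more step was needed.

    The layer list had to be wrapped in torch.nn.ModuleList. Modules held in a plain Python list are not registered as submodules, so moving the model to the GPU leaves their weights on the CPU and their parameters are not tracked.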

    self.CNNstacks = [CNNstack(self.num_encoder_feature)
                      for _ in range(self.num_of_CNN_stacks)]
    # Wrap the list so every CNNstack is registered as a submodule.
    self.CNNstacks = torch.nn.ModuleList(self.CNNstacks)

    With this change the error goes away and the parameters are recognized correctly.

    CNNstack and Scaled_guided_attention, which had no parameters in the earlier summary, now show their parameters.
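The effect of the wrapper can be checked with a small CPU-only sketch. The assumptions are labeled: `ModuleListModel` is a hypothetical class, `nn.Conv1d(4, 8, kernel_size=3)` stands in for CNNstack, and `.double()` stands in for `.cuda()` so the example runs without a GPU. Both calls are applied by the parent module to its registered submodules, so a conversion that reaches the layers here suggests a device move would reach them too.

```python
import torch
from torch import nn


class ModuleListModel(nn.Module):
    """Hypothetical model that registers its layers through nn.ModuleList."""

    def __init__(self, num_stacks: int = 7):
        super().__init__()
        self.stacks = nn.ModuleList(
            [nn.Conv1d(4, 8, kernel_size=3) for _ in range(num_stacks)]
        )

    def forward(self, x):
        # One output per stack, mirroring the per-scale list comprehension.
        return [stack(x) for stack in self.stacks]


model = ModuleListModel()

# Parameters of all seven layers are now visible to the parent module.
num_params = sum(p.numel() for p in model.parameters())

# CPU stand-in for .cuda(): the conversion propagates to every registered layer.
model.double()
dtypes = {p.dtype for p in model.parameters()}

outputs = model(torch.randn(2, 4, 16, dtype=torch.float64))
print(num_params, dtypes, len(outputs), tuple(outputs[0].shape))
```

On a machine with a GPU, the same check could be made by calling `model.cuda()` and inspecting `p.device` instead of `p.dtype`.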

     

    Reference: https://discuss.pytorch.org/t/runtimeerror-tensor-for-out-is-on-cpu-tensor-for-argument-1-self-is-on-cpu-but-expected-them-to-be-on-gpu-while-checking-arguments-for-addmm/105453/7
