Table of Contents
Keywords
Data Parallel (DP), Distributed Data Parallel (DDP), Multi-GPU training, All-reduce
The Difference Between Data Parallel (DP) and Distributed Data Parallel (DDP)
Compared to DataParallel, DistributedDataParallel requires one more step to set up, i.e., calling init_process_group. DDP uses multi-process parallelism, and hence there is no GIL contention across model replicas. Moreover, the model is broadcast at DDP construction time instead of in every forward pass, which also helps to speed up training. DDP is shipped with several performance optimization technologies.
- https://pytorch.org/tutorials/beginner/dist_overview.html
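Below is a minimal sketch of that extra setup step (assuming CUDA GPUs, the NCCL backend, and a single machine; the helper names and the MASTER_ADDR/MASTER_PORT values are illustrative, not from the tutorial):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(rank: int, world_size: int):
    # The extra step DataParallel does not need: every process joins
    # the same process group before the model is wrapped.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def build_ddp_model(model: torch.nn.Module, rank: int) -> DDP:
    model = model.to(rank)
    # Parameters are broadcast from rank 0 once, at construction time,
    # instead of on every forward pass as in DataParallel.
    return DDP(model, device_ids=[rank])
```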
Because of Python's GIL overhead, multi-process parallelism is more efficient than multi-threading, which is why DDP is generally the better and more effective choice than DP.
Simply put, DP is "single-process" multi-thread, while DDP is "multi-process" (one process per GPU).
That said, it is true that DDP is somewhat more involved to implement and apply than DP.
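To make the single-process vs. multi-process distinction concrete, here is a rough sketch of how each one is launched (the toy model and worker function are illustrative only):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def run_dp():
    # DP: a single process with one worker thread per GPU;
    # all replica threads share the Python GIL.
    model = nn.DataParallel(nn.Linear(10, 10).cuda())
    model(torch.randn(8, 10).cuda())

def ddp_worker(rank: int, world_size: int):
    # DDP: one process per GPU, so there is no GIL contention between replicas.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = DDP(nn.Linear(10, 10).to(rank), device_ids=[rank])
    loss = model(torch.randn(8, 10).to(rank)).sum()
    loss.backward()  # gradients are synchronized across processes via all-reduce
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(ddp_worker, args=(world_size,), nprocs=world_size)
```

The same worker can also be launched with torchrun instead of mp.spawn, in which case the rank and world size are read from environment variables set by the launcher.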