

The Difference Between Data Parallel (DP) and Distributed Data Parallel (DDP)


Keywords

Data Parallel (DP), Distributed Data Parallel (DDP), Multi-GPU training, All-reduce

     

The Difference Between Data Parallel (DP) and Distributed Data Parallel (DDP)

    Compared to DataParallel, DistributedDataParallel requires one more step to set up, i.e., calling init_process_group. DDP uses multi-process parallelism, and hence there is no GIL contention across model replicas. Moreover, the model is broadcast at DDP construction time instead of in every forward pass, which also helps to speed up training. DDP is shipped with several performance optimization technologies.
    - https://pytorch.org/tutorials/beginner/dist_overview.html
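A minimal sketch of that extra setup step, assuming one GPU per process, the NCCL backend, and a toy nn.Linear model (the setup/build_ddp_model helper names, address, and port below are illustrative placeholders, not the only way to write this):

    import os

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def setup(rank, world_size):
        # The extra step that DataParallel does not need:
        # every process joins a process group before the model is wrapped.
        os.environ.setdefault("MASTER_ADDR", "localhost")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

    def build_ddp_model(rank):
        # Toy single-layer model; parameters are broadcast from rank 0 to every
        # replica here, at construction time, not on each forward pass.
        model = torch.nn.Linear(10, 10).to(rank)
        return DDP(model, device_ids=[rank])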

Because of Python's GIL overhead, it is more efficient to use multiple processes rather than multiple threads, which is why using DDP instead of DP is both more efficient and more effective.

Simply put, DP is "single-process" multi-threading, while DDP is "multi-process".
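For contrast, wrapping a model in DP is a one-liner, but everything below runs inside a single Python process, so every GPU's replica competes for the same GIL (the toy model and batch are hypothetical):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 10).cuda()       # toy model on the default GPU
    dp_model = nn.DataParallel(model)      # single process, one worker thread per GPU

    x = torch.randn(64, 10).cuda()
    # Each forward pass: the batch is scattered across GPUs, the module is
    # replicated to every device, the replicas run in threads, and the outputs
    # are gathered back onto the default GPU.
    out = dp_model(x)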

Of course, it is true that DDP is somewhat trickier than DP to implement and apply.
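To make the "trickier" part concrete, here is a rough end-to-end sketch of what a DDP run needs on top of the DP one-liner: one process per GPU via mp.spawn, per-process group setup and teardown, and a DistributedSampler so each process trains on its own shard. The random TensorDataset, the worker function name, the port, and the hyperparameters are assumptions for illustration only:

    import os

    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def worker(rank, world_size):
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)

        model = DDP(torch.nn.Linear(10, 1).to(rank), device_ids=[rank])
        dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
        sampler = DistributedSampler(dataset)        # disjoint shard per process
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        for epoch in range(2):
            sampler.set_epoch(epoch)                 # keep shuffling consistent across processes
            for x, y in loader:
                x, y = x.to(rank), y.to(rank)
                loss = torch.nn.functional.mse_loss(model(x), y)
                optimizer.zero_grad()
                loss.backward()                      # gradients are all-reduced across processes here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29500"
        world_size = torch.cuda.device_count()
        mp.spawn(worker, args=(world_size,), nprocs=world_size)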