The Difference Between Data Parallel (DP) and Distributed Data Parallel (DDP)
Keywords: Data Parallel (DP), Distributed Data Parallel (DDP), Multi-GPU training, All-reduce

Compared to DataParallel, DistributedDataParallel requires one more setup step: calling init_process_group. DDP uses multi-process parallelism, so there is no GIL contention across model replicas. Moreover, the model is broadcast at…
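The setup difference described above can be sketched as follows. This is a minimal, single-process illustration (rank 0, world size 1, CPU with the gloo backend) rather than a real multi-GPU launch; in practice one process per GPU is started, e.g. with torchrun, and each process passes its own rank. The model and hyperparameters here are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # Single-process setup purely for illustration; real multi-GPU training
    # launches one process per device with matching rank/world_size values.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # The extra step DDP requires compared to DataParallel:
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

    model = torch.nn.Linear(10, 1)  # toy model
    # Wrapping broadcasts the parameters from rank 0 to every replica
    # once, at construction time, not on every forward pass.
    ddp_model = DDP(model)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(4, 10)
    loss = ddp_model(inputs).sum()
    loss.backward()   # gradients are all-reduced across processes here
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()
```

Because each replica lives in its own process, the Python GIL is never shared between them, which is the main reason DDP scales better than the single-process, multi-threaded DataParallel.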