Data Parallelism

In data parallelism, each GPU contains a full copy of the model.

Data is split across GPUs, and inference is performed on GPUs in parallel. This parallelism provides inference speed up.

Last updated on Jul 8, 2025