Pipeline Parallelism (Inter-Layer)

In inter-layer pipeline parallelism, the model is split into sequential stages, each made up of one or more consecutive layers, and each GPU holds a different stage.
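As a minimal sketch, layers could be assigned to stages with a simple even split. This assumes every layer costs about the same. Real systems may instead balance stages by measured compute or memory. `partition_layers` is a hypothetical helper for illustration, not a ColossalAI or GPipe API.

```python
from __future__ import annotations


def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into num_stages contiguous stages.

    Hypothetical helper: assumes equal-cost layers. When the split is uneven,
    earlier stages receive one extra layer each.
    """
    if not 1 <= num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers so every stage has >= 1 layer")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))  # consecutive layers only
        start += size
    return stages


if __name__ == "__main__":
    # Example: 10 layers spread over 4 GPUs (one stage per GPU).
    for gpu, layers in enumerate(partition_layers(10, 4)):
        print(f"GPU {gpu}: layers {list(layers)}")
```

With 10 layers on 4 GPUs this assigns 3, 3, 2 and 2 layers. Each GPU then only needs to store the weights of its own range of layers.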

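Each GPU passes its stage's output activations to the GPU holding the next stage, so a GPU must wait for the previous stage to finish before it can work on the same data. To keep all GPUs busy, a batch can be split into smaller micro-batches, as in GPipe. While one stage works on a micro-batch, the previous stage is already processing the next one, so the stages compute in parallel on different micro-batches.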
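The sketch below illustrates the dependency with a GPipe-style forward-only (inference) schedule. It assumes the batch is split into micro-batches and that every stage takes exactly one clock tick per micro-batch. Stage `s` can start micro-batch `k` only after stage `s - 1` has finished it. `pipeline_schedule` is a hypothetical helper for illustration.

```python
from __future__ import annotations

from typing import Optional


def pipeline_schedule(num_stages: int, num_microbatches: int) -> list[list[Optional[int]]]:
    """Forward-only pipeline schedule (illustrative model).

    schedule[t][s] is the micro-batch that stage s processes at tick t, or
    None if stage s is idle (a pipeline "bubble"). Assumes each stage needs
    exactly one tick per micro-batch.
    """
    total_ticks = num_microbatches + num_stages - 1
    schedule = []
    for t in range(total_ticks):
        row = []
        for s in range(num_stages):
            mb = t - s  # stage s lags stage 0 by s ticks: it waits on upstream stages
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    # 3 stages (GPUs), 4 micro-batches; "." marks an idle stage.
    for t, row in enumerate(pipeline_schedule(3, 4)):
        cells = " ".join("." if mb is None else str(mb) for mb in row)
        print(f"tick {t}: {cells}")
```

With 3 stages and 4 micro-batches, the run takes 6 ticks. At tick 2 all three GPUs are busy with micro-batches 2, 1 and 0. The first and last two ticks leave some GPUs idle while the pipeline fills and drains.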

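The advantages of inter-layer pipeline parallelism are that it reduces per-GPU memory use, since each GPU stores only the weights of its own stage, and that it improves inference throughput, since the stages work on different micro-batches at the same time. The disadvantage is that it adds inference latency: each request still passes through every stage in sequence, plus the time needed to transfer activations between GPUs.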
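A rough back-of-the-envelope model can make the throughput/latency trade-off concrete. This is an idealized sketch, not a measured result. It assumes `p` equal stages that each take `t_stage` per micro-batch, `p - 1` GPU-to-GPU transfers that each take `t_comm`, and transfers that overlap with compute with `t_comm <= t_stage`. Under those assumptions, one micro-batch finishes every `t_stage` after the first. The function name `estimate` and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PipelineEstimate:
    latency: float          # time for one micro-batch to pass through all stages
    total_time: float       # time to finish all micro-batches
    throughput: float       # micro-batches completed per unit time
    bubble_fraction: float  # share of each GPU's time spent idle


def estimate(num_stages: int, num_microbatches: int,
             t_stage: float, t_comm: float = 0.0) -> PipelineEstimate:
    """Idealized forward-only pipeline model (assumptions, not measurements).

    Assumes equal stage times, p - 1 transfers of t_comm each, and transfers
    overlapped with compute (t_comm <= t_stage), so after the first micro-batch
    exits the pipeline, another exits every t_stage.
    """
    if t_comm > t_stage:
        raise ValueError("model assumes t_comm <= t_stage")
    p, m = num_stages, num_microbatches
    latency = p * t_stage + (p - 1) * t_comm
    total_time = latency + (m - 1) * t_stage
    # Each GPU is busy for m * t_stage out of total_time.
    bubble = 1 - (m * t_stage) / total_time
    return PipelineEstimate(latency, total_time, m / total_time, bubble)


if __name__ == "__main__":
    # Hypothetical numbers: the whole model on one GPU would take 4 time units.
    single_gpu_time = 4 * 1.0
    e = estimate(num_stages=4, num_microbatches=16, t_stage=1.0, t_comm=0.1)
    print(f"single GPU: latency={single_gpu_time}, throughput={1 / single_gpu_time:.3f}")
    print(f"pipeline:   latency={e.latency:.1f}, throughput={e.throughput:.3f}, "
          f"bubble={e.bubble_fraction:.2f}")
```

With these made-up numbers, per-request latency rises from 4.0 to 4.3 time units because of the transfers. Throughput rises from 0.25 to about 0.83 micro-batches per unit time. With `t_comm = 0`, the idle share reduces to the familiar GPipe-style bubble fraction (p - 1) / (m + p - 1), which shrinks as the number of micro-batches grows.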

Source: https://colossalai.org/docs/concepts/paradigms_of_parallelism/#pipeline-parallel, https://arxiv.org/abs/1811.06965

Last updated on Jul 8, 2025