Pipeline Parallelism (Inter-Layer)

In inter-layer pipeline parallelism, each GPU holds a different stage of the model, where a stage is one or more consecutive layers.

The GPUs work on different batches of data in parallel, but each stage must wait for the previous stage to finish a batch before it can start on that batch.
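The waiting described above can be illustrated with a small scheduling sketch. This is a simplified model, not the implementation of any particular library: it assumes every stage takes one equal time step per batch and ignores transfer time, and the function names `pipeline_schedule` and `idle_fraction` are hypothetical.

```python
def pipeline_schedule(num_stages, num_batches):
    """Return schedule[t][s]: the batch index stage s processes at time step t,
    or None if the stage is idle (waiting on an earlier stage or already done).

    Assumption: each stage needs exactly one time step per batch.
    """
    total_steps = num_stages + num_batches - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            # Stage s can only start batch b after stage s-1 finished it,
            # so batch b reaches stage s at time step b + s.
            b = t - s
            row.append(b if 0 <= b < num_batches else None)
        schedule.append(row)
    return schedule


def idle_fraction(schedule):
    """Fraction of (time step, stage) slots in which a GPU sits idle."""
    slots = [slot for row in schedule for slot in row]
    return sum(slot is None for slot in slots) / len(slots)


if __name__ == "__main__":
    for row in pipeline_schedule(num_stages=4, num_batches=8):
        print(row)
    print(idle_fraction(pipeline_schedule(4, 8)))
    print(idle_fraction(pipeline_schedule(4, 1)))
```

Under these assumptions, the idle share shrinks as more batches are kept in flight: with 4 stages, a single batch leaves the GPUs idle in 12 of 16 slots, while 8 batches leave them idle in 12 of 44.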

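Inter-layer pipeline parallelism has two advantages: it reduces per-GPU memory use, because each GPU stores only its own stage, and it improves inference throughput. Its disadvantage is added inference latency, because each input must pass through every stage in sequence, with a transfer between GPUs at each stage boundary.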
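A rough back-of-the-envelope model can make this trade-off concrete. The sketch below is an assumption-laden estimate rather than a measurement: it assumes layers and parameters split evenly across stages, a fixed transfer time between adjacent GPUs, and transfers that overlap with compute in steady state. The function name `pipeline_estimates` and all input numbers are hypothetical.

```python
def pipeline_estimates(total_param_gb, num_stages, total_compute_ms, transfer_ms):
    """Estimate per-GPU parameter memory, single-input latency, and
    steady-state throughput for an evenly split pipeline.

    Assumptions (illustrative only): even split of parameters and compute,
    constant transfer time per stage boundary, transfers overlapped with compute.
    """
    per_gpu_gb = total_param_gb / num_stages
    stage_ms = total_compute_ms / num_stages
    # One input still runs through every layer, plus one hop per stage boundary.
    latency_ms = total_compute_ms + (num_stages - 1) * transfer_ms
    # Once the pipeline is full, one result completes per slowest step.
    throughput_per_s = 1000.0 / max(stage_ms, transfer_ms)
    return per_gpu_gb, latency_ms, throughput_per_s


if __name__ == "__main__":
    # Hypothetical model: 40 GB of parameters, 80 ms of total compute per input.
    print(pipeline_estimates(40.0, 1, 80.0, 2.0))  # single GPU
    print(pipeline_estimates(40.0, 4, 80.0, 2.0))  # four pipeline stages
```

With these made-up numbers, going from one GPU to four stages cuts parameter memory per GPU from 40 GB to 10 GB and raises estimated throughput from 12.5 to 50 inputs per second, while latency for a single input grows from 80 ms to 86 ms.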

Source: https://colossalai.org/docs/concepts/paradigms_of_parallelism/#pipeline-parallel, https://arxiv.org/abs/1811.06965

© 2025 The Rector and Visitors of the University of Virginia