Pipeline Parallelism (Inter-Layer)
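In inter-layer pipeline parallelism, the model is split into sequential stages of one or more consecutive layers, and each GPU holds a different stage.
GPUs work in parallel on different micro-batches of data, but for any given micro-batch, each stage must wait for the previous stage to finish and pass along its output activations.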
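As a minimal sketch of how layers might be assigned to stages, the hypothetical helper `partition_layers` below splits layer indices into contiguous, roughly equal stages, one per GPU. Splitting purely by layer count is an assumption made for illustration. Practical partitioners often balance stages by parameter count or measured compute time instead.

```python
def partition_layers(num_layers: int, num_stages: int) -> list:
    """Split layer indices 0..num_layers-1 into num_stages contiguous stages.

    Hypothetical helper: assumes an even split by layer count.
    """
    if not 1 <= num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for s in range(num_stages):
        # The first `extra` stages each take one additional layer.
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


# Example: a 10-layer model split across 4 GPUs.
for gpu, layers in enumerate(partition_layers(10, 4)):
    print(f"GPU {gpu}: layers {list(layers)}")
```

Keeping each stage contiguous means each GPU only needs to send activations to the next GPU in line, never to an arbitrary one.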
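This dependency produces a staggered schedule. The hypothetical `pipeline_schedule` function below is an idealized sketch, assuming every stage takes exactly one time step per micro-batch and ignoring communication time. At step `t`, stage `s` works on micro-batch `t - s`, so stage `s` starts a micro-batch exactly one step after stage `s - 1` has finished it.

```python
from typing import List, Optional


def pipeline_schedule(num_stages: int, num_microbatches: int) -> List[List[Optional[int]]]:
    """Return schedule[t][s] = micro-batch index that stage s runs at step t, or None if idle.

    Idealized sketch: one time step per stage per micro-batch, no communication cost.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            mb = t - s  # stage s lags stage s-1 by exactly one step
            row.append(mb if 0 <= mb < num_microbatches else None)
        schedule.append(row)
    return schedule


# Example: 3 stages (GPUs) processing 4 micro-batches.
for t, row in enumerate(pipeline_schedule(3, 4)):
    cells = ["  -" if mb is None else f"mb{mb}" for mb in row]
    print(f"step {t}: " + " ".join(cells))
```

The idle slots (`-`) at the start and end of the schedule are the pipeline "bubble": stages waiting while the pipeline fills and drains.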
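Inter-layer pipeline parallelism has two advantages. It reduces per-GPU memory use, because each GPU stores only the weights of its own stage. It also improves inference throughput, because once the pipeline is full, every stage is busy with a different micro-batch at the same time. Its disadvantage is added inference latency: each input must still pass through every stage in sequence, and every stage boundary adds an inter-GPU transfer.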
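To make these trade-offs concrete, the hypothetical `pipeline_costs` function below is an idealized cost model, not a measurement of any real system. It assumes balanced stages that each take `t_stage` per micro-batch, inter-stage transfers of `t_comm <= t_stage` that overlap with compute on the next micro-batch, and weights split evenly across `S` stages. The bubble term `(S - 1) / (M + S - 1)` matches the form of the bubble overhead given in the GPipe paper (arXiv:1811.06965).

```python
def pipeline_costs(num_stages: int, num_microbatches: int, t_stage: float,
                   t_comm: float, total_weight_gb: float) -> dict:
    """Idealized cost model for an S-stage pipeline processing M micro-batches.

    Hypothetical helper; assumes balanced stages and t_comm <= t_stage,
    with transfers overlapped with compute on the next micro-batch.
    """
    S, M = num_stages, num_microbatches
    if t_comm > t_stage:
        raise ValueError("model assumes t_comm <= t_stage")
    return {
        # One micro-batch visits every stage in order and crosses S-1 links.
        "latency": S * t_stage + (S - 1) * t_comm,
        # Last micro-batch enters stage 0 at (M-1)*t_stage, then needs one full latency.
        "total_time": (S + M - 1) * t_stage + (S - 1) * t_comm,
        # Fraction of stage-time slots left idle while the pipeline fills and drains.
        "bubble_fraction": (S - 1) / (M + S - 1),
        # Each GPU stores only its own stage's weights.
        "weights_per_gpu_gb": total_weight_gb / S,
    }


costs = pipeline_costs(num_stages=4, num_microbatches=16, t_stage=10.0,
                       t_comm=1.0, total_weight_gb=40.0)
# Baseline: one GPU running all 4 stages back to back (assuming the model fit).
single_gpu_latency = 4 * 10.0
single_gpu_total_time = 16 * 4 * 10.0
print(costs)
print(single_gpu_latency, single_gpu_total_time)
```

With these illustrative numbers, the pipeline finishes 16 micro-batches in 193 ms versus 640 ms on a single device, roughly 3.3x the throughput, and holds 10 GB of weights per GPU instead of 40 GB. Per-micro-batch latency, however, rises from 40 ms to 43 ms because of the three inter-stage transfers.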
Source: https://colossalai.org/docs/concepts/paradigms_of_parallelism/#pipeline-parallel, https://arxiv.org/abs/1811.06965