Multi-GPU Strategies

This section outlines common parallelism strategies for training and running models across multiple GPUs.

Data Parallelism: the input data is split across GPUs, while each GPU holds a full copy of the model.
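As a minimal NumPy sketch (a toy linear model, not any particular framework's API): each "GPU" computes a gradient on its shard of the batch, and averaging those local gradients reproduces the full-batch gradient — which is why replicas stay in sync after an all-reduce.

```python
import numpy as np

# Toy linear model: loss = mean((x @ w - y)**2). Names are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))   # full batch of 8 samples
y = rng.normal(size=(8,))
w = rng.normal(size=(4,))

def grad(x_shard, y_shard, w):
    # Mean-squared-error gradient for one shard of the batch.
    err = x_shard @ w - y_shard
    return 2 * x_shard.T @ err / len(y_shard)

# "Two GPUs": each holds a full copy of w but sees half the batch.
shards = [(x[:4], y[:4]), (x[4:], y[4:])]
local_grads = [grad(xs, ys, w) for xs, ys in shards]

# All-reduce step: average the local gradients so every replica
# applies the same weight update.
avg_grad = sum(local_grads) / len(local_grads)
full_grad = grad(x, y, w)
print(np.allclose(avg_grad, full_grad))  # True: equal shards -> same gradient
```

With equal-sized shards, the average of the per-shard mean gradients is exactly the full-batch mean gradient; real libraries perform the same averaging with a collective all-reduce over the network.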

Model Parallelism: the model itself is split across GPUs. It comes in two forms: Pipeline Parallelism (splitting the model across layers) and Tensor Parallelism (splitting within individual layers).
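Pipeline parallelism can be sketched as a sequential simulation (stage functions and the schedule below are illustrative, not a real scheduler): each "GPU" owns one stage, and the batch is split into micro-batches so that while one stage processes micro-batch i, the previous stage can already start on i+1.

```python
# Two pipeline stages: "GPU 0" holds the first half of the model,
# "GPU 1" holds the second half. The stage bodies are stand-ins.
def stage0(x):
    return x + 1  # pretend this is the first block of layers

def stage1(x):
    return x * 2  # pretend this is the second block of layers

micro_batches = [1, 2, 3, 4]
outputs = []
in_flight = None  # activation travelling from stage 0 to stage 1
for mb in micro_batches + [None]:  # one extra tick to drain the pipe
    if in_flight is not None:
        outputs.append(stage1(in_flight))       # GPU 1 works on micro-batch i
    in_flight = stage0(mb) if mb is not None else None  # GPU 0 starts i+1
print(outputs)  # [4, 6, 8, 10]
```

Each loop iteration represents one time step in which both stages run concurrently on different micro-batches; the one-step "drain" at the end is the pipeline bubble that makes pipelining less efficient for very small batches.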

| Parallelism Type | What is Split? | Use Cases |
|---|---|---|
| Data Parallelism | Input data | Long prompts/inputs |
| Pipeline Parallelism | Model layers | LLM exceeds single-GPU memory; model is deep but not too wide |
| Tensor Parallelism | Inside model layers (tensors) | LLM exceeds single-GPU memory; individual layers are too large for one device |
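Tensor parallelism, the last row above, can be sketched in NumPy (a column-parallel split of one layer's weight matrix; the shapes and names are illustrative): each "GPU" stores and multiplies only its slice of the weights, and concatenating the partial results recovers the unsplit layer's output.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 6))   # activations, replicated on both devices
W = rng.normal(size=(6, 8))   # full weight matrix of one large layer

# Column-parallel split: each "GPU" stores half of W's output columns,
# so neither device ever holds the whole layer.
W0, W1 = W[:, :4], W[:, 4:]

# Each device multiplies its shard independently...
y0 = x @ W0   # on GPU 0
y1 = x @ W1   # on GPU 1

# ...then an all-gather concatenates the partial outputs.
y = np.concatenate([y0, y1], axis=1)
print(np.allclose(y, x @ W))  # True: identical to the unsplit layer
```

This is why tensor parallelism helps when a single layer is too large for one device: the per-device memory for the layer halves, at the cost of a collective communication after (or before) each split operation.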
© 2025 The Rector and Visitors of the University of Virginia