Multi-GPU Strategies
This section outlines the main parallelism strategies for running LLMs across multiple GPUs.
Data Parallelism: each GPU holds a full copy of the model and processes a different slice of the input data (see the sketch below).
Model Parallelism: the model itself is split across GPUs. There are two flavors: Pipeline Parallelism, which splits the model by layers, and Tensor Parallelism, which splits within layers; sketches of both follow the table.
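As a concrete illustration of data parallelism, here is a minimal sketch using PyTorch's `DistributedDataParallel`. The single-node launch via `torchrun`, the toy linear model, and the random dataset are assumptions for illustration only, not a production training setup.

```python
# Minimal data-parallel sketch with PyTorch DistributedDataParallel (DDP).
# Assumed launch (single node): torchrun --nproc_per_node=<num_gpus> ddp_demo.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")   # one process per GPU
    rank = dist.get_rank()                    # equals the local GPU index on a single node
    torch.cuda.set_device(rank)

    # Every rank holds a full copy of the model...
    model = torch.nn.Linear(1024, 1024).cuda(rank)
    model = DDP(model, device_ids=[rank])

    # ...but sees a different shard of the data via DistributedSampler.
    data = TensorDataset(torch.randn(4096, 1024), torch.randn(4096, 1024))
    loader = DataLoader(data, batch_size=32, sampler=DistributedSampler(data))

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()
    for x, y in loader:
        x, y = x.cuda(rank), y.cuda(rank)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                       # DDP all-reduces gradients across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```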
| Parallelism Type | What is Split? | Use Cases |
|---|---|---|
| Data Parallelism | Input data (each GPU keeps a full model copy) | High request/batch throughput when the model fits on one GPU |
| Pipeline Parallelism | Model layers | LLM exceeds single-GPU memory; the model is deep (many layers) but each layer fits on one device |
| Tensor Parallelism | Inside model layers (individual weight tensors) | LLM exceeds single-GPU memory; individual layers are too large for one device |
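The pipeline-parallel row can be illustrated with a minimal sketch that manually places two stages of a toy model on `cuda:0` and `cuda:1` and passes activations between them. The two-stage split and layer sizes are hypothetical; a production pipeline would also use micro-batching (e.g. GPipe-style) so that both GPUs stay busy instead of waiting on each other.

```python
# Minimal pipeline-parallel sketch: layers are split into two stages on
# different GPUs, and activations (not weights) cross the device boundary.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stage 0: first group of layers on GPU 0
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        # Stage 1: remaining layers on GPU 1
        self.stage1 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        x = self.stage1(x.to("cuda:1"))   # activations are copied GPU 0 -> GPU 1
        return x

model = TwoStageModel()
out = model(torch.randn(8, 1024))
print(out.shape, out.device)  # torch.Size([8, 1024]) cuda:1
```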
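The tensor-parallel row can likewise be sketched by splitting one large weight matrix column-wise across two GPUs and concatenating the partial outputs. The sizes and the CPU-side concatenation are simplifications chosen for readability; frameworks such as Megatron-LM keep the shards on-device and use collective communication (all-gather/all-reduce) instead.

```python
# Minimal tensor-parallel sketch: one large linear layer's weight matrix is
# split along the output dimension across two GPUs; each GPU computes its
# slice of the output, and the slices are concatenated at the end.
import torch

in_features, out_features = 1024, 8192
x = torch.randn(8, in_features)

# Full weight, then split so each GPU stores only half of the parameters.
full_weight = torch.randn(out_features, in_features)
w0 = full_weight[: out_features // 2].to("cuda:0")   # first half of output rows on GPU 0
w1 = full_weight[out_features // 2 :].to("cuda:1")   # second half on GPU 1

# Each GPU computes its half of the layer's output.
y0 = x.to("cuda:0") @ w0.T
y1 = x.to("cuda:1") @ w1.T

# Gather the shards into the full output (an all-gather in a real setup).
y = torch.cat([y0.cpu(), y1.cpu()], dim=-1)
print(y.shape)  # torch.Size([8, 8192])
```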