Gemma 3 Memory Requirements
Below are estimated GPU memory requirements for Google’s Gemma 3 models across four context windows.
| Model | 4K Tokens | 8K Tokens | 32K Tokens | 128K Tokens |
|---|---|---|---|---|
| 4B @ FP16 | 9.6 + 1.4 = 11 GB | 9.6 + 2.8 = 12.4 GB | 9.6 + 11.2 = 20.8 GB | 9.6 + 44.8 = 54.4 GB |
| 4B @ INT4 | 2.4 + 0.4 = 2.8 GB | 2.4 + 0.8 = 3.2 GB | 2.4 + 2.9 = 5.3 GB | 2.4 + 11.6 = 14 GB |
| 12B @ FP16 | 28.8 + 3.1 = 31.9 GB | 28.8 + 6.2 = 35 GB | 28.8 + 24.8 = 53.6 GB | 28.8 + 99.2 = 128 GB |
| 12B @ INT4 | 7.2 + 0.8 = 8 GB | 7.2 + 1.6 = 8.8 GB | 7.2 + 6.4 = 13.6 GB | 7.2 + 25.6 = 32.8 GB |
| 27B @ FP16 | 64.8 + 5.5 = 70.3 GB | 64.8 + 11 = 75.8 GB | 64.8 + 44 = 108.8 GB | 64.8 + 176 = 240.8 GB |
| 27B @ INT4 | 16.2 + 1.4 = 17.6 GB | 16.2 + 2.8 = 19 GB | 16.2 + 11.2 = 27.4 GB | 16.2 + 44.8 = 61 GB |
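As a worked example of the first cell: the 4B model’s FP16 weights come to 4 B parameters × 2 bytes × a ~1.2× overhead factor (the factor implied by the weight figures in the table) ≈ 9.6 GB, and its 4K-token cache to 2 (keys and values) × 34 layers × 2560 embedding dim × 4,096 tokens × 2 bytes ≈ 1.4 GB, for ~11 GB total (architecture figures listed below).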
Models are listed by parameter count (in billions) at two precisions (FP16 or INT4 quantization) across the four context windows. The architecture figures used in the cache calculations are:
- 4B: 34 Layers, 2560 Embedding Dim
- 12B: 48 Layers, 3840 Embedding Dim
- 27B: 62 Layers, 5376 Embedding Dim
Each cell is a breakdown of M (Model Size) + N (KV Cache Size) from Calculating GPU Memory Requirements.
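As a sanity check, here is a minimal Python sketch that regenerates the table from the formulas the figures imply: weights ≈ parameters × bytes per parameter × 1.2 (a ~20% runtime overhead), plus a naive full-attention KV cache of 2 × layers × embedding dim × context length × bytes per value, with the cache assumed to be quantized to the same width as the weights in the INT4 rows. The overhead factor, decimal-gigabyte convention, and helper names are assumptions for illustration, not from the source.

```python
# Sketch: regenerate the table's estimates under the assumptions above
# (1.2x weight overhead, full-attention KV cache, decimal gigabytes).

GB = 1e9  # the table's figures line up with decimal gigabytes (assumption)

MODELS = {            # params (billions), layers, embedding dim
    "4B":  (4,  34, 2560),
    "12B": (12, 48, 3840),
    "27B": (27, 62, 5376),
}
PRECISIONS = {"FP16": 2.0, "INT4": 0.5}        # bytes per value
CONTEXTS = [4_096, 8_192, 32_768, 131_072]     # 4K, 8K, 32K, 128K tokens

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """M: model size, with the ~1.2x overhead the table's figures imply."""
    return params_b * 1e9 * bytes_per_param * 1.2 / GB

def kv_cache_gb(layers: int, dim: int, context: int, bytes_per_val: float) -> float:
    """N: KV-cache size -- K and V per layer, per token, per channel."""
    return 2 * layers * dim * context * bytes_per_val / GB

for name, (params, layers, dim) in MODELS.items():
    for precision, width in PRECISIONS.items():
        m = weights_gb(params, width)
        cells = [f"{m:.1f} + {kv_cache_gb(layers, dim, c, width):.1f} = "
                 f"{m + kv_cache_gb(layers, dim, c, width):.1f} GB"
                 for c in CONTEXTS]
        print(f"{name} @ {precision}: " + " | ".join(cells))
```

The 4K cells match the table directly (e.g., the 4B @ FP16 row prints 9.6 + 1.4 = 11.0 GB); some longer-context cells drift by a gigabyte or so, which is consistent with the table scaling a rounded 4K cache figure rather than recomputing each cell exactly.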