GPU Memory Requirements
The table below lists the GPU memory requirements for Google’s Gemma 3 models at four context window sizes.
| Model | 4K Tokens | 8K Tokens | 32K Tokens | 128K Tokens |
|---|---|---|---|---|
| 4B @ FP16 | 11 GB | 12.4 GB | 20.8 GB | 54.4 GB |
| 12B @ FP16 | 31.9 GB | 35 GB | 53.6 GB | 128 GB |
| 27B @ FP16 | 70.3 GB | 75.8 GB | 108.8 GB | 241 GB |
| 4B @ INT4 | 2.8 GB | 3.2 GB | 5.3 GB | 14 GB |
| 12B @ INT4 | 8 GB | 8.8 GB | 13.6 GB | 32.8 GB |
| 27B @ INT4 | 17.6 GB | 19 GB | 27.4 GB | 61 GB |
Each row names a model by its parameter count (in billions) and its precision: 16-bit floating point (FP16) or 4-bit integer quantization (INT4). Each column gives the memory required at that context window size.
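As a rough illustration of how the table might be used, the sketch below looks up the largest listed context window that fits within a given amount of GPU memory. The figures are copied from the table above; the function name `largest_context` and the 24 GB budget in the usage note are hypothetical, chosen only for the example. The check is a simple threshold comparison and leaves no headroom for other processes on the GPU.

```python
# Memory figures in GB, copied from the table above.
CONTEXTS = ["4K", "8K", "32K", "128K"]
MEMORY_GB = {
    ("4B", "FP16"): [11.0, 12.4, 20.8, 54.4],
    ("12B", "FP16"): [31.9, 35.0, 53.6, 128.0],
    ("27B", "FP16"): [70.3, 75.8, 108.8, 241.0],
    ("4B", "INT4"): [2.8, 3.2, 5.3, 14.0],
    ("12B", "INT4"): [8.0, 8.8, 13.6, 32.8],
    ("27B", "INT4"): [17.6, 19.0, 27.4, 61.0],
}


def largest_context(model, precision, vram_gb):
    """Return the largest listed context window that fits in vram_gb.

    Hypothetical helper: returns None when even the 4K row entry
    exceeds the available memory.
    """
    best = None
    for context, required_gb in zip(CONTEXTS, MEMORY_GB[(model, precision)]):
        if required_gb <= vram_gb:
            best = context
    return best


if __name__ == "__main__":
    budget_gb = 24  # assumed GPU memory budget, for illustration only
    for model, precision in MEMORY_GB:
        fit = largest_context(model, precision, budget_gb)
        print(f"{model} @ {precision}: {fit or 'does not fit'}")
```

With the assumed 24 GB budget, for example, the 12B model at INT4 fits up to the 32K column (13.6 GB), while the 27B model at FP16 does not fit at any listed context window.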