Gemma 3 Memory Requirements

Below are the GPU memory requirements for Google’s Gemma 3 models across four context-window sizes.

| Model | 4K Tokens | 8K Tokens | 32K Tokens | 128K Tokens |
|---|---|---|---|---|
| 4B @ FP16 | 9.6 + 1.4 = 11 GB | 9.6 + 2.8 = 12.4 GB | 9.6 + 11.2 = 20.8 GB | 9.6 + 44.8 = 54.4 GB |
| 4B @ INT4 | 2.4 + 0.4 = 2.8 GB | 2.4 + 0.8 = 3.2 GB | 2.4 + 2.9 = 5.3 GB | 2.4 + 11.6 = 14 GB |
| 12B @ FP16 | 28.8 + 3.1 = 31.9 GB | 28.8 + 6.2 = 35 GB | 28.8 + 24.8 = 53.6 GB | 28.8 + 99.2 = 128 GB |
| 12B @ INT4 | 7.2 + 0.8 = 8 GB | 7.2 + 1.6 = 8.8 GB | 7.2 + 6.4 = 13.6 GB | 7.2 + 25.6 = 32.8 GB |
| 27B @ FP16 | 64.8 + 5.5 = 70.3 GB | 64.8 + 11 = 75.8 GB | 64.8 + 44 = 108.8 GB | 64.8 + 176 = 240.8 GB |
| 27B @ INT4 | 16.2 + 1.4 = 17.6 GB | 16.2 + 2.8 = 19 GB | 16.2 + 11.2 = 27.4 GB | 16.2 + 44.8 = 61 GB |

Models are listed by parameter count (in billions) at a given precision (FP16, i.e., 16-bit floating point, or INT4, i.e., 4-bit integer quantization) across variable context windows. The architecture values used in the cache calculations are as follows (a worked example comes after the list):

  • 4B: 34 Layers, 2560 Embedding Dim
  • 12B: 48 Layers, 3840 Embedding Dim
  • 27B: 62 Layers, 5376 Embedding Dim
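
As a worked example, assuming the table uses the common approximations of a 20% overhead factor on the weights and a cache of one key and one value vector per layer per token: for the 4B model at FP16 with a 4K context, the weights take about 4 × 10⁹ parameters × 2 bytes × 1.2 ≈ 9.6 GB, and the cache takes 2 × 34 layers × 2560 × 4096 tokens × 2 bytes ≈ 1.4 GB, matching the 11 GB in the first cell of the table.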

Each cell is a breakdown of M (Model Size) + N (KV Cache Size) from Calculating GPU Memory Requirements. Note that only the key and value projections are cached during generation, not the queries, which is where the factor of 2 in the cache calculation comes from.
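
To estimate memory for other context lengths, here is a minimal Python sketch under the same assumptions: M ≈ parameters × bytes-per-parameter × 1.2 (a 20% overhead factor) and N ≈ 2 × layers × embedding dim × context length × bytes-per-parameter. The model constants are taken from the list above; this is an illustration, not an official calculator.

```python
# Sketch of the M + N estimate for Gemma 3 (assumed formulas, see text above).
GB = 1e9  # decimal gigabytes, as used in the table

# (parameters, layers, embedding dim) for each Gemma 3 size
MODELS = {
    "4B":  (4e9, 34, 2560),
    "12B": (12e9, 48, 3840),
    "27B": (27e9, 62, 5376),
}
BYTES_PER_PARAM = {"FP16": 2.0, "INT4": 0.5}

def model_size_gb(params: float, bytes_per_param: float) -> float:
    """M: weight memory plus an assumed 20% overhead for activations/buffers."""
    return params * bytes_per_param * 1.2 / GB

def kv_cache_gb(layers: int, embed_dim: int, context: int,
                bytes_per_param: float) -> float:
    """N: one key and one value vector per layer per token."""
    return 2 * layers * embed_dim * context * bytes_per_param / GB

for name, (params, layers, dim) in MODELS.items():
    for precision, nbytes in BYTES_PER_PARAM.items():
        m = model_size_gb(params, nbytes)
        totals = ", ".join(
            f"{ctx_k}K: {m + kv_cache_gb(layers, dim, ctx_k * 1024, nbytes):.1f} GB"
            for ctx_k in (4, 8, 32, 128)
        )
        print(f"{name} @ {precision} (M = {m:.1f} GB)  {totals}")
```

The output reproduces the table to within a few percent; small differences at longer contexts come from rounding of the per-context cache figures.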
