Gemma 3 Memory Requirements

Below are the GPU memory requirements for Google’s Gemma 3 models across four context-window sizes.

| Model | 4K Tokens | 8K Tokens | 32K Tokens | 128K Tokens |
|---|---|---|---|---|
| 4B @ FP16 | 9.6 + 1.4 = 11 GB | 9.6 + 2.8 = 12.4 GB | 9.6 + 11.2 = 20.8 GB | 9.6 + 44.8 = 54.4 GB |
| 4B @ INT4 | 2.4 + 0.4 = 2.8 GB | 2.4 + 0.8 = 3.2 GB | 2.4 + 2.9 = 5.3 GB | 2.4 + 11.6 = 14 GB |
| 12B @ FP16 | 28.8 + 3.1 = 31.9 GB | 28.8 + 6.2 = 35 GB | 28.8 + 24.8 = 53.6 GB | 28.8 + 99.2 = 128 GB |
| 12B @ INT4 | 7.2 + 0.8 = 8 GB | 7.2 + 1.6 = 8.8 GB | 7.2 + 6.4 = 13.6 GB | 7.2 + 25.6 = 32.8 GB |
| 27B @ FP16 | 64.8 + 5.5 = 70.3 GB | 64.8 + 11 = 75.8 GB | 64.8 + 44 = 108.8 GB | 64.8 + 176 = 240.8 GB |
| 27B @ INT4 | 16.2 + 1.4 = 17.6 GB | 16.2 + 2.8 = 19 GB | 16.2 + 11.2 = 27.4 GB | 16.2 + 44.8 = 61 GB |

Models are listed by parameter count (in billions) at a given precision (FP16, i.e., 16-bit floating point, or INT4, i.e., 4-bit integer quantization) across variable context windows. The architecture values used in the cache calculations are as follows (a worked example comes after the list):

  • 4B: 34 Layers, 2560 Embedding Dim
  • 12B: 48 Layers, 3840 Embedding Dim
  • 27B: 62 Layers, 5376 Embedding Dim
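
As a worked example, assuming the table uses the common approximations of a 20% overhead factor on the weights and a cache of one key and one value vector per layer per token: for the 4B model at FP16 with a 4K context, the weights take about 4 × 10⁹ parameters × 2 bytes × 1.2 ≈ 9.6 GB, and the cache takes 2 × 34 layers × 2560 × 4096 tokens × 2 bytes ≈ 1.4 GB, matching the 11 GB in the first cell of the table.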

Each cell is a breakdown of M (Model Size) + N (KV Cache Size) from Calculating GPU Memory Requirements. Note that only the key and value projections are cached during generation, not the queries, which is where the factor of 2 in the cache calculation comes from.
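
To estimate memory for other context lengths, here is a minimal Python sketch under the same assumptions: M ≈ parameters × bytes-per-parameter × 1.2 (a 20% overhead factor) and N ≈ 2 × layers × embedding dim × context length × bytes-per-parameter. The model constants are taken from the list above; this is an illustration, not an official calculator.

```python
# Sketch of the M + N estimate for Gemma 3 (assumed formulas, see text above).
GB = 1e9  # decimal gigabytes, as used in the table

# (parameters, layers, embedding dim) for each Gemma 3 size
MODELS = {
    "4B":  (4e9, 34, 2560),
    "12B": (12e9, 48, 3840),
    "27B": (27e9, 62, 5376),
}
BYTES_PER_PARAM = {"FP16": 2.0, "INT4": 0.5}

def model_size_gb(params: float, bytes_per_param: float) -> float:
    """M: weight memory plus an assumed 20% overhead for activations/buffers."""
    return params * bytes_per_param * 1.2 / GB

def kv_cache_gb(layers: int, embed_dim: int, context: int,
                bytes_per_param: float) -> float:
    """N: one key and one value vector per layer per token."""
    return 2 * layers * embed_dim * context * bytes_per_param / GB

for name, (params, layers, dim) in MODELS.items():
    for precision, nbytes in BYTES_PER_PARAM.items():
        m = model_size_gb(params, nbytes)
        totals = ", ".join(
            f"{ctx_k}K: {m + kv_cache_gb(layers, dim, ctx_k * 1024, nbytes):.1f} GB"
            for ctx_k in (4, 8, 32, 128)
        )
        print(f"{name} @ {precision} (M = {m:.1f} GB)  {totals}")
```

The output reproduces the table to within a few percent; small differences at longer contexts come from rounding of the per-context cache figures.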
