GPU Memory Requirements

The table below lists the GPU memory required to run Google’s Gemma 3 models at four context window sizes.

Model         4K Tokens   8K Tokens   32K Tokens   128K Tokens
4B @ FP16     11 GB       12.4 GB     20.8 GB      54.4 GB
12B @ FP16    31.9 GB     35 GB       53.6 GB      128 GB
27B @ FP16    70.3 GB     75.8 GB     108.8 GB     241 GB
4B @ INT4     2.8 GB      3.2 GB      5.3 GB       14 GB
12B @ INT4    8 GB        8.8 GB      13.6 GB      32.8 GB
27B @ INT4    17.6 GB     19 GB       27.4 GB      61 GB

Each row gives a model size as a parameter count in billions (4B, 12B, or 27B) at one of two precisions: 16-bit floating point (FP16) or 4-bit integer quantization (INT4). Each column gives the memory required at that context window size.
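As a rough illustration of how the table might be used, the sketch below encodes the values above and reports the largest listed context window that fits within a given GPU memory budget. The function names (`required_gb`, `largest_context_that_fits`) and the 80 GB budget are assumptions chosen for the example and do not come from the original page. The lookup only checks the table's figures and makes no allowance for other processes sharing the GPU.

```python
# Memory requirements in GB, copied from the table above.
# Tuple order matches CONTEXTS: 4K, 8K, 32K, 128K tokens.
CONTEXTS = ("4K", "8K", "32K", "128K")
REQUIRED_GB = {
    ("4B", "FP16"): (11.0, 12.4, 20.8, 54.4),
    ("12B", "FP16"): (31.9, 35.0, 53.6, 128.0),
    ("27B", "FP16"): (70.3, 75.8, 108.8, 241.0),
    ("4B", "INT4"): (2.8, 3.2, 5.3, 14.0),
    ("12B", "INT4"): (8.0, 8.8, 13.6, 32.8),
    ("27B", "INT4"): (17.6, 19.0, 27.4, 61.0),
}


def required_gb(size, precision, context):
    """Look up the table entry for one model, precision, and context size."""
    return REQUIRED_GB[(size, precision)][CONTEXTS.index(context)]


def largest_context_that_fits(size, precision, budget_gb):
    """Return the largest listed context whose requirement is <= budget_gb.

    Returns None if even the 4K entry exceeds the budget.
    """
    best = None
    for label, need in zip(CONTEXTS, REQUIRED_GB[(size, precision)]):
        if need <= budget_gb:
            best = label
    return best


if __name__ == "__main__":
    budget = 80.0  # hypothetical single-GPU memory budget in GB
    for size, precision in REQUIRED_GB:
        fit = largest_context_that_fits(size, precision, budget)
        print(f"{size} @ {precision}: {fit}")
```

With the hypothetical 80 GB budget, for example, the 27B model at FP16 fits only up to the 8K column, while the same model at INT4 fits at every listed context size.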

© 2025 The Rector and Visitors of the University of Virginia