2026-06-11 google
Gemma 4's QAT weights: on-device inference just swapped its real bottleneck
Google shipped quantization-aware training (QAT) weights for Gemma 4, squeezing E2B down to 1 GB so it runs on phones and consumer GPUs. What matters isn't that the model now fits. It's that the hard problems have moved to power draw, the privacy boundary, and exactly how much quality you lose.