2026-06-11 google
Google open-sourced the first mainstream text diffusion model. The real story isn't 'fast'. It's that the local decode bottleneck moves from memory bandwidth to compute, with bidirectional attention generating 256 tokens at once. The cost: quality, experimental status, and the 26B MoE trade-offs.
Read analysis 2026-06-11 google
Gemma 4 12B feeds vision and audio straight into the language backbone, dropping dedicated encoders. That's an architecture bet, not just another on-device model.
Read analysis 2026-06-11 google
Google shipped quantization-aware training weights for Gemma 4, squeezing E2B down to 1GB so it runs on phones and consumer GPUs. The turn that matters isn't 'it fits now'. It's that the hard problem moved to power draw, the privacy boundary, and exactly how much quality you lose.
Read analysis