open-models

2026-06-11 google

DiffusionGemma: Text Diffusion Finally Reaches Mainstream Open Source

Google open-sourced the first mainstream text diffusion model. The real story isn't 'fast'. It's that the local decode bottleneck moves from memory bandwidth to compute, with bidirectional attention generating 256 tokens at once. The cost: quality, experimental status, and the 26B MoE trade-offs.

open-models inference local-ai


2026-06-11 google

Gemma 4 12B Drops the Multimodal Encoder: Google's Bet on a Unified Token Space

Gemma 4 12B feeds vision and audio straight into the language backbone, dropping dedicated encoders. That's an architecture bet, not just another on-device model.

open-models multimodal local-ai


2026-06-11 google

Gemma 4's QAT Weights: On-Device Inference Just Swapped Its Real Bottleneck

Google shipped quantization-aware training (QAT) weights for Gemma 4, squeezing E2B down to 1 GB so it runs on phones and consumer GPUs. The shift that matters isn't 'it fits now'. It's that the hard problem moved to power draw, the privacy boundary, and exactly how much quality you lose.

open-models quantization local-ai
