2026-06-11

Biohub's Protein World Model: How It Differs From AlphaFold-Style Structure Prediction

Biohub open-sourced a protein world model. The claim that matters is not another structure predictor; it is designing binders that actually function in the lab. For now, that credibility is confined to binder design.

biology world-models science

Summary

On May 27, Biohub (full name Chan Zuckerberg Biohub, a 501(c)(3) nonprofit funded by Mark Zuckerberg and Priscilla Chan) open-sourced what it calls a world model of protein biology: three components, ESMC, ESMFold2, and ESM Atlas. The part worth unpacking is not “another structure predictor.” It is that the release pushes the bio-AI proposition one step forward. The AlphaFold generation solved “given a sequence, predict the shape it folds into.” Biohub’s core claim this time is “I design a brand-new sequence and it actually binds its target as intended in a test tube.”

That difference is real. The hard numbers in the announcement land in a narrow but critical area: designing protein binders against five cancer and immunology targets, with hit rates of 36-88% for compact minibinders and 15-29% for antibody-derived formats, all with binding confirmed in the lab. A preclinical binder search that normally takes three to four years was compressed into a computation measured in days. The value lies in whether the model can produce functional designs, not in whether it tops one more benchmark. But the phrase “world model” is doing heavy lifting: what the model learned is closer to the physical rules at the protein layer, and its credibility is confined, for now, to binder design.

What happened

Biohub shipped three pieces at once, positioned as an open discovery engine:

ESMC is the foundation, a language model that represents proteins, trained on roughly 2.8 billion sequences drawn from across the tree of life. Its core scientific hypothesis: a language model trained to predict the amino acids evolution selects will internalize the rules that govern how proteins fold, interact, and function, because evolution preserves proteins that are fit for purpose, so the patterns kept across billions of years implicitly encode those physical rules.

ESMFold2 is the design engine, turning ESMC’s sequence representations into atomic-resolution 3D structures, including complexes. The announcement says it leads standard folding benchmarks, especially on protein-protein and antibody-antigen interactions: from ESMC representations alone, it predicts the true antibody-antigen binding pose more successfully than AlphaFold 3, and given the same evolutionary information (a multiple sequence alignment, or MSA) it is the strongest predictor on both benchmarks. It also benefits from more compute at inference time, improving as it makes multiple predictions and ranks them by its own confidence.
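The "make multiple predictions and score them by confidence" behavior is a form of best-of-N selection. The sketch below illustrates only that general pattern. It is not the ESMFold2 API: `Prediction`, `best_of_n`, and `stub_predict` are hypothetical names invented here, and the stub returns a seeded random confidence in place of a real model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Prediction:
    structure: str     # placeholder for real atomic coordinates
    confidence: float  # model's self-reported confidence, higher is better


def best_of_n(
    predict: Callable[[str, int], Prediction], sequence: str, n: int
) -> Prediction:
    """Run `predict` n times with different seeds and keep the most confident result."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[Prediction] = [predict(sequence, seed) for seed in range(n)]
    return max(candidates, key=lambda p: p.confidence)


def stub_predict(sequence: str, seed: int) -> Prediction:
    """Hypothetical stand-in for a structure predictor (assumption, not a real model)."""
    rng = random.Random(seed)
    return Prediction(structure=f"pose-{seed}", confidence=rng.random())


if __name__ == "__main__":
    best = best_of_n(stub_predict, "MKTAYIAKQR", n=8)
    print(best.structure, round(best.confidence, 3))
```

Because a larger N includes every sample a smaller N would have drawn, the selected confidence can only stay the same or rise as N grows. Whether higher self-reported confidence tracks real accuracy is a separate question that only experimental data can answer.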

The design experiments carry the argument. In a preprint, Biohub used ESMFold2 to design binders against five targets: EGFR and PDGFRβ (implicated in tumor growth), PD-L1 and CTLA-4 (immune checkpoints cancer exploits to evade detection), and CD45 (a regulator of immune signaling). Beyond the hit rates above, the key result is that designed PD-L1 binders restored T cell signaling in the lab, blocking the same pathway approved checkpoint therapies target. The announcement also stresses these designs show minimal similarity to known sequences in public databases, suggesting de novo generation rather than retrieval of known binders.

ESM Atlas is the third piece, making ESMC’s representations navigable across 6.8 billion protein sequences and 1.1 billion predicted structures, organized by relationships the model learned, surfacing connections existing databases missed, such as evolutionary links between gene-editing enzymes on distant branches of life. All three are free on Biohub Platform, and an HN commenter confirmed the model carries an MIT License.

Why it matters

Put it in context. For the past few years the dominant theme in bio-AI has been predicting static structure. AlphaFold2/3 took “sequence to structure” to a remarkable level, but it is fundamentally describing what an already-existing protein looks like. Going from “predict an existing thing” to “design a thing that does not exist yet and must function in a real biological system” crosses a real gulf, because the latter demands the model grasp not just shape but functional rules, what kind of interface produces effective binding. Biohub’s claim is that it crossed that gulf and brought wet-lab data, not just computational metrics.

What makes this more than hype is that it sidesteps the “another benchmark” trap. The hardest lesson in structure prediction is that a high benchmark score does not mean a model works in real biology. A pharma practitioner on HN put it plainly: AlphaFold2 was great, but its training data all came from single-state X-ray crystallography, which is not how proteins behave in the wild, so predicting what binds to what is still a largely unsolved problem rooted in a lack of data. Biohub reports not another structure RMSD score but “the designed thing worked as intended in a test tube.” Alex Rives makes the same point: the models learned a world model of biology high-fidelity enough that you can design protein interfaces computationally, take them into the lab, and they function as predicted. If these wet-lab results survive independent replication, that is a substantive move from describing biology toward programming it.

But dial down the phrase “world model.” The model did not learn dynamics at the level of cells, tissues, or whole organisms. It learned the physical rules at the protein layer, as implied by evolutionary sequences. It can design a protein that folds and binds; it cannot predict the system-level consequences of that protein once it enters a living body. HN commenters kept noting that biology is squishier, weirder, and less predictable than silicon, and that holds for the model’s scope. It took one step forward on the relatively clean subproblem of binders, which is not the same as having a world model of all of biology.

Builder impact

If you work in drug discovery or protein engineering, this is worth evaluating now, and the barrier is low: three components, open source, MIT license, free platform. The most direct use is to shift early binder screening from empirical wet-lab triage to computation-guided design, then take a small set of candidates into the lab. Antibody-based therapies account for roughly a quarter of new FDA drug approvals, and a single preclinical binder candidate typically takes three to four years, so any tool that compresses the front-end search from months or years to days carries real leverage.

When evaluating, draw the credibility boundary clearly. ESMFold2 is good at the peptide and binder scale, not large macromolecular complexes, and HN practitioners are explicit about this. Atom-level accuracy is still hit-and-miss; one commenter noted a predicted or designed active site can differ from the real structure by a side chain or two, enough to change how the interaction reads. So the right workflow is “model produces many computational candidates, wet lab does the hard filtering,” not treating model output as a trustworthy structure to push downstream. The 36-88% hit-rate range itself says this: high on good targets, far from reliably high, so discount by the difficulty of your specific target.
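To make "discount by the difficulty of your target" concrete, here is a minimal sketch of the arithmetic for sizing a wet-lab batch. It assumes every candidate succeeds independently with the same probability, which is a simplification (designs against one target are likely correlated), and the function names are illustrative rather than taken from any Biohub tool.

```python
import math


def prob_at_least_one_hit(hit_rate: float, n_candidates: int) -> float:
    """P(at least one binder) = 1 - (1 - p)^n, assuming independent candidates."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    if n_candidates < 0:
        raise ValueError("n_candidates must be non-negative")
    return 1.0 - (1.0 - hit_rate) ** n_candidates


def candidates_needed(hit_rate: float, target_prob: float = 0.95) -> int:
    """Smallest n such that P(at least one hit) >= target_prob (same assumption)."""
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be strictly between 0 and 1")
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - hit_rate))


if __name__ == "__main__":
    # Reported range endpoints from the announcement, used only as inputs.
    for rate in (0.15, 0.36, 0.88):
        print(f"hit rate {rate:.0%}: order {candidates_needed(rate)} candidates")
```

Under that independence assumption, a 15% hit rate calls for about 19 candidates to reach a 95% chance of at least one binder, while 36% calls for 7. If your target is harder than the five in the preprint, plug in a lower rate and budget the batch accordingly.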

Two more things to verify are the license and the institution. MIT is good news: genuinely usable and modifiable. But Biohub is a nonprofit, and an HN commenter drew the OpenAI comparison directly, a reminder that “nonprofit” has gotten slippery in recent years. Teams embedding this in a commercial pipeline should check per-component license terms on the platform rather than assume there are no restrictions based on one line about free access.

What to ignore

Ignore the literal ambition in “world model.” It is not a world model at the cell or organism level and cannot simulate the system-level dynamics of a protein once it enters a living body. The announcement’s own wording is actually restrained, pointing to physical rules at the protein layer; what gets amplified is the connotation of the label. Reading it as “AI can now simulate all of biology” is overreach. The step it took is specific and bounded: on the subproblem of protein binder design, it moved from prediction to design that produces function.

Ignore the “AlphaFold killer” framing. The accompanying Nature piece is literally titled “Move over, AlphaFold,” but “beats AlphaFold 3 on antibody-antigen pose on a chosen benchmark” is a specific, narrow claim, not wholesale replacement, and it is Biohub’s own result with no independent replication yet. Tools in structure prediction tend to be complements rather than substitutes; the AlphaFold lineage’s strengths, training setup, and availability all remain. Reading a single-benchmark lead as “AlphaFold is obsolete” is inaccurate and misleading when choosing tools.

Ignore the inverse noise of “few HN comments, so it does not matter.” The post sits at 155 points with a thin comment thread, but attention is not evidence of importance in either direction, and the post landed late on a Saturday night US time, a low-traffic window HN regulars noted themselves. Whether this deserves serious attention depends on whether the wet-lab data replicates independently, not on how many software engineers commented on it.

FAQ

Can ESMFold2 be used commercially under its license?

Biohub says all three releases (ESMC, ESMFold2, ESM Atlas) are freely available to researchers on Biohub Platform. On HN, a commenter confirmed the model carries an MIT License, which is genuinely permissive for commercial and derivative use, cleaner than the restricted academic licenses common in this subfield. The announcement does not enumerate per-component license terms, so verify each on the platform before building on it commercially.

Is ESMFold2 really better than AlphaFold 3 at antibody-antigen prediction?

Per the announcement: from ESMC representations alone, ESMFold2 predicts the true binding pose of antibody-antigen complexes more successfully than AlphaFold 3, and given the same evolutionary information (MSA) it is the strongest predictor on both benchmarks. Note this is Biohub's own result on its chosen benchmarks. Independent replication is not out yet, so treat the 'strongest' claim as pending third-party evaluation.

What can ESMFold2 not do?

Pharma practitioners on HN flagged two limits. Large, complex macromolecular assemblies are still poorly predicted; the models are good at the peptide and binder scale. And atom-level accuracy remains hit-and-miss, where a designed active site can differ from the X-ray or cryo-EM structure by a side chain or two, enough to change how an interaction reads. It moves the early search from months or years to days, but it does not replace wet-lab validation.