MIT’s AI model can predict protein location in human cells without lab experiments

For the first time, a new computational technique can predict where any human protein will sit inside any type of cultured cell, without needing a single lab-based localization experiment.


A new computational technique can now predict where any human protein will localize inside any type of cultured cell, with no lab-based localization experiment required. Traditional methods for mapping protein locations are slow and costly, profiling only a few proteins at a time across limited cell lines.

Now, by marrying advances in protein language modeling with image inpainting, scientists at MIT, Harvard and the Broad Institute have launched PUPS (Prediction of Unseen Protein Subcellular location), a system that forecasts a protein’s position within an individual cell even when neither the protein nor the cell line has been previously observed.

To build PUPS, the team first ran a protein language model on each protein's amino-acid sequence and predicted three-dimensional structure, capturing the molecular signals that guide its cellular address. Simultaneously, they applied an inpainting-based vision model to three fluorescently stained images of a cell – one each for the nucleus, microtubules and endoplasmic reticulum – to learn the cell's type, state and subtle single-cell variations. By fusing these two data streams, PUPS outputs an annotated cell image with a highlighted region indicating where the protein is predicted to localize.
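The fusion step described above can be illustrated with a toy sketch. This is not the paper's architecture: all shapes, weights and function names here are hypothetical stand-ins, showing only the general idea of combining a per-protein sequence embedding with per-pixel image features to produce a localization heatmap for one cell.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a protein embedding from a sequence model, and the
# three stained-image channels (nucleus, microtubules, ER) for one cell.
protein_embedding = rng.normal(size=16)       # (d,)
cell_images = rng.random(size=(3, 32, 32))    # (channels, H, W)

def fuse_and_predict(embedding, images, w_img, w_seq):
    """Project each pixel's image features and the protein embedding into
    a shared space, take their dot product per pixel, and squash the
    result into a per-pixel localization probability map."""
    c, h, w = images.shape
    pixels = images.reshape(c, -1).T          # (H*W, channels)
    img_feats = pixels @ w_img                # (H*W, d)
    seq_feat = embedding @ w_seq              # (d,)
    logits = img_feats @ seq_feat             # (H*W,)
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid
    return probs.reshape(h, w)

# Toy projection weights (learned by training in a real system).
w_img = rng.normal(size=(3, 16)) * 0.1
w_seq = rng.normal(size=(16, 16)) * 0.1
heatmap = fuse_and_predict(protein_embedding, cell_images, w_img, w_seq)
print(heatmap.shape)  # one localization map per cell, e.g. (32, 32)
```

Because the sequence branch and the image branch are separate, either input can be swapped for an unseen protein or an unseen cell, which is the property that lets such a model generalize.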

“Different cells within a cell line exhibit different characteristics, and our model is able to understand that nuance,” says Yitong Tseo, a graduate student in MIT’s Computational and Systems Biology program and co-lead author. “You could run these protein-localization experiments on a computer without touching a pipette, potentially saving months of benchwork.”

Protein mislocalization underlies diseases from Alzheimer's to cystic fibrosis and cancer, yet major resources like the Human Protein Atlas have mapped only about 0.25 percent of all possible protein–cell combinations. By generalizing to unseen proteins and cell lines, PUPS could accelerate both fundamental research and drug discovery, flagging aberrant localization patterns driven by disease mutations or therapeutic interventions.

During training, the researchers added a secondary challenge: alongside reconstructing missing image regions, PUPS had to name the specific subcellular compartment—nucleus, Golgi or mitochondria, for example. This multitask setup boosted the model’s grasp of intracellular geography, much like asking students to label as well as sketch a diagram. In validation studies, PUPS outperformed existing AI methods, showing lower error when its predictions were matched against experimental observations in lab-grown cells.
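The multitask setup described above can be sketched as a weighted combination of two objectives: a pixel-wise reconstruction term and a compartment-classification term. This is a generic illustration, not the authors' loss; the weighting, loss choices (MSE, cross-entropy) and compartment list are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outputs for one training example: a reconstructed
# localization image and logits over candidate subcellular compartments.
pred_image = rng.random((32, 32))
true_image = rng.random((32, 32))
compartments = ["nucleus", "golgi", "mitochondria", "er", "cytosol"]
class_logits = rng.normal(size=len(compartments))
true_class = compartments.index("nucleus")

def multitask_loss(pred_img, true_img, logits, label, alpha=0.5):
    """Weighted sum of an image-reconstruction term (mean squared error)
    and a compartment-classification term (cross-entropy)."""
    recon = np.mean((pred_img - true_img) ** 2)
    shifted = logits - logits.max()                 # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    xent = -log_probs[label]
    return alpha * recon + (1.0 - alpha) * xent

loss = multitask_loss(pred_image, true_image, class_logits, true_class)
```

Training against both terms at once forces the model to encode not just what missing pixels look like, but which named compartment they belong to, which is the "label as well as sketch" intuition in the paragraph above.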

“This is unique in that it can generalize across proteins and cell lines at the same time,” notes Xinyi Zhang, co-lead author and graduate student at MIT’s Schmidt Center. “Most other approaches require prior lab images of the protein to learn from.”

The study has been posted on the preprint server bioRxiv.