WHY THIS MATTERS IN BRIEF
AI is getting better at creating and innovating all manner of things – and now it’s proteins and drug treatments.
The explosion in text-to-image Artificial Intelligence (AI) models like DALL-E 2 from OpenAI – programs trained to generate whatever images you ask for – has sent ripples through the creative industries, from fashion to filmmaking, by providing weird and wonderful images and content on demand.
However, the same AI technology behind these programs is now also unexpectedly making a splash in biotech labs, which have started using this type of Generative AI, known as a diffusion model, to conjure up designs for entirely new types of proteins never seen in nature. And anyone who knows their proteins will know just how revolutionary that could be for almost all industries, from biotech and energy to computing and manufacturing and beyond.
Today, months after DeepMind, Google’s AI lab, managed to predict the structure of almost every known protein on Earth, two labs in the US separately announced programs that use diffusion models to generate designs for novel proteins with more precision than ever before. Generate Biomedicines, a Boston-based startup, revealed a program called Chroma, which the company describes as the “DALL-E 2 of Biology.”
At the same time, a team at the University of Washington led by biologist David Baker has built a similar program called RoseTTAFold Diffusion. In a preprint paper posted online today, Baker and his colleagues show that their model can generate precise designs for novel proteins that can then be brought to life in the lab.
“We’re generating proteins with really no similarity to existing ones,” says Brian Trippe, one of the co-developers of RoseTTAFold Diffusion.
These protein generators can be directed to produce designs for proteins with specific properties, such as shape or size or function. In effect, this makes it possible to come up with new proteins to do particular jobs on demand. Researchers hope that this will eventually lead to the development of new and more effective drugs.
“We can discover in minutes what took evolution millions of years,” says Gevorg Grigoryan, CTO of Generate Biomedicines.
“What is notable about this work is the generation of proteins according to desired constraints,” says Ava Amini, a biophysicist at Microsoft Research in Cambridge, Massachusetts.
Proteins are the fundamental building blocks of living systems. In animals, they digest food, contract muscles, detect light, drive the immune system, and so much more. When people get sick, proteins play a part. Proteins are thus prime targets for drugs. And many of today’s newest drugs are themselves protein-based.
“Nature uses proteins for essentially everything,” says Grigoryan. “The promise that offers for therapeutic interventions is really immense.”
But drug designers currently have to draw on an ingredient list made up of natural proteins. The goal of protein generation is to extend that list with a nearly infinite pool of computer-designed ones.
Computational techniques for designing proteins are not new. But, DeepMind’s breakthrough aside, previous approaches have been slow and not great at designing large proteins or protein complexes – molecular machines made up of multiple proteins coupled together. And such proteins are often crucial for treating diseases.
The two programs announced today are also not the first use of diffusion models for protein generation. A handful of studies in the last few months from Amini and others have shown that diffusion models are a promising technique, but these were proof-of-concept prototypes. Chroma and RoseTTAFold Diffusion build on this work and are the first full-fledged programs that can produce precise designs for a wide variety of proteins.
Namrata Anand, who co-developed one of the first diffusion models for protein generation in May 2022, thinks the big significance of Chroma and RoseTTAFold Diffusion is that they have taken the technique and supersized it, training on more data and with more computing power. “It may be fair to say that this is more like DALL-E because of how they’ve scaled things up,” she says.
Diffusion models are neural networks trained to remove “noise” – random perturbations added to data – from their input. Given a random mess of pixels, a diffusion model will try to turn it into a recognizable image.
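To make the idea of adding and removing noise concrete, here is a toy sketch in Python. It is not how Chroma or RoseTTAFold Diffusion work internally: the function names, the linear noise schedule, and the three-number "data" are all assumptions chosen for illustration. The denoising step is also handed the true noise, standing in for the estimate a trained neural network would have to produce.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after t noising steps.

    Assumes a linear schedule of per-step noise levels (an illustrative choice).
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, num_steps, rng):
    """Forward process: blend clean data x0 with Gaussian noise at step t."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def remove_noise(xt, eps_pred, t, num_steps):
    """Reverse step: recover an estimate of x0 from xt and a noise estimate.

    In a real diffusion model eps_pred comes from a trained network.
    """
    a = alpha_bar(t, num_steps)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(xt, eps_pred)]

if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.5, -1.0, 2.0]            # placeholder "data"
    noisy, noise = add_noise(clean, 500, 1000, rng)
    restored = remove_noise(noisy, noise, 500, 1000)
    print(noisy)
    print(restored)                     # matches `clean` up to rounding
```

The restored values match the clean ones only because the sketch cheats by reusing the exact noise. The hard part, and what training is for, is predicting that noise from the noisy input alone.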
In Chroma, noise is added by unravelling the amino acid chains that a protein is made from. Given a random clump of these chains, Chroma tries to put them together to form a protein. Guided by specified constraints on what the result should look like, Chroma can generate novel proteins with specific properties.
Baker’s team takes a different approach, though the end results are similar. Its diffusion model starts with an even more scrambled structure. Another key difference is that RoseTTAFold Diffusion uses information about how the pieces of a protein fit together, provided by a separate neural network trained to predict protein structure, as DeepMind’s AlphaFold does. This guides the overall generative process.
Generate Biomedicines and Baker’s team both show off an impressive array of results. They are able to generate proteins with multiple degrees of symmetry, including proteins that are circular, triangular, or hexagonal. To illustrate the versatility of their program, Generate Biomedicines generated proteins shaped like the 26 letters of the Latin alphabet and the numerals 0 to 10. Both teams can also generate pieces of proteins, matching new parts to existing structures.
Most of these demonstrated structures would serve no purpose in practice. But because a protein’s function is determined by its shape, being able to generate different structures on demand is crucial. Once this genie is out of the bottle, it could become possible to make almost any protein, whether to treat a disease or for use in other applications, from laundry detergents to advanced bio-manufacturing and far beyond.
Generating strange designs on a computer is one thing. But the goal is to turn these designs into real proteins. To test whether Chroma produced designs that could be made, Generate Biomedicines took the sequences for some of its designs – the amino acid strings that make up the protein – and ran them through another AI program. They found that 55% of them would be predicted to fold into the structure generated by Chroma, which suggests that these are designs for viable proteins.
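The check described above boils down to counting how many designs a structure predictor "agrees" with. The Python sketch below shows that bookkeeping only. The article does not say how agreement was scored, so the deviation score, the cutoff, and the sample numbers are all hypothetical placeholders, not Generate Biomedicines’ data or method.

```python
def designable_fraction(deviations, cutoff):
    """Share of designs whose predicted structure stays within `cutoff`
    of the generated structure.

    `deviations` holds one hypothetical mismatch score per design
    (smaller means the prediction is closer to the design).
    """
    if not deviations:
        raise ValueError("need at least one design to score")
    passed = sum(1 for d in deviations if d <= cutoff)
    return passed / len(deviations)

if __name__ == "__main__":
    # Made-up scores for four imaginary designs, for illustration only.
    scores = [1.2, 3.5, 0.8, 2.0]
    print(designable_fraction(scores, cutoff=2.0))  # 0.75
```

A stricter cutoff lowers the fraction and a looser one raises it, which is why a figure like this is only meaningful alongside the criterion used to compute it.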
Baker’s team ran a similar test. But Baker and his colleagues have gone a lot further than Generate Biomedicines in evaluating their model: they have created some of RoseTTAFold Diffusion’s designs in their lab. Generate Biomedicines says it is also doing lab tests but is not yet ready to share results.
“This is more than just proof of concept,” says Trippe. “We’re actually using this to make really great proteins.”
For Baker, the headline result is the generation of a new protein that attaches to the parathyroid hormone, which controls calcium levels in the blood.
“We basically gave the model the hormone and nothing else and told it to make a protein that binds to it,” he says. When they tested the novel protein in the lab, they found that it attached to the hormone more tightly than anything that could have been generated using other computational methods – and more tightly than existing drugs.
“It came up with this protein design out of thin air,” says Baker.
Grigoryan acknowledges that inventing new proteins is just the first step of many.
“We’re a drug company,” he says. “At the end of the day what matters is whether we can make medicines that work or not.”
Protein-based drugs need to be manufactured in large numbers, then tested in the lab and finally in humans. This can take years. But he thinks that his company and others will find ways to speed up those steps as well, perhaps using digital twins of human patients.
“The rate of scientific progress comes in fits and starts,” says Baker. “But right now we’re in the middle of what can only be called a technological revolution.”