Protein programmers get help from Cradle’s generative AI • TechCrunch

Proteins are the molecules that get the job done in nature, and an entire industry is springing up around successfully modifying and manufacturing them for various uses. But doing it is time-consuming and messy; Cradle aims to change that with an AI-powered tool that tells scientists what new structures and sequences will make a protein do what they want. The company came out of stealth today with a substantial seed round.

AI and proteins have been in the news lately, but largely due to the efforts of research outfits like DeepMind and Baker Lab. Their machine learning models take easily collected amino acid sequence data and predict the structure a protein will take, a step that used to take weeks and expensive specialized equipment.

But as incredible as that capability is in some domains, it’s just the starting point for others. Modifying a protein so that it is more stable or binds to another particular molecule involves much more than just understanding its general shape and size.

“If you’re a protein engineer and you want to engineer a certain property or function into a protein, just knowing what it looks like doesn’t help you. It’s like if you have a picture of a bridge, that doesn’t tell you if it will fall down or not,” explained Cradle CEO and co-founder Stef van Grieken.

“AlphaFold takes a sequence and predicts what the protein will look like,” he continued. “We’re the generative brother of that: you choose the properties you want to engineer, and the model will generate sequences that you can test in your lab.”

Predicting what proteins will do, especially proteins new to science, is a difficult task for many reasons, but in the context of machine learning the biggest problem is that there isn’t enough data available. So Cradle sourced much of its own dataset in a wet lab, testing protein after protein and seeing which changes in their sequences seemed to lead to which effects.

Interestingly, the model itself is not exactly specific to biotechnology, but rather a derivative of the same “large language models” that have produced text-generation engines like GPT-3. Van Grieken noted that these models are not strictly limited to language in how they understand and predict data, an interesting “generalization” property that researchers are still exploring.

Examples of the Cradle user interface in action. Image Credits: Cradle

The protein sequences that Cradle ingests and predicts are not in any language that we know of, of course, but are relatively simple linear sequences of text that have associated meanings. “It’s like an alien programming language,” van Grieken said.

Protein engineers aren’t helpless, of course, but their work necessarily involves a lot of guesswork. One can be fairly sure that somewhere among the 100 sequences they are tweaking lies the combination that will produce the desired effect, but beyond that, it all comes down to extensive testing. A little hint here could speed things up considerably and prevent a lot of fruitless work.

The model works in three basic layers, he explained. First, it tests whether a given sequence is “natural,” that is, whether it is a meaningful amino acid sequence or just a random one. This is similar to a language model being able to say with 99% confidence that a sentence is in English (or Swedish, in van Grieken’s example) and that the words are in the right order. It knows this by “reading” millions of such sequences determined by laboratory analysis.
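Cradle hasn’t published its model, and real protein language models are large neural networks trained on millions of sequences. As a much simpler stand-in for the “is this natural?” check, the sketch below fits a bigram (first-order Markov) model of which amino acid tends to follow which, then scores a sequence by its average per-step log-likelihood ratio against uniformly random residues. The function names and the three-sequence corpus are hypothetical, made up purely for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_bigram(corpus, pseudocount=1.0):
    """Estimate P(next residue | previous residue) from known sequences.

    Add-one (Laplace) smoothing keeps unseen pairs from getting zero probability.
    """
    counts = {aa: Counter() for aa in AMINO_ACIDS}
    for seq in corpus:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev in AMINO_ACIDS:
        total = sum(counts[prev].values()) + pseudocount * len(AMINO_ACIDS)
        model[prev] = {cur: (counts[prev][cur] + pseudocount) / total for cur in AMINO_ACIDS}
    return model


def naturalness(seq, model):
    """Average per-step log-likelihood ratio: bigram model vs. uniform random residues.

    Above zero means the sequence resembles the corpus more than random letters do.
    """
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    uniform = 1.0 / len(AMINO_ACIDS)
    llr = sum(math.log(model[prev][cur] / uniform) for prev, cur in zip(seq, seq[1:]))
    return llr / (len(seq) - 1)


# Hypothetical toy corpus standing in for "millions of sequences".
corpus = ["MKTLLVAGLLAG", "MKTLLAAGLLVG", "MKSLLVAGLLAG"]
model = fit_bigram(corpus)
print(naturalness("MKTLLVAGLLAG", model))  # positive: looks like the corpus
print(naturalness("WYCHWPQDRNEF", model))  # 0.0: none of these residues appear in the toy corpus
```

Because this toy model only looks one residue back, it captures local “word order” at best, not the long-range patterns a real protein language model picks up.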

Next, it addresses the actual or potential meaning in the protein’s alien language. “Imagine we give you a sequence, and this is the temperature at which this sequence will fall apart,” he said. “If you do that for many sequences, you can say not just ‘this looks natural,’ but ‘this looks like 26 degrees Celsius.’ That helps the model determine which regions of the protein to focus on.”

The model can then suggest sequences to fit: informed guesses, essentially, but a stronger starting point than nothing. The engineer or lab can test them and push the resulting data back to the Cradle platform, where it is ingested and used to adjust the model to the situation.
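Cradle hasn’t detailed how its suggestions are generated, so the following is only a deliberately simple illustration of the test-and-refine loop, not the company’s method. A hypothetical AdditiveDesigner assumes that single-mutation effects on melting temperature simply add up, proposes untested combinations of beneficial mutations, and, when a lab result for a pair comes back, stores how far it deviated from the additive guess so that later predictions for larger combinations shift accordingly. The class, the mutation notation (“V6I” means valine at position 6 becomes isoleucine) and every number are invented for illustration.

```python
from itertools import combinations


def site(mutation):
    """1-based position from a mutation label such as 'V6I'."""
    return int(mutation[1:-1])


def apply_mutations(parent, mutations):
    """Return the parent sequence with the given mutations applied."""
    seq = list(parent)
    for m in mutations:
        wild_type, pos, new = m[0], site(m), m[-1]
        if seq[pos - 1] != wild_type:
            raise ValueError(f"{m}: parent has {seq[pos - 1]} at position {pos}")
        seq[pos - 1] = new
    return "".join(seq)


class AdditiveDesigner:
    """Toy design loop: additive single-mutant effects plus pairwise corrections."""

    def __init__(self, parent_tm, single_mutant_tm):
        self.parent_tm = parent_tm
        # Each mutation's effect is its measured shift from the parent.
        self.effects = {m: tm - parent_tm for m, tm in single_mutant_tm.items()}
        self.pair_corrections = {}  # (m1, m2) -> measured minus additive prediction
        self.tested = set()

    def predict(self, combo):
        combo = tuple(sorted(combo))
        tm = self.parent_tm + sum(self.effects[m] for m in combo)
        tm += sum(self.pair_corrections.get(p, 0.0) for p in combinations(combo, 2))
        return tm

    def suggest(self, max_mutations=3, top_n=3):
        """Rank untested combinations of beneficial mutations at distinct sites."""
        good = sorted(m for m, e in self.effects.items() if e > 0)
        ranked = []
        for k in range(2, max_mutations + 1):
            for combo in combinations(good, k):
                if combo in self.tested or len({site(m) for m in combo}) < k:
                    continue
                ranked.append((self.predict(combo), combo))
        ranked.sort(key=lambda item: (-item[0], item[1]))
        return ranked[:top_n]

    def record(self, combo, measured_tm):
        """Ingest a lab result; for a pair, store its deviation from additivity."""
        combo = tuple(sorted(combo))
        self.tested.add(combo)
        if len(combo) == 2:
            additive = self.parent_tm + sum(self.effects[m] for m in combo)
            self.pair_corrections[combo] = measured_tm - additive


# Hypothetical parent with Tm 50.0 C and four measured single mutants.
designer = AdditiveDesigner(50.0, {"T3S": 52.0, "V6I": 53.5, "A7G": 49.0, "L10I": 51.0})
print(designer.suggest(top_n=2))
# [(56.5, ('L10I', 'T3S', 'V6I')), (55.5, ('T3S', 'V6I'))]
designer.record(("T3S", "V6I"), 54.0)  # the lab finds the pair less stable than predicted
print(designer.suggest(top_n=2))
# [(55.0, ('L10I', 'T3S', 'V6I')), (54.5, ('L10I', 'V6I'))]
print(apply_mutations("MKTLLVAGLLAG", ("T3S", "V6I")))  # MKSLLIAGLLAG
```

The additive assumption is the weak point on purpose: mutations often interact, which is exactly why feeding lab results back into the model matters.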

The Cradle team on a good day at their headquarters (van Grieken in the middle). Image Credits: Cradle

Protein modification is useful throughout biotechnology, from drug design to biomanufacturing, and the path from a vanilla molecule to an effective, efficient custom molecule can be long and expensive. Any way to shorten it will likely be welcomed, not least by the lab technicians who have to run hundreds of experiments just to get one good result.

Cradle has been operating in stealth and is now emerging after raising $5.5 million in a seed round co-led by Index Ventures and Kindred Capital, with participation from angels John Zimmer, Feike Sijbesma and Emily Leproust.

Van Grieken said the funding will allow the team to expand data collection (the more the merrier when it comes to machine learning) and to work on making the product “more self-service.”

“Our goal is to reduce the cost and time of bringing a bio-based product to market by an order of magnitude,” van Grieken said in the press release, “so that anyone, even ‘two kids in their garage,’ can bring to market a bio-based product.”
