WHY THIS MATTERS IN BRIEF
With the advent of personalised medicine and new revolutionary gene treatments, being able to interpret and make sense of the vast volumes of human genomic data in order to better predict disease indicators and outcomes, and provide patients with better treatments, will become increasingly crucial.
Fifteen years after scientists first successfully sequenced the human genome, thousands of research teams around the world are still trying to make sense of the huge data trove. While the scale of the challenge is formidable for humans, it’s a comparative walk in the park for some of today’s more advanced Artificial Intelligence (AI) platforms.
Last week Google announced the release of “DeepVariant,” a new AI tool that uses the latest AI techniques to compile a more accurate picture of a person’s entire genome from masses of sequencing data. The platform turns high-throughput sequencing readouts into a picture of a person’s full genome, and it can automatically identify small insertion and deletion mutations and single base pair mutations in the data. In an age where we are at the very start of rewriting a living person’s DNA in vivo, as in the recent experiment to cure Brian Madeux’s inherited disease, Hunter syndrome, that will become an increasingly important, if not vital, capability.
High-throughput genome sequencing first became widely available in the early 2000s, and it has since helped to democratise the genome sequencing process. In the past, though, the data these systems produced offered only a limited, error-prone snapshot of a person’s full genome. Even now it’s still hard for scientists to distinguish the small mutations that might have a direct impact on a person’s propensity to develop a variety of diseases, including cancer, from the random errors generated during the sequencing process.
While a number of tools already exist for interpreting readouts, including GATK, VarDict, and FreeBayes, these software programs typically use simpler statistical and machine learning approaches to identify mutations by attempting to rule out read errors.
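To give a feel for what “ruling out read errors” with a simple statistical approach can look like, here is a minimal toy sketch in Python. It is not how GATK, VarDict, or FreeBayes actually work. The function name `call_genotype`, the fixed `error_rate` of 1%, and the three-genotype model are all assumptions made purely for illustration. At a single position in the genome it compares how well three hypotheses (no mutation, a mutation on one chromosome copy, a mutation on both copies) explain the bases seen in the overlapping reads.

```python
import math
from collections import Counter


def call_genotype(ref_base, read_bases, error_rate=0.01):
    """Toy single-position genotype call (illustrative only).

    ref_base:   the reference genome's base at this position, e.g. "A"
    read_bases: the base each overlapping read reports at this position
    error_rate: assumed chance that a read reports the wrong allele
    Returns (genotype, alt_base); alt_base is None when no variant is called.
    """
    counts = Counter(read_bases)

    # The most frequently seen non-reference base is the candidate mutation.
    alt_candidates = [(n, b) for b, n in counts.items() if b != ref_base]
    if not alt_candidates:
        return "hom_ref", None
    alt_count, alt_base = max(alt_candidates)
    ref_count = counts[ref_base]  # reads showing any third base are ignored

    # Chance that one read shows the candidate base under each hypothesis.
    p_alt = {
        "hom_ref": error_rate,        # no mutation: alt reads are errors
        "het": 0.5,                   # mutation on one of two copies
        "hom_alt": 1.0 - error_rate,  # mutation on both copies
    }

    # Binomial log-likelihood of the observed read counts per hypothesis.
    log_lik = {
        genotype: alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        for genotype, p in p_alt.items()
    }

    best = max(log_lik, key=log_lik.get)
    return best, (alt_base if best != "hom_ref" else None)


# One stray "T" among ten reads is best explained as a read error,
# while an even split of "A" and "T" looks like a real mutation.
print(call_genotype("A", list("AAAAAAAAAT")))
print(call_genotype("A", list("AAAAATTTTT")))
```

Real tools also account for factors such as per-base quality scores, read alignment quality, and nearby insertions and deletions, which is part of why the difficult regions of the genome remain hard.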
“One of the challenges is in difficult parts of the genome, where each of the [tools] has strengths and weaknesses,” says Brad Chapman, a research scientist at Harvard University who tested an early version of DeepVariant. “These difficult regions are increasingly important for clinical sequencing, and it’s important to have multiple methods.”
DeepVariant was developed by researchers from the Google Brain team, a group that focuses on developing and applying AI techniques, and Verily, a multibillion-dollar Alphabet subsidiary that focuses on life sciences.
The team collected millions of high-throughput reads and fully sequenced genomes from the Genome in a Bottle (GIAB) project, a public-private effort to promote genomic sequencing tools and techniques. They then fed the data into their deep learning system, painstakingly tweaking the model’s parameters until it learned to accurately interpret the sequenced data. Then, last year, DeepVariant won first place in the PrecisionFDA Truth Challenge, a contest run by the FDA to promote more accurate genetic sequencing.
“The success of DeepVariant is important because it demonstrates that in genomics, deep learning can be used to automatically train systems that perform better than complicated hand-engineered systems,” says Brendan Frey, CEO of Deep Genomics.
The release of DeepVariant is also the latest sign that AI may be poised to boost progress in genomics, and Deep Genomics is one of several companies trying to use new AI tools and techniques, such as deep learning, to tease out genetic causes of diseases and to identify potential drug therapies.
Frey thinks AI will eventually go well beyond helping to sequence genomic data.
“The gap that is currently blocking medicine right now is in our inability to accurately map genetic variants to disease mechanisms and to use that knowledge to help us rapidly identify life-saving therapies and treatments,” he says.
DeepVariant will be available on the Google Cloud Platform next year.