AI predicts disease-causing gene variants with deep dive into evolutionary genetics
Researchers used unsupervised machine learning to predict disease-causing properties in more than 36 million genetic variants across more than 3,200 disease-related genes.
During this process, they advanced the classification of more than 256,000 genetic variants whose properties – useful, harmful, or neither – were unknown.
The work was carried out at Harvard Medical School and the University of Oxford. The resulting study is published online in Nature.
“Quantifying the pathogenicity of protein variants in genes linked to human disease would have a marked effect on clinical decisions, but the overwhelming majority (over 98%) of these variants still have unknown consequences,” write the lead co-authors Jonathan Frazer, Mafalda Dias and colleagues, explaining the motivation for the work.
“In principle, computational methods could support the large-scale interpretation of genetic variants,” they add. “However, state-of-the-art methods have relied on training machine learning models on known disease labels.”
For the current project, the team sought to overcome this limitation by modeling the distribution of sequence variation across organisms and over large expanses of time.
In doing so, they hypothesized, the model would implicitly capture the constraints on protein sequences that maintain fitness.
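The idea of scoring a variant from evolutionary sequence variation alone, with no disease labels, can be illustrated with a minimal sketch. This is not the study's method: EVE is a deep generative model, whereas the code below swaps in a far simpler site-independent frequency model. The toy alignment, the function names and the pseudocount value are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-position amino-acid frequencies from aligned sequences.

    Additive smoothing (the pseudocount) keeps unseen residues from
    getting a probability of exactly zero.
    """
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def variant_score(freqs, wild_type, pos, mutant_aa):
    """Log-likelihood ratio of the mutant residue versus the wild-type residue.

    pos is 0-based. A more negative score means the substitution is rarer
    across the aligned sequences, which this toy model treats as a hint
    that it is less likely to be tolerated.
    """
    return math.log(freqs[pos][mutant_aa]) - math.log(freqs[pos][wild_type[pos]])


# Hypothetical toy alignment of related sequences (made-up data).
alignment = ["MKVLA", "MKILA", "MRVLA", "MKVLG", "MKVIA"]
wild_type = "MKVLA"

freqs = site_frequencies(alignment)
conserved = variant_score(freqs, wild_type, 0, "W")  # position 0 is always M
variable = variant_score(freqs, wild_type, 2, "I")   # position 2 also shows I

print(f"M1W score: {conserved:.3f}")
print(f"V3I score: {variable:.3f}")
```

In this toy case the substitution at the fully conserved first position scores lower than the one at the variable third position. A site-independent model like this ignores interactions between positions, which is one reason richer generative models are used in practice.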
The authors named their model EVE, short for evolutionary model of variant effect, and report that it outperforms AI approaches that rely on labeled data.
In addition, it performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification.
The team says their work with EVE suggests that models of evolutionary information may “provide valuable independent evidence for the interpretation of variants that will be widely useful in research and clinical settings.”