AlphaFold 2, open-source AI for protein structure prediction
On July 15, 2021, a team of scientists published a Nature article titled “Highly Accurate Protein Structure Prediction with AlphaFold.”1 The article describes how the neural network model developed by Google’s DeepMind can predict protein structures “with atomic precision even when no similar structure is known.”2 In addition, DeepMind has now released the source code for AlphaFold 2 under an open source license, enabling new collaborations toward even more accurate protein structure prediction.
A protein acquires its complex 3D structure through a process called protein folding, and predicting that structure has been “a major research problem open for over 50 years.”3 Last year, DeepMind entered CASP14 (the 14th Critical Assessment of protein Structure Prediction) with a redesigned system, AlphaFold 2, and won the competition in December 2020. The CASP competitions, regarded as “the protein folding Olympics,”4 have taken place every two years since 1994, and following the development of AlphaFold 2 some believe that the protein folding problem has essentially been solved. DeepMind improved prediction accuracy “by incorporating new neural network architectures and training procedures based on the evolutionary, physical and geometric constraints of protein structure.”5
AlphaFold inspired further research efforts, which led to the publication of another article on July 15, “Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network.”6 The article, by university researchers, describes how their RoseTTAFold model predicted protein structures with an accuracy close to that of AlphaFold. The model comprises a three-track network in which “the information at the level of the 1D sequence, at the level of the 2D distance map and at the level of the 3D coordinates are successively transformed and integrated.” With such technology, “RoseTTAFold solves difficult X-ray crystallography and cryo-EM modeling problems, provides insight into protein function in the absence of experimentally determined structures, and rapidly generates accurate models of protein-protein complexes.”
Protein misfolding can lead to various diseases and disorders, so computational tools that provide insight into protein folding are important for drug discovery and development. Prediction models, together with experimental techniques, should help researchers better understand the causes of disease and design compounds that treat it effectively.
In terms of patent protection, London-based DeepMind filed three PCT international applications with the same title, “Machine Learning for Determining Protein Structures,” on September 16, 2019, each claiming priority from the same three US provisional applications filed in September and November 2018.
US provisional applications:
No. 62/734,757, filed on September 21, 2018
No. 62/734,773, filed on September 21, 2018
No. 62/770,490, filed on November 21, 2018
WO 2020/058174 includes claims directed to a prediction method, a system, and computer storage media. Claim 1 reads as follows.
A method performed by one or more data processing apparatuses to determine a final predicted structure of a given protein, wherein the given protein comprises an amino acid sequence, wherein a predicted structure of the given protein is defined by values of a plurality of structure parameters, the method comprising:
generating a plurality of predicted structures of the given protein, wherein generating a predicted structure of the given protein comprises:
obtaining initial values of the plurality of structure parameters defining the predicted structure;
updating the initial values of the plurality of structure parameters, comprising, on each of a plurality of update iterations:
determining a quality score characterizing a predicted quality of the structure defined by the current values of the structure parameters, wherein the quality score is based on the respective outputs of one or more scoring neural networks which are each configured to process: (i) current values of structure parameters, (ii) a representation of the amino acid sequence of the given protein, or (iii) both; and
for one or more of the plurality of structure parameters:
determining a gradient of the quality score with respect to the current value of the structure parameter; and
updating the current value of the structure parameter using the gradient of the quality score relative to the current value of the structure parameter; and
determining the predicted structure of the given protein to be defined by the current values of the plurality of structure parameters after a final update iteration of the plurality of update iterations; and
selecting a particular predicted structure of the given protein as the final predicted structure of the given protein.
The prediction method of claim 1 generates multiple predicted structures of a given protein, performs certain calculations and, upon completion, selects a particular predicted structure of the given protein as the final predicted structure. The calculations consist of obtaining initial values of structural parameters defining the predicted structure and updating the values. The update process includes the following determination process using neural networks (emphasis added):
“determining a quality score characterizing a predicted quality of the structure defined by the current values of the structure parameters, wherein the quality score is based on the respective outputs of one or more scoring neural networks which are each configured to process: (i) current values of structure parameters, (ii) a representation of the amino acid sequence of the given protein, or (iii) both”
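For readers less familiar with this kind of claim language, the steps recited in claim 1 can be illustrated with a small toy program. This is only a sketch under stated assumptions and is not DeepMind's implementation: the "scoring neural network" is replaced by a simple hypothetical function (toy_quality_score) whose gradient is known in closed form, the structure parameters are a short list of numbers, and all function names, the hidden target values, the step size and the number of iterations are invented for illustration.

```python
import random


def toy_quality_score(params, target):
    """Hypothetical stand-in for the scoring neural networks (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def score_gradient(params, target):
    """Gradient of the toy quality score with respect to each structure parameter."""
    return [-2.0 * (p - t) for p, t in zip(params, target)]


def refine_structure(initial, target, steps=200, step_size=0.05):
    """Run the update iterations: move each parameter along the score gradient."""
    params = list(initial)
    for _ in range(steps):
        grad = score_gradient(params, target)
        params = [p + step_size * g for p, g in zip(params, grad)]
    return params  # current values after the final update iteration


def predict_structure(target, num_candidates=5, seed=0):
    """Generate several predicted structures and select one as the final prediction."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_candidates):
        # Obtain initial values of the structure parameters (random here).
        initial = [rng.uniform(-3.0, 3.0) for _ in target]
        final = refine_structure(initial, target)
        candidates.append((toy_quality_score(final, target), final))
    # Select the candidate with the highest quality score.
    best_score, best_params = max(candidates, key=lambda c: c[0])
    return best_params, best_score


if __name__ == "__main__":
    hidden_target = [0.5, -1.0, 2.0]  # assumed "ideal" values for the toy score only
    params, score = predict_structure(hidden_target)
    print(params, score)
```

In the claim itself the quality score comes from trained neural networks that process the current parameter values, the amino acid sequence, or both. The closed-form score above only shows the overall shape of the procedure: initialize, iterate gradient-based updates, then select one of several candidates.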
Claim 1 therefore sets out the general functions of the neural networks but does not recite any specific neural network architecture. So, as in Ed Garlepp’s discussion of the unique disclosure issues with AI, the neural network is treated more as a “black box” in the claim, although DeepMind was presumably working to develop new network architectures. This claim is a good example of the balance patent practitioners need to strike when drafting claims involving a neural network.
We note that the PCT applications were filed long before DeepMind’s further work for CASP14, which challenged participants to model various unknown protein structures released between May and August 2020. During the pandemic, the team also worked on predicting the structure of SARS-CoV-2 ORF8, one of the coronavirus proteins. Given the serious circumstances, DeepMind shared its findings and published the results as they were obtained. DeepMind’s patent strategy may have shifted toward a more open approach as a result of such work, leading to the recent release of details of its technology, with the source code made available under an open source license.
We look forward to following the prosecution of this patent application as well as the general evolution of this technology.
1 Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature https://doi.org/10.1038/s41586-021-03819-2 (2021).
2 Id., Abstract.
4 DeepMind (2020). AlphaFold: The making of a scientific breakthrough [Video]. YouTube. https://www.youtube.com/watch?v=gg7WjuFs8F4
5 Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature https://doi.org/10.1038/s41586-021-03819-2 (2021).
6 M. Baek et al., Science 10.1126/science.abj8754 (2021).