Proteins are not just nutrients in food. They are long, complex molecules made up of amino acids, and they carry out most of the functions inside cells, such as replicating genetic material, obtaining energy and signaling along the pathways that cells need in order to work. One of the most important characteristics of proteins is that their function depends on how they fold: their physicochemical properties make them adopt a three-dimensional structure, without which they cannot function.
For this reason, finding out how proteins fold has been one of the most important challenges in biology for 50 years. The answer has countless applications in basic research, industrial processes and drug development: for example, learning how the coronavirus S (spike) protein folds is crucial for designing vaccines and other drugs.
Today around 180 million proteins are known, but it has only been possible to find out how 170,000 of them fold. This has been achieved by means of experimental techniques such as X-ray crystallography, which “freezes” and crystallizes proteins so that their structure can be resolved with X-rays, or nuclear magnetic resonance. But because this work is very complicated and some proteins “resist” these methods, researchers are also working on bioinformatic models and approximations that predict how a protein folds starting from its amino acid sequence. Unfortunately, each protein presents so many possibilities and difficulties that, until now, progress in the field had been limited.
All of this seems to have changed. This week, DeepMind, a Google subsidiary that has created artificial intelligence (AI) systems capable of learning and winning at chess, Go and shooter video games, presented another AI, named AlphaFold, capable of predicting the three-dimensional structure of proteins. Specifically, it achieved 92% accuracy in this task at CASP (Critical Assessment of protein Structure Prediction), a biennial meeting aimed at testing bioinformatic models. The results have been announced but have not yet been published in a peer-reviewed scientific journal.
“We have been stuck with this problem – that of how proteins fold – for almost 50 years,” John Moult, president and co-founder of CASP and a researcher at the University of Maryland (USA), explained in a statement. “Seeing how DeepMind has created a solution for this … is a very special moment.”
This is not a minor advance. As Nature.com reported, being able to predict the structure of a protein from its amino acid sequence would be a huge leap for the life sciences and medicine. It would greatly accelerate efforts to understand the basic building blocks of life and make the search for new drugs faster and more advanced. As Demis Hassabis, CEO of DeepMind, said: “I think this is the most significant thing we have accomplished, in terms of the impact it will have in the real world.”
“It is a breakthrough of the first order, without a doubt one of the most important scientific results that I have witnessed in my life,” Mohammed AlQuraishi, a computational biologist at Columbia University and a CASP participant, told Nature. So much so, he said, that with the fundamental problem solved, many groups will turn to other questions.
A long-standing quest
In 1972, Christian Anfinsen, winner of that year's Nobel Prize in Chemistry, postulated that the structure of a protein is completely determined by its amino acid sequence. But back in 1969, Cyrus Levinthal had estimated that a typical protein has 10^300 possible conformations, so listing them all by calculation would take longer than the age of the universe. Interestingly, despite that number of configurations, proteins fold as they are produced in ribosomes, in a matter of a few milliseconds.
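Levinthal's argument can be checked with rough arithmetic. The Python sketch below takes the 10^300 figure quoted in the text and asks how long an exhaustive search would last. The sampling rate and the age of the universe are illustrative round numbers assumed for this example, and the function name is hypothetical.

```python
# Back-of-the-envelope check of Levinthal's argument.
# ASSUMPTIONS (not from the article): the sampling rate and the age of the
# universe below are illustrative round numbers.

CONFORMATIONS = 10 ** 300              # figure quoted in the article
SAMPLES_PER_SECOND = 10 ** 13          # assumed: one conformation every 0.1 picoseconds
AGE_OF_UNIVERSE_YEARS = 14 * 10 ** 9   # assumed: roughly 14 billion years
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_search_years(conformations: int, samples_per_second: int) -> int:
    """Years needed to visit every conformation once (exact integer arithmetic)."""
    seconds = conformations // samples_per_second
    return seconds // SECONDS_PER_YEAR


years = exhaustive_search_years(CONFORMATIONS, SAMPLES_PER_SECOND)
ratio = years // AGE_OF_UNIVERSE_YEARS

# Report only orders of magnitude: the number of digits minus one.
print(f"search time: on the order of 10^{len(str(years)) - 1} years")
print(f"ratio to the age of the universe: on the order of 10^{len(str(ratio)) - 1}")
```

Even with a far more generous sampling rate the conclusion would not change, which is why the millisecond folding times seen in cells are so striking.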
In 1994, CASP was founded to accelerate research and pool progress. Since then, every two years, the organizers have selected proteins whose structures were recently determined by experimental methods and used them to test predictive bioinformatic models, without the developers knowing the structure of the protein being analyzed. To measure success, a metric known as GDT (Global Distance Test) was developed, with scores ranging from zero to 100; scores around 90 are usually considered competitive with experimental results.
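To make the GDT score more concrete, here is a minimal Python sketch of the widely used GDT_TS variant, which averages the percentage of residues whose predicted position lies within 1, 2, 4 and 8 ångströms of the experimental one. It assumes the two structures are already superimposed and given as lists of coordinates, whereas the full procedure searches over many superpositions. The function name and the toy coordinates are hypothetical.

```python
import math

# Distance cutoffs, in angstroms, used by the GDT_TS variant of the score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, experimental):
    """Simplified GDT_TS for two pre-aligned lists of (x, y, z) coordinates.

    ASSUMPTION: the structures are already superimposed. The full procedure
    searches for the superposition that maximizes each fraction.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each cutoff of their experimental position.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Toy example with made-up coordinates: four residues displaced by
# roughly 0.5, 1.5, 3 and 10 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (21.4, 0.0, 0.0)]
score = gdt_ts(predicted, experimental)
print(f"GDT_TS = {score:.2f}")
```

A prediction identical to the experimental structure scores 100 under this definition, and a single badly misplaced residue lowers all four fractions at once.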
At the fourteenth CASP meeting, AlphaFold achieved a median GDT score of 92.4 across the target proteins. This precision means that the AI's error is comparable to the width of an atom, about 0.1 nanometers (a nanometer is one millionth of a millimeter).
A real revolution
“These results open the door for biologists to use computational structure prediction as a core tool in scientific research,” DeepMind explained. “Our methods may be especially useful for important types of proteins, such as membrane proteins – those that lie in the lipid bilayer that separates the inside of cells from the outside environment – which are especially difficult to crystallize and therefore to determine experimentally.”
“This computational work is a wonderful advance in the problem of protein folding, a great challenge for biology for 50 years,” explained Venki Ramakrishnan, president of the Royal Society. “And it happened decades earlier than many had predicted. It will be very exciting to see the many ways this fundamentally changes biological research.”
“What the DeepMind team has managed to do is fantastic and will change the future of structural biology and protein research,” Janet Thornton, director emeritus of the European Bioinformatics Institute, told Sciencemag.org.
To achieve these results, the DeepMind team spent four years creating and training a neural network capable of processing “spatial graphs”, which represent a folded protein as a set of amino acid residues and the relationships between them. The system learns to refine these graphs using related sequences, multiple sequence alignments and other representations.
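The idea of a spatial graph can be illustrated with a small Python sketch in which each residue is a node and an edge joins any two residues that lie close together in three-dimensional space. This is only an illustration of the concept, not DeepMind's actual representation. The 8 ångström cutoff, the function name and the coordinates are all assumptions made for the example.

```python
import math
from itertools import combinations

# ASSUMPTION: 8 angstroms is a common contact threshold in structural
# bioinformatics; it is not a value taken from the article or from AlphaFold.
CONTACT_CUTOFF = 8.0


def build_spatial_graph(residues, cutoff=CONTACT_CUTOFF):
    """Return an adjacency dict linking residues that are close in 3D space.

    `residues` is a list of (name, (x, y, z)) tuples; node ids are list indices.
    """
    graph = {i: set() for i in range(len(residues))}
    for i, j in combinations(range(len(residues)), 2):
        if math.dist(residues[i][1], residues[j][1]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Made-up coordinates for a five-residue chain that bends back on itself.
toy_chain = [
    ("MET", (0.0, 0.0, 0.0)),
    ("ALA", (3.8, 0.0, 0.0)),
    ("GLY", (7.6, 0.0, 0.0)),
    ("LEU", (7.6, 3.8, 0.0)),
    ("SER", (3.8, 3.8, 0.0)),
]
graph = build_spatial_graph(toy_chain)
for node, neighbours in graph.items():
    print(toy_chain[node][0], sorted(neighbours))
```

In this toy chain the first and last residues are far apart in the sequence but end up connected in the graph, because the bend brings them close in space. Sequence alone does not show this kind of relationship.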
After multiple iterations, the system learned to predict the physical structure with great precision. To get there, it had to be trained on the roughly 170,000 proteins of known structure, while also drawing on large databases of protein sequences whose structure is unknown, using weeks of computation on hardware comparable to 100 or 200 GPUs (graphics processing units). It was also necessary to create “attention algorithms”, which made it possible to connect small groups of amino acids before assembling the whole, as if it were a jigsaw puzzle solved from small clusters of pieces.
Now, DeepMind researchers are working to publish their results in a scientific journal and to find ways of providing broad access to this tool. They also aim to find out how these predictions can contribute to the study of certain diseases, facilitate the development of medicines and complement existing experimental methods.
However, AlphaFold is still not perfect. For example, it has problems with structures formed by the repetition of small segments, and it cannot yet handle complexes of several proteins that work together in cells.
A promising future
This AI is not just a powerful “telescope” with which to delve into the unknown universe of millions of proteins whose structure has not been revealed. Looking to the future, DeepMind researchers have commented that this tool can be useful in responding to future pandemics, since AlphaFold was able to successfully predict the structures of ORF8 and ORF3a, two SARS-CoV-2 proteins.
Beyond that, they have suggested that AI could help in studying how proteins interact with DNA, RNA or other molecules.
“Systems like AlphaFold demonstrate the incredible potential of AIs as a tool to enable fundamental discoveries,” they concluded. “(…) There are many aspects of our universe that are unknown. The breakthrough announced now gives us more confidence that AI will become one of the most useful tools for expanding the frontiers of scientific knowledge.”