DeepMind has developed AlphaFold, an AI tool that predicts the three-dimensional structures of proteins.
Proteins are the building blocks of the human body and carry out most of what happens within a cell. ‘Structure is function’ is an axiom in molecular biology, and it applies to proteins as well: understanding a protein's 3D structure helps in understanding its function. Proteins are chains of amino acids arranged in a particular sequence, and the interactions among those amino acids make the chain fold into the conformation and shape that determine how the protein functions.
The human proteome, and those of other organisms, contains proteins with multiple domains that fold independently and semi-independently. Human cells also contain molecules made of multiple chains of interacting proteins, such as receptors on cell membranes. The ability to accurately predict protein structures from their amino-acid sequence would greatly benefit life sciences and medicine. It would accelerate efforts to understand the building blocks of cells and enable quicker and more advanced drug discovery.
Previously, researchers used experimental techniques such as X-ray crystallography and cryo-electron microscopy to determine the structures of proteins. These techniques have determined the structures of around 100,000 unique proteins, which represents only a small fraction of the billions of known protein sequences. Their major downside is that they are time-consuming and expensive, and some proteins are hard to analyse with them at all. Early attempts to use computers to predict protein structures in the 1980s and 1990s yielded few usable results: the published methods proved ineffective and inaccurate when other scientists applied them.
John Moult, a computational biologist at the University of Maryland, founded the biennial protein challenge called the Critical Assessment of Structure Prediction (CASP) in 1994. CASP is a competition in which teams are challenged to design computer programs that predict protein structures, in an effort to assess and improve such computational methods. AlphaFold, an AI tool developed by DeepMind (an AI company owned by Google's parent company, Alphabet), outperformed 100 other teams in this competition, and it was able to predict the structures of several proteins of the human proteome, malarial parasites, mice, maize, and 20 other model organisms.
AlphaFold combines two distinct techniques to improve its predictions. The first step builds on a bioinformatics technique called multiple sequence alignment: a protein’s sequence is compared with similar ones in a database to reveal pairs of amino acids that tend to change together, which suggests that they sit close to each other in the folded protein. DeepMind uses deep learning to analyse such pairings and predict the distance between two paired amino acids in the folded protein. In the second step, AlphaFold creates a folding arrangement for the sequence and uses an optimization method called gradient descent to refine the structure so that it comes close to the predictions from the first step. With this added efficiency, the DeepMind team set out to predict the structures of nearly every known protein encoded by the human genome and was able to generate structures within a few hours.
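The second step can be illustrated with a small Python sketch. It is a toy under stated assumptions, not DeepMind's implementation: the "predicted" distances are simply computed from a made-up six-point structure rather than produced by a neural network, the loss is a plain sum of squared distance errors, and the names `distance_loss` and `refine` are hypothetical. It shows only the general idea of using gradient descent to move 3D coordinates until their pairwise distances approach a set of target distances.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_loss(coords, target):
    """Sum of squared differences between actual and target pairwise distances."""
    n = len(coords)
    return sum(
        (distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def refine(coords, target, step=0.01, iterations=2000):
    """Plain gradient descent on distance_loss; returns new coordinates."""
    n = len(coords)
    coords = [list(c) for c in coords]  # work on a copy
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = distance(coords[i], coords[j])
                if d < 1e-9:
                    continue  # gradient is undefined for coincident points
                # d/dx_i of (d - t)^2 is 2 * (d - t) * (x_i - x_j) / d
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= step * grads[i][k]
    return coords


# Made-up "true" structure: six points on a helix, about 3.8 units apart.
true_coords = [
    (2.3 * math.cos(1.745 * i), 2.3 * math.sin(1.745 * i), 1.5 * i)
    for i in range(6)
]
# Stand-in for the first step's output: here the exact distances of the
# made-up structure, not real network predictions.
target = [[distance(a, b) for b in true_coords] for a in true_coords]

# Start from a perturbed copy of the structure and refine it.
rng = random.Random(0)
start = [[c + rng.uniform(-0.5, 0.5) for c in p] for p in true_coords]

initial_loss = distance_loss(start, target)
refined = refine(start, target)
final_loss = distance_loss(refined, target)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Because the loss depends only on distances between points, the refined coordinates can match the target distances while still being rotated, shifted, or mirrored relative to the made-up structure.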
Many researchers have cross-checked the results provided by AlphaFold against those from traditional research methods and found that the predicted structures were on par with their own findings. On 15 July 2021, DeepMind released an open-source version of AlphaFold, called AlphaFold2, allowing the scientific community to build upon the already powerful tool and predict more protein structures. The open-source version was found to be 16 times faster, and it can generate structures within minutes, depending on the size of the protein. The structures are available in a database maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI).
AlphaFold has produced structure predictions for around 98.5% of known human proteins, as well as for proteins in other organisms. About 58% of the amino acids in the human predictions were positioned with confidence, and about 36% were predicted with accuracy high enough to detail atomic features of the protein such as the active site (which is necessary for drug design). The technology has predicted approximately 365,000 protein structures for humans and other organisms, a total that should swell to 130 million structures (nearly half of all known proteins) in a year.
Determining how individual proteins interact with other cellular players remains one of the greatest challenges for AlphaFold's predictions. Researchers are already using AlphaFold and related tools to help make sense of experimental data generated using X-ray crystallography and cryo-electron microscopy. Although experimental evidence is still required to confirm the predicted structures, AlphaFold reduces the tedious work of determining the structures of the various proteins in the human proteome.