In December 2020, the Google-owned artificial intelligence (AI) company DeepMind presented a major breakthrough in protein folding. In time, this could serve as a molecule-design tool that helps revolutionize medicine. However, DeepMind’s AI research method may be even more important, since it challenges the core of how we carry out research. The question is whether the research community is ready to unleash this potential.
Protein folding is the prediction of a protein’s three-dimensional structure from its amino acid sequence, and it is considered one of biology’s holy grails. The shape and structure of proteins are crucial to most processes in the human body and to many other biological phenomena. Protein structures are used when developing drugs and industrial enzymes for food production and wastewater treatment, because they aid our understanding of how small molecules and proteins interact.
Every two years since 1994, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition has taken the temperature of the research community’s ability to fold proteins using a computer. Until 2018, progress had been steady but still a long way from even a partial solution.
In 2018, DeepMind participated for the first time with its AlphaFold software and impressed the research world with progress equivalent to what would normally be expected over 10 years. In 2020, as many of us had expected, the results became even more impressive: DeepMind’s updated AlphaFold 2 software solved the protein-folding problem according to CASP’s criteria.
But why is a Google-owned company cracking this biological problem and not a respected research institution? The answer is actually quite simple: Google has precisely the right prerequisites, and the research community does not yet have them, at least not on a sufficient scale. Google might be able to use the insight that comes with pushing AI to its limits elsewhere, but the main value for Google is the respect that accompanies this scientific breakthrough.
Also a data issue
What is so interesting and innovative about DeepMind’s approach? Clearly, having a deep understanding of AI helps in knowing which problems to work on in the first place.
DeepMind started working on protein folding in 2016 after its AlphaGo program beat the Go world champion Lee Sedol. For protein folding, biologists have sworn by Nobel Laureate Christian Anfinsen’s hypothesis that the sequence of the protein’s building blocks – the amino acids – largely determines the protein’s three-dimensional structure. If this is true, then creating a statistical model that predicts structure based on sequence makes sense.
So the first rule is to target questions that can be framed as a well-defined learning problem. However, a problem that is suitable for AI must have another characteristic: access to huge quantities of training data.
In games such as Go, data can be collected by letting AI play against AI. In protein folding, we have protein structure databases with lots of training data because biologists and chemists have experimentally determined protein structures by using crystallography and other techniques over the past 50 years.
With a well-defined learning problem and plenty of data, deep learning can take advantage of the major breakthroughs over the past 10 years in computing power and flexible statistical models.
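To make the idea of a well-defined learning problem concrete, here is a minimal, purely illustrative sketch in Python: synthetic “sequences” are paired with made-up structural labels, and a trivial linear model is fitted to map one to the other. The data, the encoding and the model are toy assumptions chosen for illustration and bear no relation to how AlphaFold actually works; the point is only the framing of input, output and training data.

```python
# Toy sketch: framing "sequence in, structure out" as supervised learning.
# All data here is synthetic and the model is deliberately trivial; real
# structure predictors such as AlphaFold 2 are far more sophisticated.
import numpy as np

rng = np.random.default_rng(0)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
SEQ_LEN = 12                           # fixed toy sequence length

def one_hot(seq: str) -> np.ndarray:
    """Encode an amino acid sequence as a flat one-hot feature vector."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), idx] = 1.0
    return x.ravel()

# Synthetic training set: random sequences paired with a made-up structural
# label (a single number standing in for, say, a residue-residue distance).
sequences = ["".join(rng.choice(list(AMINO_ACIDS), SEQ_LEN)) for _ in range(200)]
X = np.stack([one_hot(s) for s in sequences])
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 0.1 * rng.normal(size=len(X))  # noisy "structure" labels

# Fit the simplest possible model: linear least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the label for a new, unseen sequence.
new_seq = "".join(rng.choice(list(AMINO_ACIDS), SEQ_LEN))
print("predicted structural label:", one_hot(new_seq) @ w)
```

The ingredients are exactly the two requirements above: a clearly defined mapping from input to output, and a body of labelled examples to learn it from. Everything else – the millions of parameters, the clever architectures – builds on that foundation.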
What is a good deep learner?
DeepMind’s solution to protein folding has been four years in the making under the visionary leadership of Demis Hassabis. However, knowing when to use AI takes second place to knowing how to use it.
Every time DeepMind enters a new field, a small team carries out a pilot study. If this shows promising results, DeepMind creates a multidisciplinary team that can attack the problem. A core of about 15 researchers has carried out the work on AlphaFold, and about as many have assisted along the way. Similar to experimental science, background knowledge and the intermediate results inspire the next steps in the discovery process.
Good deep learners are systematic experimentalists who intuitively understand which buttons to press. Instead of performing experiments in a traditional laboratory, they train statistical models with millions of parameters on large data sets using thousands of graphics processing units (GPUs).
The natural and health sciences are fortunately full of well-defined learning problems, and in Denmark we are awash in health data and many other types of AI-suitable data. If we can also learn from DeepMind’s collaborative methods, the research world may create the next big AI breakthrough.
Others may run off with all the prizes
AI will undoubtedly accelerate progress in many fields. I recently received a substantial grant along with five other researchers to carry out methodological research in machine learning. The collaboration between researchers in machine learning, biology, physics and modelling provides important inspiration on where we are headed. We hope that our methodological contribution can create new breakthroughs in the use of AI.
On the genomic level, a disease such as breast cancer varies between patients. Modelling large volumes of data can therefore be very important for designing more precise and effective individual treatments. Data can also help us to better characterize cancer, for example, and to link these characteristics with diagnosis and treatment. One obstacle to progress has been the overwhelming natural genomic variation, which makes it difficult to identify the mutations that lead to disease. We are striving to develop more precise models that can learn across data sources, not merely making predictions but also assessing the uncertainty of these predictions.
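As a concrete, if greatly simplified, illustration of what assessing the uncertainty of a prediction can mean, the Python sketch below fits a small bootstrap ensemble of linear models to synthetic data and reports the spread of the ensemble’s predictions alongside their mean. The data, features and model are hypothetical stand-ins chosen for illustration, not the methods or data from our project.

```python
# Toy sketch: attach an uncertainty estimate to a prediction by training a
# bootstrap ensemble of simple linear models on synthetic data.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 10 noisy features and a continuous outcome per "case".
n, d = 300, 10
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.5 * rng.normal(size=n)

def fit_linear(X, y):
    """Least-squares fit, returning the weight vector."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Train an ensemble, each member on a bootstrap resample of the data.
n_models = 50
ensemble = []
for _ in range(n_models):
    idx = rng.integers(0, n, size=n)   # sample rows with replacement
    ensemble.append(fit_linear(X[idx], y[idx]))

# For a new case, report both the mean prediction and the spread across
# ensemble members as a rough measure of uncertainty.
x_new = rng.normal(size=d)
preds = np.array([x_new @ w for w in ensemble])
print(f"prediction: {preds.mean():.2f} +/- {preds.std():.2f}")
```

A prediction accompanied by a measure of its own reliability is far more useful in a clinical setting than a bare number, because it tells us when the model should be trusted and when a human expert needs to look closer.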
However, development is rapid and the competition to create the best methods is fierce. The AlphaFold example, in which DeepMind and Google took the preliminary prize, shows that understanding where the potential lies and how to wield AI is important for success. I am therefore concerned about whether the wider research community is ready for AI or whether we will see a few organizations that are very strong at AI run off with all the prizes.
AI cannot solve all problems, and in many branches of science, researchers’ deep knowledge cannot be replaced by a group of deep learning experts. However, just as experimental technologies such as gene sequencing today have set the framework for both basic research and COVID-19 testing, AI will increasingly affect how we carry out research. The time is now ripe for other experimental sciences to start thinking of AI as a component in their technological toolbox.