Nature has now released the AlphaFold 2 paper, after eight long months of waiting. The main text reports more or less what we have known for nearly a year, with some added tidbits, although it is accompanied by a painstaking description of the architecture in the supplementary information. Perhaps more importantly, the authors have released the entirety of the code, including all the details needed to run the pipeline, on GitHub. And there is no small print this time: you can run inference on any protein (I’ve checked!).

Have you not heard the news? Let me refresh your memory. In November 2020, a team of AI scientists from Google DeepMind indisputably won the 14th Critical Assessment of protein Structure Prediction (CASP), a biennial blind test where computational biologists try to predict the structures of proteins that have been determined experimentally but not yet publicly released. Their results were so astounding, and the problem so central to biology, that it took the entire world by surprise and left an entire discipline, computational biology, wondering what had just happened.

Now that the article is live, the excitement is palpable. We have 70+ pages of long-awaited answers, and several thousand lines of code that will, no doubt, become a fundamental part of computational biology. At the same time, however, we have many new questions. What is the secret sauce behind the news splash, and why is it so effective? Is it a piece of code that the average user can actually run? What are AlphaFold 2’s shortcomings? And, most important of all, what will it mean for computational biology? And for all of us?

In this commentary, which aims to be a continuation of my blog post from eight months ago, I try to address some of these questions. First, I provide a bird’s eye overview of the AlphaFold 2 architecture. This is not meant to be a technical exposition (the SI is as detailed as you could wish, and even the code references specific sections of it), but focuses on the intuition behind the architecture. I want this to reach people without a background in either deep learning or bioinformatics who want to know what’s going on; and those who may have the right background, but want an overview of the full paper before diving right into it.

Following the cold, hard facts, I give a completely personal assessment of the ideas behind the architecture. Namely, I explain which ideas I think were key to the success of AlphaFold 2, and speculate about which factors made this team succeed where so many others had fallen short. I am a person of strong opinions, but I am nevertheless happy to admit that my thoughts may be going in completely the wrong direction. Still, I think the story of AlphaFold 2 raises a lot of questions that we have not addressed as a community and that deserve proper consideration somewhere.

Finally, I revisit some of the questions that I raised eight months ago. Some of these questions have been answered by the paper, or by the code (e.g. the limitations on running it). Some others are not solved explicitly, but I have had a chance to reflect upon them more deeply and I think I have some novel insight. And some others are matters that have arisen from the new information, and that I think we will have to answer together.

I have promised myself that I will be more succinct this time — after all, in a few months I should be writing up my PhD thesis and I really don’t have much time to spare. Let’s see if I manage.

First act: how does AlphaFold 2 work?

Prelude

Until Thursday morning, the best answer we had was an image, published in DeepMind’s press release back in November. This schematic made the rounds of the internet at the time, and has been featured in a multitude of conferences and discussion groups ever since. But, sadly, it was lacking in detail, and even the most knowledgeable deep learning experts could only make educated guesses.

The Nature article provides a very similar, but slightly more detailed diagram that outlines the different pieces of the architecture.

Diagram of AlphaFold 2 as published in the official Nature paper in July 2021. We have added the red separation lines for convenience.

The overarching idea is quite simple, so I will try to sketch it in a few lines. If you are not familiar with deep learning, the following might sound slightly abstract, and that is perfectly fine. I will take you through the details later. For now, though, let us try to get a schematic picture of the network. For clarity, I have divided the image into thirds which represent the three main parts of the AlphaFold 2 system.

First of all, the AlphaFold 2 system uses the input amino acid sequence to query several databases of protein sequences, and constructs a multiple sequence alignment (MSA). Put simply, an MSA collects similar, but not identical, sequences that have been identified in living organisms. This reveals which parts of the sequence are more likely to mutate, and allows us to detect correlations between positions. AlphaFold 2 also tries to identify proteins that may have a similar structure to the input (“templates”), and constructs an initial representation of the structure, which it calls the “pair representation”. This is, in essence, a model of which amino acids are likely to be in contact with each other.
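To make the idea of covariation concrete, here is a toy example. The sequences are made up, and this is nothing like the statistics AlphaFold 2 actually learns; it only shows how an alignment reveals both conserved positions and correlated pairs:

```python
# Toy illustration of the kind of signal an MSA carries.
from collections import Counter

msa = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MRILG",
]

def column(alignment, i):
    """All residues observed at position i across the alignment."""
    return [seq[i] for seq in alignment]

def conservation(alignment, i):
    """Fraction of sequences sharing the most common residue at position i."""
    counts = Counter(column(alignment, i))
    return counts.most_common(1)[0][1] / len(alignment)

def covaries(alignment, i, j):
    """True if the residue at position j is fully determined by the residue at i."""
    pairs = set(zip(column(alignment, i), column(alignment, j)))
    return len(pairs) == len(set(column(alignment, i)))

print(conservation(msa, 0))  # 1.0: position 0 is perfectly conserved
print(covaries(msa, 1, 4))   # True: K always pairs with A, R with G
print(covaries(msa, 1, 2))   # False: positions 1 and 2 vary independently
```

Correlated pairs like positions 1 and 4 above are exactly the hint that two residues may be in physical contact, even when they are far apart in the sequence.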

In the second part of the diagram, AlphaFold 2 takes the multiple sequence alignment and the templates, and passes them through a transformer, which the authors call the Evoformer. We will talk about what a transformer entails later, but for now you can understand it as an “oracle” that can quickly identify which pieces of information are most informative. The objective of this part is to refine the representations of both the MSA and the pair interactions, and to iteratively exchange information between them: a better model of the MSA improves the network’s characterisation of the geometry, which in turn helps refine the model of the MSA. This process is organised in identical blocks that are applied one after the other (48 blocks in the published model).
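As a numerical caricature of this two-way exchange, one could sketch the block structure as follows. The shapes and update rules here are invented purely for illustration; the real Evoformer uses attention layers, not these simple matrix products:

```python
# A numerical caricature of the Evoformer's two-way exchange between the
# MSA representation and the pair representation (rules are invented).
import numpy as np

n_seq, n_res, n_blocks = 8, 10, 48
rng = np.random.default_rng(0)
msa_repr = rng.normal(size=(n_seq, n_res))   # one row per aligned sequence
pair_repr = rng.normal(size=(n_res, n_res))  # one entry per residue pair

for _ in range(n_blocks):
    # MSA -> pair: covariation across sequences updates the pair map
    pair_repr = 0.5 * pair_repr + 0.5 * (msa_repr.T @ msa_repr) / n_seq
    # pair -> MSA: the current geometry model feeds back into the MSA features
    msa_repr = 0.5 * msa_repr + 0.5 * (msa_repr @ pair_repr) / n_res
    msa_repr /= np.abs(msa_repr).max()  # crude normalisation to keep the toy stable

print(msa_repr.shape, pair_repr.shape)  # both representations keep their shape
```

The point of the sketch is only the control flow: each block reads from both representations and writes back to both, so improvements propagate in both directions as the blocks stack up.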

This information is passed to the last part of the diagram: the structure module. This sophisticated piece of the pipeline takes the refined “MSA representation” and “pair representation”, and leverages them to construct a three-dimensional model of the structure. Unlike previous state-of-the-art models, this network does not use any optimisation algorithm: it generates a static, final structure in a single step. The end result is a long list of Cartesian coordinates representing the position of each atom of the protein, including side chains.
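To see how simple the end product is, here is a hypothetical container for that output. The class name, fields and coordinates below are our own invention, not AlphaFold’s actual objects, but a flat list of atoms like this maps directly onto the PDB files used downstream:

```python
# A hypothetical container for the structure module's output: just a list
# of atoms, each with a Cartesian position (values are made up).
import math
from dataclasses import dataclass

@dataclass
class PredictedAtom:
    residue_index: int
    residue_name: str   # three-letter amino acid code
    atom_name: str      # "CA" = alpha carbon; "CB" = a side-chain atom
    xyz: tuple          # Cartesian coordinates, in angstroms

structure = [
    PredictedAtom(1, "MET", "CA", (0.0, 0.0, 0.0)),
    PredictedAtom(1, "MET", "CB", (1.2, 0.8, -0.3)),
    PredictedAtom(2, "LYS", "CA", (3.8, 0.1, 0.2)),
]

def distance(a, b):
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist(a.xyz, b.xyz)

ca_atoms = [a for a in structure if a.atom_name == "CA"]
print(distance(ca_atoms[0], ca_atoms[1]))  # ~3.8 A, typical for consecutive residues
```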

So, to recap: AlphaFold 2 finds sequences similar to the input, extracts their information using a special neural network architecture, and then passes that information to another neural network that produces a structure.

One last piece is that the model works iteratively. After generating a final structure, it takes all the information (i.e. the MSA representation, the pair representation and the predicted structure) and passes it back to the beginning of the Evoformer blocks, the second part of our diagram, in a process the authors call “recycling”. This allows the model to refine its predictions, and also to produce some funny videos that you can find on the article’s page.
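The recycling loop can be sketched like so. Here `run_trunk_and_structure` is a made-up stand-in for the whole network, and its update rule is invented purely to make the control flow concrete; what matters is that each pass sees the previous pass’s outputs (the published model performs three recycles on top of the initial pass):

```python
# Sketch of the recycling loop: run the whole network several times,
# feeding each pass's outputs back in as inputs (update rule is invented).
def run_trunk_and_structure(msa_repr, pair_repr, prev_coords):
    # placeholder "network": nudge every atom halfway toward a fixed target,
    # standing in for the learned refinement of the real model
    target = [(i * 3.8, 0.0, 0.0) for i in range(len(prev_coords))]
    coords = [tuple(0.5 * p + 0.5 * t for p, t in zip(pc, tc))
              for pc, tc in zip(prev_coords, target)]
    return msa_repr, pair_repr, coords

msa_repr, pair_repr = "msa", "pair"   # stand-ins for the real arrays
coords = [(0.0, 0.0, 0.0)] * 5        # initial guess: every atom at the origin

for _ in range(4):  # one initial pass plus three recycles
    msa_repr, pair_repr, coords = run_trunk_and_structure(msa_repr, pair_repr, coords)

print(coords[1])  # each pass has moved this atom closer to its "target"
```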