A recent article in Quanta Magazine explores how scientists have been trying to predict the future of the pandemic by understanding the constraints on the virus’s evolution.
The genetic material of any organism is like a computer program written as a sequence of four types of bases: RNA bases for many viruses, DNA bases for all other species. Changing the sequence of bases, or genotype, changes the program. These changes, which biologists call mutations, occur frequently and almost randomly in nature. In general, small changes will have small or no effects. (There are several caveats to this claim, and sometimes even a single mutation can have huge effects.)
When a virus has enough mutations to change its behaviour and shape (the phenotype), it becomes a new variant or strain. Mutations happen because of random, independent events, and in general, this means that several significant mutations are unlikely to occur at once in a single generation. For a new variant to emerge from the original strain, the virus must accumulate the right mutations over several generations.
This alone doesn’t limit the space of possible variants, although it slows down large changes. The more important restriction comes from natural selection. Natural selection compels the virus to gather its mutations in a specific sequence; in other words, the order of mutations matters. This is because some viral genotypes leave the virus impaired and unable to function or replicate efficiently in its hosts – that is, some genotypes have low evolutionary fitness. If the virus moves into such a genotype “on its way” to becoming the next variant of concern, then the chain of mutations is likely to be broken simply because this strain of the virus dies out.
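To make the point about ordering concrete, here is a minimal sketch in Python. The three-site genotype, the fitness values and the viability threshold are all invented for illustration and are not measurements of any real virus. The sketch tries every order in which three mutations could accumulate and keeps only the orders that never pass through a low-fitness intermediate.

```python
from itertools import permutations

# Hypothetical fitness values for a three-site genotype
# ("0" = original base, "1" = mutated base). Invented numbers.
FITNESS = {
    "000": 1.0,
    "100": 1.1, "010": 0.2, "001": 0.9,
    "110": 1.2, "101": 0.3, "011": 1.0,
    "111": 1.5,
}
VIABLE = 0.5  # assumed threshold below which a strain dies out


def path(order, start="000"):
    """Return the genotypes visited when sites mutate in the given order."""
    genotypes = [start]
    current = list(start)
    for site in order:
        current[site] = "1"
        genotypes.append("".join(current))
    return genotypes


def viable_orders(start="000", n_sites=3):
    """List the mutation orders whose every intermediate stays viable."""
    return [
        order
        for order in permutations(range(n_sites))
        if all(FITNESS[g] >= VIABLE for g in path(order, start))
    ]


for order in viable_orders():
    print(order, path(order))
```

With these made-up numbers, only two of the six possible orders reach the fully mutated genotype without hitting a dead end, even though the start and end points are the same in every case.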
One of the first people to formulate a way of thinking about whether evolution can deliver an organism to a particular spot in genotype space was the American biologist Sewall Wright. Wright re-imagined the problem in terms of a “fitness landscape” and reduced it to three dimensions to make it visualizable.
The basic idea is that variation is represented on the horizontal axes (this could be either genotypic or phenotypic variation), and the vertical axis – the height – tells you how well a virus that is characterised by the values on the axes at any point will do in the real world. When you zoom out, you can expect to be looking at a picture of a landscape, with peaks and valleys, and perhaps the occasional canyon where a small mutation has led to a large change in fitness.
If you were to drop a swarm of viruses over some region of the landscape, over time, natural selection would leave only the viruses that climbed to the top of a peak – those with the highest relative fitness. Depending on where you start, this doesn’t have to be the highest peak (and often isn’t, in practice); it may just be the peak that is nearest or easiest to climb. Our swarm of viruses usually can’t get to a higher peak by crossing a valley (this is why the order of mutations matters) because they would struggle to survive at all once they’re actually in the valley.
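The climbing behaviour can be sketched with a toy one-dimensional landscape. The fitness values below are invented, and the rule that a population always steps to its fitter neighbour is a deliberate simplification of natural selection, not a model taken from the research described here.

```python
# Hypothetical fitness values along a single axis of variation.
# There is a low peak at position 2 and a higher peak at position 7.
LANDSCAPE = [0.1, 0.4, 0.6, 0.3, 0.2, 0.5, 0.8, 1.0, 0.7]


def climb(start, landscape=LANDSCAPE):
    """Step to the fitter neighbour until no neighbour is fitter."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(landscape)]
        best = max(neighbours, key=lambda p: landscape[p])
        if landscape[best] <= landscape[pos]:
            return pos  # stuck on a peak, whether or not it is the highest
        pos = best


for start in (0, 3, 4, 8):
    print(f"start {start} -> peak {climb(start)}")
```

Walkers that start at positions 0 or 3 end on the lower peak at position 2, because reaching the higher peak would mean stepping down into the valley at position 4 first.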
Sewall Wright invented fitness landscapes to think about evolution, but they didn’t just remain biological tools. Fitness landscapes show up as a conceptual aid wherever a problem involves a large or high-dimensional space and only a few correct answers. The idea appears everywhere from the social sciences to string theory and cosmology, and is arguably one of the most powerful tools in science for thinking about problems with very large search spaces.
In principle, the landscape idea does the trick: you need as many dimensions as there are bases that can mutate, but once you’ve got those, and a way of experimentally testing or even predicting fitness, you can predict evolution by exploring the connections between peaks. So where’s the catch?
In comparison to humans, viruses have tiny genomes. SARS-CoV-2, for instance, has about 30,000 RNA bases. By contrast, humans have about 3 billion DNA base pairs. While the number of base pairs doesn’t fully determine an organism’s complexity – onions have about 14 billion more base pairs than humans – the limited size of a viral genome still means that viruses must be relatively simple.
But even viruses aren’t simple enough to make the problem fully solvable by a fitness landscape. With four possible bases at each of 30,000 positions, there are 4^30,000 possible sequences, a number more than 18,000 digits long. Even for computers, that’s far too many to enumerate. Add to that the difficulty of predicting the fitness of a genotype and the complications of testing this in the lab, and it’s easy to see why biologists have largely used fitness landscapes as a metaphor rather than a quantitative tool. This may now be changing with increased computational power and machine learning, and some argue that the fitness landscape is making a comeback. One place where the landscape has arguably worked is the smaller problem of understanding the stability of mutations in the part of the coronavirus spike protein that binds to human lung receptors.
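A rough sketch of how quickly the space grows, assuming that each of 30,000 positions can hold any of four bases and that only substitutions count as mutations (insertions and deletions are ignored). It counts the genotypes that sit exactly k substitutions away from a starting sequence, and estimates how many digits the total number of sequences has.

```python
import math

GENOME_LENGTH = 30_000  # approximate number of bases in SARS-CoV-2
BASES = 4               # four possible bases at each position


def exactly_k_substitutions(k, length=GENOME_LENGTH, bases=BASES):
    """Count genotypes that differ from a reference at exactly k positions."""
    # Choose which k positions change, then one of (bases - 1) new bases for each.
    return math.comb(length, k) * (bases - 1) ** k


# Number of decimal digits in BASES ** GENOME_LENGTH, computed with
# logarithms so the huge integer itself never has to be printed.
digit_count = math.floor(GENOME_LENGTH * math.log10(BASES)) + 1

for k in range(1, 5):
    print(f"{k} substitution(s): {exactly_k_substitutions(k):.3e} genotypes")
print(f"all possible sequences: a number with {digit_count} digits")
```

Even the genotypes within a handful of substitutions of a single starting sequence number in the billions and beyond, which is why exhaustive exploration is out of reach.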
The ubiquity and reach of the fitness landscape don’t mean that it’s without criticism, however. The idea has been criticised for being misleading: three-dimensional intuition doesn’t easily translate to high-dimensional problems. While the landscape conveys some basic intuition, it is possible that the idea of peaks and valleys does not represent what really happens in higher dimensions. Some have argued that the problem is better represented by a network, with nodes representing genotypes and edges the mutational paths between them. Of course, the notion that the landscape is static is wrong too: the environment that determines the fitness of a genotype is constantly changing (vaccines and widespread immunity may have changed the fitness landscape of the coronavirus, for example).
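The network picture can be sketched in a few lines of Python. This is a toy under stated assumptions: genotypes are only two bases long, and two genotypes are joined by an edge when they differ by a single substitution. The function name is hypothetical and not taken from any published tool.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases


def build_genotype_network(length):
    """Map each genotype to the genotypes one substitution away from it."""
    network = {}
    for genotype in map("".join, product(BASES, repeat=length)):
        neighbours = []
        for site, base in enumerate(genotype):
            for other in BASES:
                if other != base:
                    neighbours.append(genotype[:site] + other + genotype[site + 1:])
        network[genotype] = neighbours
    return network


network = build_genotype_network(2)
# Every edge is stored twice, once from each end.
edge_count = sum(len(n) for n in network.values()) // 2
print(len(network), "genotypes,", edge_count, "mutational edges")
print("AA ->", network["AA"])
```

Each node in this toy network has six neighbours, with no notion of "uphill" or "downhill" built in. Fitness would be an extra value attached to each node, and it could be updated as the environment changes.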
Several alternatives to the fitness landscape have been proposed, and the criticisms and caveats mean that the landscape is best treated as an imperfect but valuable conceptual aid for conveying basic intuition. Evolution is highly complex, and it may never be possible to predict it. While science has made significant progress in anticipating the routes the virus’s evolution could take, the future of the pandemic is still unpredictable.
Image: Thomas Shafee / CC BY 4.0 via Wikimedia Commons