The Genetic Time Machine: Neanderthals, genomics and hybridization priors
200 thousand years of isolation is not sufficient for genetic incompatibilities to emerge.
Written by Razib Khan.
The late evolutionary biologist John Maynard Smith pioneered the use of game theory in biology, bringing him into contact with economists who worked within the tradition pioneered by John von Neumann and Oskar Morgenstern. Contrasting evolutionary and economic game theory, Maynard Smith said he preferred his field and the concreteness of genes and fitness to constructs like “utility.”
Yet the same contrast can be brought to bear within evolutionary science itself – Maynard Smith’s advisor and mentor J. B. S. Haldane famously admitted that fitness is a “bugger,” and in the 1970s the late biologist Richard Lewontin asserted that population genetics was a vast theoretical machine with very little empirical data to mine. Things have changed in the last generation thanks to the emergence of genomics – which combines modern laboratory automation, advanced chemistry and computational data crunching to allow for relatively easy read-outs of the genome. This, in turn, allows evolutionary processes that were once understood only in broad outline to be examined in a far more precise and granular way.
Evolutionary biology preceded genetics as a mature field, but without a coherent theory of heredity Charles Darwin’s ideas were mired in dispute and faction, and his singular focus on natural selection had fallen somewhat out of favour by the late 19th century. It was the rediscovery of Mendel’s framework of discrete genes in the early 20th century that revived evolutionary biology, eventually leading to the Neo-Darwinian synthesis, a major foundation of which was the population genetic theories pioneered by R. A. Fisher, Sewall Wright and J. B. S. Haldane. With population genetics, evolution became mathematically formalized, and debates could finally be settled with Gottfried Wilhelm Leibniz’s maxim “let us calculate.” At least that was the case after the biophysical substrate of Mendel’s genes, DNA, was elucidated in the 1950s – and the multi-decade sequence of advancements in molecular biology and computation led to genomics.
Darwin’s original conviction of the importance of adaptation driven by natural selection on heritable variation still motivates biologists today, but the most powerful tools now involve the statistical enumeration of the pattern of billions of A’s, C’s, G’s and T’s. In the case of the human genome that means 3 billion letters in sequence with numerous patterns defined by hundreds of millions of variations from the species average (each human has 5-6 million base-level variants). Today, in 2024, Lewontin’s assessment no longer holds; the vast machinery of theoretical genetics now works upon mountains of data that come rushing out faster than the calculations can be done. In addition to the miracle of genomics, the 21st century has gifted us paleogenetics, the new discipline of ancient DNA. Geneticists can not only use massive quantities of data to structure phylogenetic trees, but can also solidify and anchor the deep internal nodes of these trees with samples from the past; a literal genetic time machine.
These advances in genomic science can be brought to bear in numerous fields, from forensic identification to medical diagnosis. But it is evolutionary biology where they have yielded the largest short-term bonanza. Here they can reveal signatures of adaptation and demography with digital precision, and reconstruct phylogenetic relationships with a granularity previously unimagined.
For nearly a century evolutionary biologists, mostly paleoanthropologists, have debated the relationship of Neanderthals to our own lineage of modern humans. Fossils are only coarsely informative, and at the level of evolutionary closeness found among hominins their implications were not always clear or agreed upon. But in 2010 DNA from Neanderthals was retrieved and sequenced. A comparison of the roughly 3 billion base pairs of the Neanderthal genome with modern human genomes made it immediately evident that about 60 million bases in each non-African individual, roughly 2% of the genome, derived from Neanderthals. With more Neanderthal genomes over the last decade and many ancient modern human samples, it has become increasingly clear that some of our ancestry is Neanderthal.
When two different lineages mix, one of the genetic tells is the presence of long segments of distinct ancestry within the genome. Over time genetic recombination breaks apart these associations, and the segments of distinct ancestry become shorter and shorter. Very short segments point to admixture very far in the past. Using copious modern genomes, statistical geneticists were able to infer that the mixing of Neanderthals into our lineage happened about 40–70 thousand years ago. More recently, using ancient modern human genomes dated to 40–45 thousand years ago, it became clear that the mixing event happened 50–55 thousand years ago. Conveniently, this correlates with the explosion of Initial Upper Paleolithic technology across Eurasia.
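A minimal sketch of the dating logic, assuming a single pulse of admixture, 25 years per generation, and a rough genome-wide average of 1 cM per megabase. It uses the standard textbook approximation that, after a single pulse t generations ago, introgressed segments are roughly exponential in length with a mean of about 1/((1 - m)t) Morgans, where m is the admixture fraction. This is an illustration of the principle, not necessarily the exact method used in the studies mentioned above, and the function names are hypothetical.

```python
# Toy single-pulse model: recombination chops introgressed DNA into segments
# whose mean length is about 1 / ((1 - m) * t) Morgans, where t is the number
# of generations since admixture and m is the admixture fraction.

GEN_TIME_YEARS = 25  # assumed years per generation
CM_PER_MB = 1.0      # assumed rough genome-wide recombination rate


def expected_segment_cm(years_ago: float, m: float = 0.02) -> float:
    """Expected mean length (in cM) of introgressed segments after one pulse."""
    t = years_ago / GEN_TIME_YEARS
    return 100.0 / ((1.0 - m) * t)  # 1 Morgan = 100 cM


def years_from_segment_cm(mean_cm: float, m: float = 0.02) -> float:
    """Invert the model: date the pulse from an observed mean segment length."""
    t = 100.0 / ((1.0 - m) * mean_cm)
    return t * GEN_TIME_YEARS


for years in (10_000, 50_000, 100_000):
    cm = expected_segment_cm(years)
    kb = cm / CM_PER_MB * 1000
    print(f"{years:>7} years ago -> mean segment ~{cm:.3f} cM (~{kb:.0f} kb)")
```

Older admixture means more generations of recombination and therefore shorter segments, which is why very short Neanderthal segments point to a deep date.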
But wait, there’s more! With thousands of human ancient DNA samples, scholars began to notice a subtle pattern of decreased Neanderthal admixture over the last 50 thousand years. In other words, the modern human samples from 40 thousand years ago had more Neanderthal ancestry than modern humans of today (or 10 thousand years ago). One possibility is that Neanderthal-admixed Eurasians received genes from modern humans from Africa or the Middle East that did not have Neanderthal ancestry, therefore reducing the proportion of our heritage from our prehistoric cousins. This is likely part of the answer, but not all of it. There is another dynamic at work: natural selection at the genomic level. More precisely, there has been selection against Neanderthal-origin DNA in our genomes. The 2% that non-Africans have is what remains after a larger proportion initially mixed into our lineage.
Why would this occur? Neanderthals and modern humans share common ancestry, but it dates to more than 500 thousand years ago. Though there were some genetic interactions in the period between 50 and 500 thousand years ago, the two lineages were mostly separated. This means that natural selection and genetic drift operated on proto-Neanderthals and proto-moderns independently, and over time genetic networks in the two populations may have diverged to the point at which incompatibilities emerged. There just wasn’t enough time for total hybrid sterility between Neanderthals and moderns (jackals diverged from wolves 1–2 million years ago and can still hybridize). But even if two lineages can produce hybrids, that does not mean the hybrids are as viable or fit. In the case of Neanderthal-human admixture, the hybridization event was fraught with incompatibilities, which are reflected in the distribution of Neanderthal genes in modern humans.
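One standard way to formalize why the length of isolation matters so much is H. Allen Orr's "snowball" model of Dobzhansky-Muller incompatibilities: if the two lineages differ by K substitutions, the number of untested pairs of derived alleles grows as K(K - 1)/2, so expected incompatibilities grow roughly with the square of divergence time. The sketch below is an illustration under that model; the substitution rate and per-pair incompatibility probability are hypothetical, and only the scaling between divergence times is meaningful.

```python
# Orr's snowball model: expected incompatibilities = p * K * (K - 1) / 2,
# where K is the number of substitutions separating two isolated lineages
# and p is the chance that a given untested pair of derived alleles clashes.

SUBSTITUTIONS_PER_KY = 10.0  # hypothetical rate of functionally relevant changes
P_INCOMPATIBLE = 1e-4        # hypothetical per-pair incompatibility probability


def expected_incompatibilities(divergence_ky: float) -> float:
    """Expected number of pairwise incompatibilities after divergence_ky thousand years."""
    k = SUBSTITUTIONS_PER_KY * divergence_ky
    return P_INCOMPATIBLE * k * (k - 1) / 2


for ky in (100, 200, 500):
    print(f"{ky:>3} ky of isolation -> ~{expected_incompatibilities(ky):.1f} incompatibilities")

ratio = expected_incompatibilities(500) / expected_incompatibilities(200)
print(f"500 ky vs 200 ky: about {ratio:.2f}x as many")
```

Because growth is roughly quadratic, 500 thousand years of separation yields about six times as many expected incompatibilities as 200 thousand years, and about 25 times as many as 100 thousand years, whatever the absolute rates are.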
Though only about 2% of the genome of any non-African is from Neanderthals, it is not the same 2% in everyone. About 35% of the whole Neanderthal genome can be reconstructed from modern humans, which means that 65% of the Neanderthal genome was lost through admixture and extinction (we know the full Neanderthal genome because we now have many ancient Neanderthal genomes). Why was that 65% lost? First, only a small minority of the original admixed population’s ancestry was Neanderthal. Second, genes are lost through drift. Finally, genes are lost through negative, purifying selection. The last point is critical, as the original admixed population may have been closer to 4% Neanderthal.
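To see why "not the same 2%" matters, here is a deliberately naive sketch. It assumes each genome carries an independent, random 2% of the Neanderthal genome, which is false: all non-Africans descend from one admixed population, and drift and selection removed the same regions in everyone. Under that assumption, union coverage across n genomes is 1 - (1 - f)^n, so a couple of dozen genomes would already reach about 35% and a few hundred would cover nearly everything. The much lower coverage actually recovered from large samples is consistent with the shared losses described above.

```python
import math

# Naive toy model (an assumption, not how the 35% figure was estimated):
# each genome independently carries a random fraction f of the Neanderthal
# genome, so a given Neanderthal base is missed by all n genomes with
# probability (1 - f) ** n.

F_PER_GENOME = 0.02


def union_coverage(n_genomes: int, f: float = F_PER_GENOME) -> float:
    """Fraction of the Neanderthal genome seen in at least one of n genomes."""
    return 1.0 - (1.0 - f) ** n_genomes


def genomes_needed(target: float, f: float = F_PER_GENOME) -> float:
    """Number of independent genomes needed to reach a target coverage."""
    return math.log(1.0 - target) / math.log(1.0 - f)


print(f"genomes needed for 35% coverage: {genomes_needed(0.35):.1f}")
for n in (1, 10, 100, 1000):
    print(f"{n:>5} genomes -> {union_coverage(n):.1%} coverage")
```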
Neanderthal genes are not distributed randomly and evenly across the modern human genome. Rather, there are “Neanderthal gene deserts,” such as the region around FOXP2, a gene implicated in language development. There is also only 20% as much Neanderthal ancestry on the X chromosome as on the other 22 chromosomes (the autosomes). The evolutionary genetic reason for this is somewhat involved, but the X chromosome is deeply implicated in Dobzhansky-Muller incompatibilities, in which gene variants that work well in their own lineage impair viability or fertility when combined in hybrids; because males carry only one X, such incompatibilities on the X are directly exposed to natural selection. Neanderthal DNA is also more often found in intergenic DNA, the portion that is “silent” or “junk” and does not code for proteins. And Neanderthal ancestry is less common around essential regulatory regions of the genome. Though there are many Neanderthal genes that resulted in adaptations through introgression, on the whole, if it wasn’t neutral in effect it was bad.
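One part of the X chromosome story can be sketched with simple bookkeeping. Assuming random mating, Hardy-Weinberg proportions, equal numbers of males and females, and an allele that is only harmful when unmasked, this compares what share of copies of a rare allele are exposed to selection on an autosome versus the X. The 2% frequency is a hypothetical stand-in, and this captures only hemizygous exposure, not the full "somewhat involved" explanation.

```python
# Share of copies of a recessive allele (frequency q) that are "unmasked"
# and therefore visible to selection. Assumes random mating, Hardy-Weinberg
# proportions and a 1:1 sex ratio.


def exposed_fraction_autosome(q: float) -> float:
    """Copies in aa homozygotes: 2*q^2 of the 2*q total copies, i.e. q."""
    return q


def exposed_fraction_x(q: float) -> float:
    """Males hold 1/3 of X copies, all exposed; females hold 2/3, exposed with prob q."""
    return 1 / 3 + (2 / 3) * q


Q = 0.02  # hypothetical frequency of an incompatible Neanderthal allele
auto = exposed_fraction_autosome(Q)
x_chr = exposed_fraction_x(Q)
print(f"autosome: {auto:.3f} exposed, X: {x_chr:.3f} exposed ({x_chr / auto:.1f}x)")
```

At low frequencies almost every autosomal copy hides in a heterozygote, while a third of X copies sit in males and are always exposed, so selection can purge X-linked incompatibilities far more efficiently.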
Neanderthal genes often did not “work well” with human genes, and that impacted their fitness and distribution across the genome. This is totally comprehensible with 20th-century theory, but in the 21st century, we have data. Because this result hinges on finite samples of ancient DNA, a more tentative but likely conclusion is that most of the purifying selection happened in the first 10 generations after admixture, about 250 years assuming 25 years per generation.
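A front-loaded decline is what one would expect if a subset of Neanderthal alleles were strongly harmful in a modern human background. The sketch below is a toy with hypothetical parameters, not a fitted model: it starts from the roughly 4% initial admixture suggested above, splits it evenly between neutral regions and strongly deleterious regions (selection coefficient s = 0.3), and uses a haploid selection approximation in which the odds p/(1 - p) of the deleterious class shrink by a factor (1 - s) each generation.

```python
# Toy two-class model of purifying selection after admixture.
# All parameters are hypothetical illustrations.

INITIAL_DELETERIOUS = 0.02  # share of ancestry in regions under strong selection
INITIAL_NEUTRAL = 0.02      # share of ancestry in effectively neutral regions
S = 0.3                     # hypothetical selection coefficient
YEARS_PER_GEN = 25


def step(p: float, s: float = S) -> float:
    """One generation of haploid selection against an allele at frequency p."""
    return p * (1 - s) / (1 - s * p)


def trajectory(generations: int) -> list[float]:
    """Total Neanderthal ancestry in each generation, starting at generation 0."""
    p_del = INITIAL_DELETERIOUS
    totals = [p_del + INITIAL_NEUTRAL]
    for _ in range(generations):
        p_del = step(p_del)
        totals.append(p_del + INITIAL_NEUTRAL)
    return totals


traj = trajectory(40)
for g in (0, 5, 10, 20, 40):
    print(f"generation {g:>2} (~{g * YEARS_PER_GEN:>4} years): {traj[g]:.2%} Neanderthal")
```

Under these assumptions nearly all of the drop from 4% toward 2% happens in the first 10 generations, after which the neutral remainder persists, the same qualitative shape as the inference described above.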
Of course, the same logic that applies to Neanderthals applies to humans at other scales of divergence. All non-African humans seem to share common ancestry about 60 thousand years ago. By the best estimates, the deepest branch of modern humans with some level of coherent continuity down to the present is the Khoisan-speaking foragers of the Kalahari, the San Bushmen, who may have diverged from other human populations as early as 200 thousand years ago. Just as non-Africans carry small proportions of Neanderthal ancestry, the San Bushmen carry small proportions of other African and Eurasian ancestry. Similarly, the deeply distinctive Pygmy populations of the Congo rainforest have minority Bantu ancestry.
The methods discussed above for Neanderthals are entirely applicable to distinct modern lineages and can be used to explore systematic and consistent incompatibilities. To my knowledge, no such result has been found. In 2014 I specifically asked a postdoctoral fellow in David Reich’s laboratory if they had looked to see if the pattern with Neanderthal introgression could be seen more modestly in the African forager genomes that they had early access to through the Simons Foundation, and was told they had detected no such signature.
What does this mean? It would imply that 200 thousand years of isolation is not sufficient for genetic incompatibilities to emerge in humans, though 500 thousand years or so is. In fact, I now consider more likely the models that place the Khoisan divergence from other human populations closer to 100 thousand years ago, which makes it even more plausible that reproductive incompatibilities would not have evolved. If no such signature exists between African foragers and other humans, then it seems rather unlikely that one exists between West Africans and Eurasians, or between West Eurasians and East Eurasians.
This does not mean that no biological or genetic incompatibilities between populations can emerge. The Basque people of Spain have the highest Rh- frequencies in the world, and until modern medicine, they had relatively high miscarriage rates. Because the vast majority of non-Basques were Rh+, pregnancies in which a Basque woman carried a non-Basque man’s child faced a heightened risk of miscarriage. But Rh is not the only blood group where incompatibilities are relevant. West African and European populations have different frequencies of the O, A and B blood groups. Looking at total European ancestry in African Americans, the representation of A and B in that population seems somewhat low. Medical geneticists have hypothesized that this is due to higher past miscarriage rates among African women, who were more likely to be O, carrying the children of European men, who were more likely to be A or B. Blood group O is the antigen-negative state, lacking both the A and B antigens. Of course, these are not specifically racial dynamics. European women with blood group O carrying the children of European men with blood group A or B face the same issues (the miscarriage rate for humans is quite high, with a consensus value in the range of 50%).
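The asymmetry in the Rh case can be made concrete with a minimal sketch. It assumes a single locus with a dominant D (Rh+) allele and a recessive d (Rh-) allele, Hardy-Weinberg proportions, and purely hypothetical allele frequencies rather than measured Basque or non-Basque values. A pregnancy is counted as "at risk" when the mother is dd and the father transmits D; this counts at-risk pregnancies, not miscarriages, and in practice sensitization mostly affects later pregnancies. The same logic carries over to an O mother and an A or B father.

```python
# Probability that a pregnancy is Rh-incompatible: mother dd (Rh-) and
# the father transmits D, so the fetus is Rh+. Allele frequencies are
# hypothetical; q is the frequency of the d allele in a population.


def p_rh_negative_mother(q_mother_pop: float) -> float:
    """Under Hardy-Weinberg, a woman is dd with probability q^2."""
    return q_mother_pop ** 2


def p_at_risk(q_mother_pop: float, q_father_pop: float) -> float:
    """Mother is dd and the father's transmitted allele is D (probability 1 - q)."""
    return p_rh_negative_mother(q_mother_pop) * (1 - q_father_pop)


Q_HIGH = 0.55  # hypothetical d frequency in a population with many Rh- people
Q_LOW = 0.10   # hypothetical d frequency in a mostly Rh+ population

print(f"within high-d population:    {p_at_risk(Q_HIGH, Q_HIGH):.3f}")
print(f"high-d mother, low-d father: {p_at_risk(Q_HIGH, Q_LOW):.3f}")
print(f"low-d mother, high-d father: {p_at_risk(Q_LOW, Q_HIGH):.3f}")
```

The risk roughly doubles when a woman from the high-d population has a child with a man from the mostly Rh+ population, while the reverse pairing is rarely at risk.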
This brings me to the famous Helgason 2008 study, “An association between the kinship and fertility of human couples.” The key finding is that couples who are third or fourth cousins are the most fertile, and that this “observation of highly significant differences in the fertility of couples separated by very fine intervals of kinship” led the authors to conclude that “this association is likely to have a biological basis.” I asked a friend who has worked more recently with Icelandic genomic data about that conclusion, and his intuition was the same as mine: the genetic differences between this kinship group and those further out are minimal, so it is likely not a biological dynamic. The Icelanders are a unique population: genetically about 25% or more Irish, and very homogeneous because of their small population on an isolated island. Hence they are perfect guinea pigs to pick out signals in the genome (though this is far less relevant in 2024 with copious whole-genome data across millions of individuals). That being said, the idea that the minuscule genome-wide differences between 4th and 6th cousins matter (roughly 0.2% vs. 0.01% of the genome in expectation) seems hard to believe.
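How small these differences are follows from simple pedigree arithmetic. Full nth cousins descend from a shared ancestral couple, and each of the two common ancestors contributes (1/2)^(2n+2) in expectation, for a total of (1/2)^(2n+1). This is the coefficient of relationship, the same convention behind the 1/8 figure usually quoted for first cousins. These are expectations only; realized sharing varies widely, and many distant cousins share no detectable segments at all.

```python
# Expected fraction of the genome shared by descent (coefficient of
# relationship) for full nth cousins: two common ancestors, each
# contributing (1/2) ** (2n + 2), so the total is (1/2) ** (2n + 1).


def expected_sharing(n: int) -> float:
    """Expected shared fraction for full nth cousins (n = 1 for first cousins)."""
    return 0.5 ** (2 * n + 1)


def ordinal(n: int) -> str:
    """Format 1, 2, 3, 4 as 1st, 2nd, 3rd, 4th (sufficient for small n)."""
    return {1: "1st", 2: "2nd", 3: "3rd"}.get(n, f"{n}th")


for n in range(1, 7):
    print(f"{ordinal(n)} cousins: {expected_sharing(n):.4%}")
```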
So what’s going on? The result is certainly real and repeated.
I think to understand the dynamic one has to look at another phenomenon that recurs in many human societies: cousin marriage. Though today cousin marriage is strongly associated with Islamic societies, it has been prevalent across most stratified complex civilizations to various degrees. Suppressed by the Roman Catholic Church in Western Europe, it reemerged among Northern Europe’s Protestant elite after the Reformation. L. L. Cavalli-Sforza’s data from Italy show that cousin marriage is more common in areas that are rural and isolated, basically, where there are fewer marriage candidates. But it also occurs in higher social strata, where it serves to combine fortunes between related families. In patriarchal societies women marrying their cousins are somewhat reassured by the fact that the people whose homes they will be joining are also related to them, dampening alienation and the possibility of exploitation.
And yet the genetic consequences of first cousin marriage are well known. First cousins share 1/8th of their genome, 12.5%. Recessive diseases arise from these pairings, and polygenic characteristics like IQ also suffer. The Habsburgs are the most famous case of an elite lineage whose predilection for consanguinity led them to a literal genetic dead end. But this does not always happen. Charles Darwin and Emma Wedgwood were first cousins, and they had ten children; to this day they have copious descendants. The intermarriages between liberal intellectual elite families like the Keynes, Darwins and Wedgwoods were culturally fruitful and fecund.
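Why recessive diseases cluster in these pairings can be shown with the standard inbreeding formula. With inbreeding coefficient F, the chance a child is homozygous for an allele at frequency q is q^2(1 - F) + qF; the child of first cousins has F = 1/16, half the 1/8 shared between the parents. The allele frequency below is hypothetical.

```python
# Probability that a child is homozygous for a recessive allele of
# frequency q, given the child's inbreeding coefficient F:
#   P(aa) = q^2 * (1 - F) + q * F


F_FIRST_COUSINS = 1 / 16  # half the parents' 1/8 relationship coefficient


def p_homozygous(q: float, f: float) -> float:
    """Chance of inheriting two copies of the allele, with inbreeding F."""
    return q * q * (1 - f) + q * f


Q = 0.01  # hypothetical frequency of a rare recessive disease allele

outbred = p_homozygous(Q, 0.0)
cousins = p_homozygous(Q, F_FIRST_COUSINS)
print(f"unrelated parents:    {outbred:.5f}")
print(f"first-cousin parents: {cousins:.5f} ({cousins / outbred:.1f}x)")
```

The rarer the allele, the larger the relative increase, because the qF term dominates once q^2 becomes tiny.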
As for why individuals who are somewhat closer on the “social graph” (and therefore the genealogical graph) produce more children, cultural, social and interpersonal dynamics seem to offer a simple explanation that doesn’t depend on implausible biological effects arising from extremely fine differences in relatedness. The ancient DNA makes it clear that the human past was defined by massive turnover, admixture, and periodic mass migrations of males. Biology is the science of exceptions, but we should not take as a prior an equilibrium state of organically developing kinship networks that persist for 1,000 years on an isolated island or peninsula. Over the long term, these societies were always admixed and assimilated, so their local adaptations and equilibria would be ephemeral.
Razib Khan writes a Substack and is CXO of GenRAIT. He has written for UnHerd, Quillette and Palladium.
Support Aporia with a $6 monthly subscription and follow us on Twitter.