Aporia Magazine

Twin Studies and the Heritability of IQ

Sep 2, 2025

Why the standard estimates are probably right.

30 Comments

Professor Steve Hsu, a well-known quantum physicist at Michigan State, ran a lot of studies on the genetic influence of intelligence, both in the US and in China... He concluded that thousands of genes affect intelligence, and also that intelligence is roughly 80% heritable... That seems likely because good health and IQ are definitely correlated, and a great many genes affect health...

The Westering Sun

I think Gusev is being given too much credit. His approach essentially boils down to the cheating spouse caught in flagrante by their partner: 'Who are you going to believe, me or your lying eyes?'

It is based on two prejudices. 1. That intelligence heritability is low or non existent. 2. A rationalist faith which supposes reality must be transparent to rational-technical methods. If the molecular models don’t find heritability, then the heritability must not exist. But that reflects the limits of the model, not reality.

Heritability is a statistical property of populations, not a catalogue of specific genes. We will probably never know exactly what drives it. That doesn't mean it isn't as real or as powerful as people observe, or as twin studies suggest.

Magical Realist

Saying there is bias because of "rationalist faith" is pretty useless, as such a rationalist could just say you are biased and call you a racist. So then everyone is biased.

The Westering Sun

Yes, everyone is biased — the question is whether those biases are grounded in reality or in ideology. The assumption of high heritability reflects centuries of lived experience and mountains of observational data. The assumption of negligible heritability reflects liberal ideology, which demands equality at all costs. Not all biases are equal.

Excellent article.

"Here I add to that by arguing that current GWAS studies must be overlooking much of the genetic influence on intelligence. In short, intelligence must be affected by vast numbers of genes, which means that most of them must have very small effects, and current GWAS studies do not have the statistical power to detect these tiny effects."

I believe that is the correct analysis of the situation. The fact that human intelligence manifests as a spectrum shows that there is a large number of genes influencing intelligence.
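The power argument in the quoted passage can be put in rough numbers. The Python sketch below is only an illustration under stated assumptions: a single SNP in a simple linear model, a normal approximation to the association test, and the conventional genome-wide threshold of p < 5e-8. The function name `gwas_power` and the example sample sizes and effect sizes are hypothetical, not taken from any study.

```python
from statistics import NormalDist


def gwas_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power to detect one SNP at significance level alpha.

    Assumption: the SNP explains `variance_explained` of the trait variance,
    so its association z-statistic is roughly Normal(sqrt(ncp), 1).
    """
    std_normal = NormalDist()
    # Two-sided critical value for the chosen significance threshold.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Non-centrality parameter of the association test.
    ncp = n_samples * variance_explained / (1 - variance_explained)
    shift = ncp ** 0.5
    # Probability that the statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenarios: the same tiny effect at two sample sizes.
small_study = gwas_power(n_samples=300_000, variance_explained=1e-5)
large_study = gwas_power(n_samples=3_000_000, variance_explained=1e-5)
print(f"power with 300k samples: {small_study:.4f}")
print(f"power with 3M samples:   {large_study:.4f}")
```

Under these assumptions, power depends on the product of sample size and variance explained, so a SNP with a tenth of the effect needs roughly ten times the sample to be detected equally often.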

To add a further confounding factor, intelligence presents itself in different abilities, such as abstraction, logic, reasoning, planning, creativity, critical thinking, and problem-solving.

It seems to me that 80% or more of intelligence is genetic.

Most GWAS do not rely on whole genome sequencing (WGS) but on SNP arrays, which cover only a small portion of the genome (typically 500,000–1 million SNPs), missing rare variants, structural variations, or regulatory regions. This limits explanatory power, as rare and non-coding variants (e.g., enhancers, promoters) are important for complex traits like IQ. Larger WGS-based databases (e.g., UK Biobank or All of Us project) are improving this, but we are still not dealing with millions of complete genomes.

Additive vs. non-additive (combinatorial) models: Traditional GWAS and PGS models primarily assume additive effects (i.e., genes contribute independently to the trait), ignoring epistasis (gene-gene interactions), dominance effects, and context-dependent regulation. Regulatory genes (e.g., transcription factors) are indeed context-dependent: a variant may have a positive effect in one genetic environment, a negative effect in another, or be neutral. This requires combinatorial logic, similar to poker, where the value lies not in individual cards but in their combinations (e.g., flush, straight). Newer models like GenoBoost or other non-additive PGS approaches attempt to incorporate these, improving predictive accuracy, but they are still far from complete. A significant portion of missing heritability may stem from epistasis and non-additive effects.

An AI-based system could theoretically identify these “rules” based on millions of whole-genome projects, with educational attainment (as a proxy for intelligence) as the target variable. Current AI technologies (e.g., deep learning, graph neural networks, or transformer models) excel at discovering complex patterns in large datasets, including genomic data. For example, tools like AlphaGenome or similar AI systems already identify complex variant interactions from WGS data and build predictive models for diseases or traits.

In such a project, AI would discover interactions: It could use unsupervised learning (e.g., autoencoders) to decode the “grammar” of the genome or supervised models to predict educational attainment, incorporating epistasis and regulatory networks.

Advantages: With millions of whole genomes (e.g., 1–10 million samples), AI could learn combinatorial rules without overfitting, similar to how poker AIs (e.g., Pluribus) discover strategies.

Challenges:

Data requirements: The largest WGS databases (e.g., 100,000–500,000 samples) are not yet in the millions, though projects like Genomics England or AI-driven analyses are growing rapidly.

Computation: Analyzing billions of SNPs with interactions is exponentially costly, but quantum computing or optimized AI (e.g., sparse models) could help.

Ethical and scientific limitations: Educational attainment is not a perfect proxy for IQ (due to environmental influences), and AI would find correlations rather than causal rules. Additionally, there are risks of overfitting and bias (e.g., population-specific effects).

Likelihood: In the short term (5–10 years), AI could explain 20–30% of the variance, but the full “rule set” (80%+) is still far off, as the genome is too complex, and other factors (e.g., DNA methylation) also play a role.
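To make the poker analogy above concrete, here is a toy Python sketch of pure epistasis. It is deliberately artificial: two hypothetical binary variants, all four combinations equally frequent, and a made-up trait that is raised only when exactly one variant is present. Nothing here is drawn from real genomic data.

```python
from itertools import product


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# All four combinations of two hypothetical binary variants, equally frequent.
genotypes = list(product([0, 1], repeat=2))
variant_a = [g[0] for g in genotypes]
variant_b = [g[1] for g in genotypes]

# Toy trait: 1 when exactly one variant is present, 0 otherwise.
trait = [a ^ b for a, b in genotypes]

# Each variant on its own explains none of the trait variance...
additive_a = r_squared(variant_a, trait)
additive_b = r_squared(variant_b, trait)

# ...while a single interaction term explains all of it.
interaction_term = [(2 * a - 1) * (2 * b - 1) for a, b in genotypes]
interaction = r_squared(interaction_term, trait)

print(additive_a, additive_b, interaction)
```

Because the two variants are uncorrelated in this toy population, an additive model using both still explains nothing, which is the "value lies in the combination" point. Real traits are unlikely to be this extreme, and how much of the missing heritability is actually epistatic remains an open question.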

Great article! I noticed this comment: "Indeed a more recent and bigger study analysed 3 million genomes and found 3,952 SNPs associated with educational attainment, which together account for 12 to 16% of the variance."

This is slightly misstated as these estimates refer to the predictive performance of a polygenic score (PGS) including weights from the standard panel of ~1.2M HapMap3 common variants (see Methods: Polygenic Prediction and Supplementary Table 3 of the study). Their method of training this PGS attempts to glean signal from the many truly associated SNPs for which low power prevented their reaching strict, genome-wide significance (p < 5e-8) unlike the smaller number of 3,952 lead SNPs. The same applies to the intelligence GWAS mentioned in the preceding paragraph.

If anything, this supports your general point that GWAS/PGS methods are too underpowered and data-limited to precisely estimate heritability to the same extent as do twin studies.
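The difference between "lead SNPs only" and a score built from all panel SNPs can be sketched in a few lines of Python. Everything in this example is hypothetical: the SNP names, weights, and p-values are invented, and the scoring is a plain weighted sum, whereas the study's actual method reweights effects in ways this sketch does not attempt.

```python
GENOME_WIDE_P = 5e-8  # significance threshold quoted in the comment above

# Hypothetical summary statistics: (snp_id, effect_weight, p_value).
summary_stats = [
    ("snp_a", 0.020, 1e-12),
    ("snp_b", 0.015, 3e-9),
    ("snp_c", 0.004, 2e-4),
    ("snp_d", -0.003, 0.01),
]


def polygenic_score(allele_counts, stats, p_max=None):
    """Weighted sum of allele counts over SNPs with p_value < p_max.

    With p_max=None every SNP contributes, including those that
    never reached genome-wide significance.
    """
    return sum(
        weight * allele_counts[snp]
        for snp, weight, p_value in stats
        if p_max is None or p_value < p_max
    )


# Hypothetical person: number of effect alleles (0, 1 or 2) at each SNP.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1, "snp_d": 2}

lead_only = polygenic_score(person, summary_stats, p_max=GENOME_WIDE_P)
all_snps = polygenic_score(person, summary_stats)
print(f"lead SNPs only: {lead_only:.3f}, all SNPs: {all_snps:.3f}")
```

The variance figures quoted from the study describe a predictor of the second kind, which is why they should not be attributed to the 3,952 lead SNPs alone.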

'We have no good reason to think that twin studies are severely underestimating the heritability of IQ'.

Shouldn't that read 'overestimating'?

Good point

—NC

Reflecting on this a week on: I wrote it in response to GWAS studies that were finding very low contributions to IQ from genes (e.g. the 2018 paper quoted in the piece that found genes amounting only to: "up to 5.2% of the variance in intelligence"). I stand by the explanation in the piece for why that is likely to be wildly below the true genetic contribution.

It is then a fair criticism (as indeed made above by Sasha Gusev) that I didn't then go on to discuss newer methods that attempt to get round the limitations of previous GWAS studies (these include GREML-KIN, GREML-MS, Sib-regression and RDR). As best I can tell (not claiming to be at all expert in any of these) there is not yet a consensus in the literature about how to properly account for confounding factors when using these methods, and nor have sufficient such studies yet been done to arrive at reliable values for the heritability of IQ. However, it seems that this could be arrived at in coming years. In the meantime the estimates from twin studies likely remain the best understood and the most reliable.

The article argues that Genome-Wide Association Studies do not currently have sufficient statistical power to estimate the contribution of all genotyped common variants. Had the author read only a few sentences further in my article (https://theinfinitesimal.substack.com/p/no-intelligence-is-not-like-height) they would have found a resolution to this dilemma:

"But prediction accuracy depends on sample size, could the findings drastically change with more samples in the future? In fact, through the magic of statistics, we actually know that this claim will always to be true. We know this because we have estimated a parameter called molecular heritability, which tells us the upper bound on what a genetic predictor could ever achieve ... But for IQ, the direct heritability dropped to 15% (with a wide error bar) and for educational attainment all the way down to 4% (with a narrow error bar). These substantial decreases are the result of some mix of cultural influences, assortative mating, and population structure."

So we already had our answer: molecular methods can estimate the total GWAS heritability without being constrained by statistical power to identify individual effects, and these estimates are *also* much lower than those obtained by twin studies.

Though I wish the author had read the entirety of my article and saved themselves the trouble of writing a response to a point that was already addressed, I do applaud the effort. Noah Carl, whose piece is also cited here, appears to have read none of my article at all! His response immediately shifts to pedigree and adoption studies that were not even mentioned. Thankfully, Vinay Tummarakota at Unboxing Politics has recently done the heavy lifting of sifting through pedigree and adoption studies and demonstrated that Carl's non-response also happens to be incorrect on the merits: adoption and pedigree studies do not provide strong support for twin studies either (https://unboxingpolitics.substack.com/p/contra-scott-alexander-on-missing). I again encourage all interested parties to read carefully, as Tummarakota's piece also addresses concerns about rare and SNP-level variation in the section titled "Relatedness Disequilibrium Regression". Perhaps in the distant future someone around here will have actually read to the end and offered a counterpoint.

Sep 3 (edited)

We know that children resemble their parents. Adoption studies find that adopted children resemble their biological parents much more than their adoptive parents, and often that they don't resemble their adoptive parents at all. This seems like strong evidence that the resemblance between children and their parents is mostly due to genes.

—NC

Adoption studies show nothing of the kind. Adoption can raise the IQ of adoptees by up to 40 points. Your own work on this subject uses a few cherry-picked examples but doesn't engage the broader body of findings.

https://pmc.ncbi.nlm.nih.gov/articles/PMC5754247/

The 40 IQ points increase that you quote seems to be the very largest increase in the entire literature, pertaining to one young child who started off severely malnourished. It’s not at all typical. And no-one is claiming that environmental factors have no effect.

Sep 4 (edited)

It's not typical because adoption studies normally don't capture extreme change of environment, such as that between an environment where food is readily available and one where it isn't. But in order to understand the full range of environmental effects, you need to take such cases into account, rather than confining yourself to adoption cases where the difference in the environments before and after adoption is minor. Of course in the latter cases, you're going to find that environment doesn't make much of a difference. It's basically the same problem as exists with twin studies: what adoption studies count as genetic is actually the combined effect of genes and environment.

It comes down to what questions one is asking. One question could be: what happens if one adopts a child out of an environment of severe malnutrition and extreme neglect? Another question could be: what happens if one adopts a child out of a “typical poor parents” environment into a “typical middle-class family” environment. No-one is surprised that the bigger the change in the environment the bigger the effect on outcome. That’s obvious, accepted and understood. But, if we’re trying to understand society at large and factors that affect kids in general, then clearly we’re interested in childhood environments that are common and typical. Pointing to the most extreme case you can find is not a refutation of studies into what commonly and typically affects outcomes.

Hi Sasha, can you talk us through the method by which you arrive at an upper bound (regardless of statistical power) using GWAS studies? Or point me at somewhere where this is explained?

Sure, some of the methods are described here:

http://gusevlab.org/projects/hsq/#h.gg1hj8vdv5em

There are individual-level based estimators (typically referred to as GREML or GCTA) and there are estimators based on GWAS summary statistics (typically referred to as LDSC). Both of these methods can either be applied to "population level" data to estimate the total proportion of the trait that can be explained by the genotyped SNPs or "family level" data to estimate the *direct* effects that can be explained by the genotyped SNPs (the distinction between population and direct effects is explained in the above link; direct effects are what people intuitively think of as "heritability"). Importantly, all of these methods provide unbiased estimates of the total contribution of all GWAS SNPs (and all other genetic variation they are correlated with) and neither require nor rely on individually-significant associations.

Hi Sasha, regarding GREML, what do you think of Hill et al (2018, Molecular Psychiatry, 23, 2347)? Using GREML it reports a heritability for IQ of 50%. Would you accept this value? This estimate seems to be for only the additive effects of SNPs (am I interpreting that correctly?), so doesn’t include all other forms of genetic variation, so the true heritability would be larger (agreed?). Given that, it seems that the distance from the estimates from twin studies (~ 70%) might not be that big.

I wrote about the Hill et al. paper here (https://theinfinitesimal.substack.com/i/148251755/what-about-kinship-studies-why-do-we-need-to-control-for-relatedness). When genetic relatedness also correlates with environmental sharing (which we know it does for educational attainment and probably all behavioral traits) this approach estimates some undefined combination of genetic, environmental, and parental effects and the "heritability" estimate is uninterpretable. This is also shown in simulations in the Young et al. / RDR paper.

Hi Sasha, the Hill et al authors are aware of the issue, and spend time trying to disentangle these effects, and their claim is that they succeed. E.g.: “The SRMCouple represents the similarity between couples, which is mainly due to environmental influences, as well as the effects of assortative mating. However, the effect of couple environment and assortative mating are not confounded with the other matrices SRM’s nor with either of the GRM’s, …”.

And: “… the replication of the GREML-KIN findings with GREML-MS in the subsample of unrelated individuals provides further evidence that the heritability estimates are not majorly affected by residual confounding.”

I suspect that Hill et al would not agree with you that their results are “uninterpretable”, though please point me at a critique of the methods of this paper if you know of one. If we do take the Hill et al numbers at face value, then the “missing heritability” would be largely solved (which is what Hill et al claim), and it would have turned out that the twin-studies estimates were pretty much right all along.

The method mathematically cannot control for (a) indirect effects from parents; (b) continuous confounding between relatedness and shared environment; (c) population stratification -- all of which we know are at play for behavioral traits. This is confirmed in the simulations by Young et al. showing an equivalent "kinship FE" approach can be wildly inflated in the presence of confounding, whereas the Young et al. RDR method is not. As such, there is no reason to prefer the Hill et al. estimator over the Young et al. estimator unless you are specifically interested in confounding; it's like saying we should use observational results instead of a randomized trial (in fact, you can think of RDR as essentially a randomized trial where Mendelian segregation is the randomizer).

Here is Alex Young making the same critique (https://geneticvariance.wordpress.com/2018/08/13/relatedness-disequilibrium-regression-explained/):

"We found evidence that the Kinship method has greatly overstated the heritability of educational attainment, suggesting that a recent study employing a variant of the Kinship method may also have overstated the heritability of educational attainment (see Hill et al., 2018)."

And here (https://x.com/rubenarslan/status/1118231090181832705) Young is making the same argument on Twitter, stating the method is "pretty useless for measuring heritability", to which one of the authors of Hill et al. says "If you are saying they can be inflated, I agree."

Looks like an open and shut case.

This isn't a serious article. It ignores all of the discussion on the problems with twin studies that we've been having on substack, including the influence of indirect genetic effects, assortative mating, and rGE. It is also teeming with factual errors. As one example, Collier writes: "GWAS studies examine one type of genetic variability, Single Nucleotide Polymorphisms (or SNPs), and they typically record SNPs at 20,000 locations." Collier cites no support for this claim. Contemporary biobanks map out tens of millions of gene variants for use in GWAS. For example, the UK Biobank has mapped out 96 million variants. It’s a ludicrous misrepresentation to say that GWAS are based on 20,000 SNP locations. This is off by more than three orders of magnitude. It’s a terrible indictment of the quality of the editors at Aporia that they would publish an article that makes such an elementary mistake.

https://www.nature.com/articles/s41586-018-0579-z

Collier and seemingly the rest of the hereditarians don’t understand twin studies either. The most basic mistake they make is to think that twin studies give us an estimate of heritability that is independent of the particular population being studied. If you do a study of the heritability of IQ among Chinese twins reared apart, and you find a strong correlation, there's no way of knowing whether the correlation is due to shared genes or shared environment. After all, those twin pairs grew up in massively similar environments. They all grew up in China at exactly the same time and encountered similar health and education systems, similar entertainment options, similar political ideologies. Since all of those environmental features are shared by both twins, twin studies give you no way of measuring their effect on IQ. Twin studies can separate out the effects of environmental features that differ between twins reared apart. You can deduce something about the effect of parental income if those incomes differ or the effect of geography if it differs. But you can't do anything like estimate the full effect of environmental factors because most of them are shared by both twins. So any heritability estimate you get is valid only for a particular population, like Chinese people born in 1954, and has no validity outside of that population. If you don’t believe me, try reading an actual geneticist like K. Paige Harden in *The Genetic Lottery*.

As Lyman Stone argues there are always environmental influences that get lumped in with genetic factors in twin studies: 'This kind of GxE will almost never be captured by twin studies, because twins always share a birth cohort by definition. They are massively range-restricted in ways it is physically impossible to control for; the nature of being twins means you can’t have variance in things like cohort of birth or nation of origin, which are super important elements of “environment.”' That’s why when you test heritability estimates derived from twin studies, you find they are wrong. Stone provides multiple examples.

https://lymanstone.substack.com/p/more-evidence-twin-studies-are-bad?utm_source=publication-search

Hi Ian,

First, 20,000 SNPs is indeed "typical" for published GWAS studies (though this number is increasing rapidly as technology advances, so at worst the number is out of date). And the number of SNPs used in a given study of a trait is not the same thing as the total number of SNPs mapped by UK Biobank. There are (I think) no estimates of the heritability of IQ that employ 96 million SNPs.

Second, every decent account of twin studies emphasizes that the heritability estimates pertain to the range of environments sampled in the study! This is known, understood and emphasized.

Obviously, if one changes the environment in ways that are not sampled in the study, then the heritability estimate could be changed. This is not in any way a refutation of the values given by twin studies (though it is a feature that does need to be understood and borne in mind).

You say "Such [twin] studies give high estimates for the genetic contribution to intelligence, such that the heritability is between 50 and 70%." This is not true. Twin studies can't even in principle tell you to what extent traits are due to genes and environment. That you didn't feel the need to address this point shows a total misunderstanding of twin studies.

As Sasha Gusev points out in his comment, there is already work by geneticists addressing precisely the problem that you raise in this article. Why didn't you bother to take that work into account when you wrote this article?

You compare SNPs to lines of code and ask how could you program a robot with 205 lines of code. The analogy between SNPs and lines of code seems off the wall and absurd. I've never heard any geneticist make that analogy.

This article is, in short, a huge mess, and it would not have been published by honest people.

As you are aware, I recently wrote an article on a popular theory about astrophysics, a field where I am way out of my depth. To deal with this problem, I hewed closely to the views expressed by actual credentialed astrophysicists and even contacted a couple of them to evaluate my views. When I have written about genetics, I have followed the same procedure, as I'm not a geneticist either. You might want to follow my example and get a sanity check from geneticists before trying to publish any more articles on this subject. https://open.substack.com/pub/eclecticinquiries/p/the-fine-tuning-argument-cant-get?r=4952v2&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false

Twin studies can indeed tell you to what extent traits are due to genes vs environment (for the range of environments being studied), that’s what they do best and they do it very well. (And if I didn’t expound more on twin studies in the piece it’s because it was mostly about GWAS studies; inevitably one can’t discuss everything.) For matters of practical importance, one is rarely comparing (say) children in 1920s China with 2020s Canada, one is usually comparing a child in one school in one town with a similar-age child in another town in the same culture/country. Twin studies and adoption studies are by far the best method for that, and, despite many trying to pick holes, the results seem robust and amply corroborated. As for comparing genes to lines of code, I think the perspective from information content is valuable. You haven’t actually said what’s wrong with that perspective.

If twin studies gave you a pure estimate of the effect of genes and environment, then that estimate would not change in a different environment. You're trying to square a circle here. The fact is that heritability estimates from twin studies reflect a mixture of genetic and environmental factors, which is what Lyman Stone has so effectively argued.

You’re not understanding the basics of “heritability”. **Necessarily**, *any* estimate of the relative effects of genes vs environment depends on the range of environments. This is not a limitation of twin studies, it’s intrinsic to the basic concept.

I should also have said: the fact that a value for “heritability” depends on the range of environments under consideration is not just a feature of twin studies, it is central to the whole concept of “heritability”.
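A small worked example may help with this definitional point. The Python sketch below assumes the simplest textbook decomposition, in which trait variance is the sum of a genetic and an environmental component with no gene-environment correlation or interaction. The variance values are arbitrary illustrations, not estimates for IQ or any other trait.

```python
def heritability(genetic_variance, environmental_variance):
    """Share of trait variance attributable to genetic variance.

    Assumes total variance = V_G + V_E, i.e. no gene-environment
    covariance or interaction (a deliberate simplification).
    """
    return genetic_variance / (genetic_variance + environmental_variance)


# Same genetic variance, two hypothetical ranges of environment.
narrow_environments = heritability(genetic_variance=1.0, environmental_variance=0.25)
wide_environments = heritability(genetic_variance=1.0, environmental_variance=1.0)

print(f"narrow range of environments: {narrow_environments:.2f}")
print(f"wide range of environments:   {wide_environments:.2f}")
```

With the genetic variance held fixed, widening the range of environments lowers the ratio. In this simplified model that follows from the definition itself, whichever method is used to estimate it.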

IQ is a strange thing.

Apparently, Nigeria sends its brightest and best to America, yet the average IQ of people remaining in Nigeria never goes down as a result. How does that work? Because any measurement of IQ which shows African countries having a lower IQ is racist.
