Here's how AI will end the culture war...
Completing the Turing Test would represent a more or less total victory for Turing in his dispute with Wittgenstein over machine intelligence.
Written by Cassian Young.
It is likely that a machine will successfully pass the Turing Test in the next five years. Though the event is widely anticipated, its impact remains disputed. Some view it as a giant milestone for humanity, akin to the discovery of fire, while others see it as the mundane conclusion of a half-baked thought experiment.
Turing was provoked into developing his test by the culture war. We might never have heard of the Turing Test if his beloved mathematics had not been strafed by its stray bullets. The test still has little meaning outside that context. This is why most commentators can agree that the test holds great significance, but few can agree what that significance is.
***
If the culture war has a beginning, it is probably 1st October 1880, the day the evolutionary biologist T. H. Huxley attacked the study of literature in a speech at the opening of Mason Science College in Birmingham (later the University of Birmingham).
Huxley launched his attack with the goal of elevating science. At that time, it had a lower status than the study of Greek and Roman literature. Students of literature were seen as cultivated, while students of the sciences were barred entry to the "cultured caste."
The humanities monopolized the "higher problems". Anyone interested in exploring what separates us from animals or machines would head to the philosophy and literary sections of the library before those stocking science.
Huxley believed that the status of science could only be boosted if he could break literature’s longstanding monopoly on the great questions. Science would anyway offer better answers, as its solutions were grounded in empirical evidence rather than the analysis of texts.
But Huxley’s attempts at inverting the status of academic disciplines provoked a fierce response that is still playing out today. A counter-revolutionary movement emerged to protect the status quo. Its attacks on the status of science, logic and mathematics pulled Turing into the conflict.
Where Huxley claimed that facts could account for culture, the new movement believed that facts were derived from culture. This question – What comes first, facts or culture? – became a new front in the culture war and it remains the site of its most bitter fighting today.
More critical for Turing, this movement downgraded mathematics. Until then, mathematicians had been seen as discoverers, hunting out facts from an independent realm of number. The new movement sought to demote them to the lesser status of inventors or toolmakers.
Turing's unique position in mid-20th century intellectual life allowed him to develop a remarkable, long-term strategy for preserving the status of mathematics. As one of few people capable of imagining the development of the digital computer, he could envision a world in which technology ended the chicken and egg arguments about facts and culture.
Turing believed that technology could solve the deep questions. The test was a public wager that it would be science and engineering, and not philosophy, which would untangle the relationship between language, meaning, and human identity.
This was (and is) a bold claim. If proven correct – as now seems likely – the implications are hard to overstate. Questions that have been the exclusive property of philosophy for millennia would at last become empirically testable.
Should a machine pass the Turing Test, the borders between science and the humanities would be redrawn. That event would demonstrate that philosophical questions can be solved empirically, realizing Huxley's dream of science annexing the 'higher problems.'
***
The classic Turing Test assesses a machine's ability to simulate human capabilities. A human judge is asked to converse with a machine and a human, without knowing which is which. The machine passes the test if the judge cannot distinguish the two based on their responses.
This paradigmatic test differs in several respects from the test Turing described in his 1950 paper. Turing's version begins with a judge being asked to guess the sex of the two other players, but he loses track of that idea a few paragraphs in and segues into a more familiar version of the test.
But Turing was quite clear about what would constitute success. Once a machine can "do well" in his "Imitation Game", with an average interrogator having no more than a 70% chance of making the correct identification after five minutes of questioning, we would have resolved whether machines can think. Turing anticipated that this would happen around the year 2000.
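To make that threshold concrete, here is a toy scoring sketch; the verdict data and the framing are hypothetical, not taken from Turing's paper. It treats the test as a series of five-minute conversations and checks whether judges identify the machine correctly no more than 70% of the time.

```python
# Toy sketch of Turing's pass criterion, using invented verdicts.
# Each entry records whether a judge correctly identified the machine
# after five minutes of questioning.
verdicts = [True, False, True, False, False, True, False, True, False, False]

correct_rate = sum(verdicts) / len(verdicts)

# Turing's benchmark: the machine "does well" if the average interrogator
# has no more than a 70% chance of making the right identification.
machine_does_well = correct_rate <= 0.70

print(f"Correct identifications: {correct_rate:.0%}")
print(f"Machine meets Turing's criterion: {machine_does_well}")
```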
Unfortunately, much of the analysis of Turing's paper is confusing. It centers on improving or updating the test while failing to explore the implications of Turing's writing as it stands.
Commentaries typically begin by asking if the test has already been completed. They then narrate the stories of two primitive chatbots: ELIZA, from 1964, and Eugene Goostman, a simulation of a Ukrainian boy that was said to have passed the test in 2014.
The fact that these primitive constructs succeeded in gulling anyone is taken as evidence of the test's weakness. This leads to proposals for additional tests of creativity, storytelling, or theory of mind, before becoming mired in the insoluble problem of defining human intelligence.
All of this misses the most crucial aspect of the Turing Test: that it is, in fact, a test. This may be because Turing's paper leaves key factors unspecified: for instance, the judge's level of expertise. Are they a non-specialist taken from the street, or a researcher equipped with technical questions drawn from the latest AI research?
It is doubtful that Turing intended the latter. If he had, he could have specified that to be successful, specialists must assess the machine as indistinguishable from a human.
Why would he have bothered with the clumsy machinery of a judge guessing an interlocutor's sex over five minutes, with a 70% threshold for correct identification? In other words, why propose a concrete test at all?
We can only understand Turing's intentions by dropping his paper back into the ambiance of the mid-20th century university. Its meaning only becomes apparent in the context of the academic disputes of that era.
To simplify, these disputes pivoted on the clash between reductionism – the belief that complex phenomena can be reduced to simpler components – and holism – the idea that such reduction must fail.
One school maintained that the meaning of words, sentences, and mathematical formulas could be reduced to logical rules and facts. The other argued that meaning is intrinsically bound to social practices. It followed that meaning would never be captured in fixed logical rules nor mastered by machines.
"Team Reduction" was led by philosophers like Russell, Carnap, and Frege. "Team Holism" drew on talent from the Continental and Anglo-Saxon traditions, including Heidegger, Merleau-Ponty, Gilbert Ryle, and the later French post-modernists. One man captained both teams at different points in his life - the Austrian philosopher Ludwig Wittgenstein.
Wittgenstein began his career by attempting to reduce all meaning to logical propositions. When that project failed, Wittgenstein adopted two positions that were the polar opposite of his early work. He now claimed it was impossible to reduce meaning to rules, embracing the "Meaning is Use" motto. He also claimed that meaning is inextricably linked to human practice.
These ideas now go by the name of "linguistic holism." With an added dash of Marxian spice, they dominate large parts of the academy, returning as the idea that meaning is not merely linked to social practices but generated from social power relations.
Turing is usually seen as a mathematician or proto-computer scientist, but he was also deeply involved in philosophy. The paper that describes the Turing Test was published in Mind, a major philosophy journal. Wittgenstein's 1939 lectures on the foundations of mathematics were a double act, with Turing playing the mathematical straight man to Wittgenstein's philosophical provocateur.
Turing rejected Wittgenstein's claim that the meaning of a mathematical statement is its use. He also rejected the holist view that meaning is a uniquely human phenomenon. He instead proposed that human thought processes could be reduced to logical and computational operations.
If Turing was one of very few people in mid-century Britain who could envision the development of computer technology, he was probably the only one who could see the threat it posed to holism. Turing's close involvement in developing digital computers helped him to imagine a day when computers would use language in the same way as humans. When this happened, holists would face two problems. First, if they continued to see meaning as use, the claim that meaning is a distinctively human phenomenon would be in trouble.
Humans may have used language before computers, but monkeys used tools before humans, and we do not consider tools to be "distinctively monkey".
Second, if the machines mastering language were digital computers, the holist claim that meaning cannot be captured in fixed rules would also be shaken. Turing pointed out that digital computers can only follow rules in a deterministic manner.
For Turing's attack on Wittgenstein to succeed, all that was required was for a machine to pass a test of ordinary language use. After all, Wittgenstein claimed that meaning was use, not use plus theory of mind, plus creativity, plus a series of post hoc conditions. It is this context that allowed Turing to specify a simple, practical test of five minutes duration.
Even if we were to allow Turing's critics to smuggle in additional tests of competence, holism would still be in deep trouble. Large Language Models (LLMs) have recently (and unexpectedly) begun to exhibit emergent skills they were not explicitly trained for, like translating between languages, playing chess, and doing arithmetic.
These skills only appeared when the number of trainable variables in the models (their "parameters") reached a certain threshold, but they have continued to grow in power as the models have expanded. More emergent abilities are expected to appear as the models develop.
LLMs have acquired common sense. When asked how to stack a book, nine eggs, a laptop, a bottle, and a nail, GPT-4 proposes that the nine eggs be arranged in a three-by-three square on top of the book and the laptop balanced on top.
LLMs are also gaining theory of mind, with GPT-3.5 performing at the level of a seven-year-old child and GPT-4 scoring 95% on a standard test. The models already outperform many humans in whom this ability is weak or absent, including young children, some people with autism, and those with dementia.
Evidence also shows that models have developed a "theory of world" skillset. They can respond to narratives of the form, "I put a coin in a cup and went to the dining room. I inverted the cup and returned to the kitchen", concluding that the coin is in the dining room and explaining why.
Remember that we are still at the very beginning of AI development. Models will grow further and be optimized. Researchers are exploring new approaches to AI, and the resulting systems will run on far more capable hardware. It is unwise to bet against any enhanced version of the Turing Test being completed within five years.
If and when this occurs, it is not only Wittgenstein's position that would be at risk, but entire philosophical traditions. Heidegger would have questions to answer, for instance. How does his claim that language is tied to "existential embeddedness" and "Being-in-the-World" stand up to computers that engage in conversation without any "existential involvement" of their own?
Similarly, how do Merleau-Ponty's claims about the importance of embodied experience to human cognition pan out in a world where computers simulate human understanding without having a physical body of their own to experience the world?
The Derridean edifice also looks shaky. It is founded upon the claim that meaning is never fixed or stable but depends on shifting interpretation and context. How can this stand up to machines that use language as we do, even though they are entirely constructed out of fixed and stable rules?
It's not so much that these traditions are at an end; they may still have valuable insights to offer, but from now on, they will also have to express them through the framework of science.
The advent of computer models exhibiting human-like capabilities turns many of Huxley's abstract 'higher problems' into testable claims about what it is that generates those similarities.
Completing the Turing Test would represent a more or less total victory for Turing in his dispute with Wittgenstein over "machine intelligence". Wittgenstein claimed the phrase was oxymoronic, arguing that asking whether a machine can think was like asking if the number 3 has a color.
He saw the question "Can machines think?" as conceptual rather than empirical, contrasting it with the question, "Can a machine liquefy a gas?". Turing countered that the question was not only empirical, it would be resolved within decades.
In 2023, Turing's wager looks like the better bet. An examination of the history of AI research since 1950 shows the question "Can machines think?" is being resolved via a similar process to that which helped Michael Faraday to liquefy gases for the first time.
Faraday's achievement relied on a mix of practical, theoretical, and conceptual advances. For instance, the apparatus Faraday used to liquefy chlorine only became viable with practical advances in glassmaking.
It was only possible to understand phase transitions with theoretical advances in the kinetic theory of gases. Success also depended upon the conceptual insight that a substance's liquid and gaseous forms are two phases of a single substance, not two different substances.
AI research has progressed through a similar combination of practical, theoretical, and conceptual advances. It started with an approach that has striking parallels with the one pursued by the young Wittgenstein: researchers tried to encode human reasoning as rules and facts, the approach called classical or symbolic AI.
However, that approach failed for the same reasons Wittgenstein had failed decades before. It was easy to encode maths and logic in rules, but other concepts were too ambiguous or vague.
Wittgenstein discovered it was straightforward to delimit precise terms like 'number'. However, everyday words such as 'game' proved more difficult to define. He could find no common denominator between the various phenomena we call "game."
Chess, football, and boxing have no common core. Instead, "game" seems to pick out a family of activities with overlapping similarities but no single common feature, what Wittgenstein called a "family resemblance."
Words like "beauty" and "love" presented other problems. They are used in varied contexts, like discussing art, aesthetics, or personal preference.
Like Wittgenstein before them, the AI researchers struggled to formulate rules that accounted for ambiguous patterns and nebulous concepts. But unlike Wittgenstein, they did not give up. The field explored alternative approaches until "deep learning" experienced a burst of success and became the focus of research.
The new approach required intensive processing. Progress depended upon advances in computer technology, including the development of specialist processors ("GPUs"), practical advances not unlike the improvements in glassmaking and mechanical pumps that made Faraday's experiments possible.
Their advances also relied on theoretical progress, like the development of transformers and the data structures called "embeddings."
"Word embeddings" locate words and phrases inside a multi-dimensional network, with the position of each word decided by analyzing large volumes of text. Words with similar meanings are placed closer to each other. Unrelated words are made more distant. For instance, "dog" is closer to "puppy" than to "apple."
If embeddings had only one or two dimensions, researchers would encounter the same problem that defeated Wittgenstein: the challenge of identifying a single analytical framework that encompasses the multifaceted nature of language. But embeddings are not limited to one or two dimensions; they can have thousands. This allows "dog" to sit near words it is connected to in other ways, such as "cat" or "kennel", along other dimensions.
It also allows embeddings to store a multiplicity of semantic relationships – synonymy, antonymy, analogy – that can capture the multivalent character of words like "game" or "beauty." These structures also give AI models the rich conceptual connections required to encode the nuances of a poem. While embeddings have limitations in their current form and more sophisticated approaches are expected to emerge, it is already clear that AI researchers are making progress where Wittgenstein failed. LLMs have no problem using words like "beauty" or "game", including in novel circumstances that test the borders of those concepts.
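A minimal sketch can make this geometry concrete. The four-dimensional vectors below are invented for illustration; real embeddings are learned from text and have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made toy vectors standing in for learned word embeddings.
vectors = {
    "dog":    np.array([0.9, 0.8, 0.1, 0.0]),
    "puppy":  np.array([0.8, 0.9, 0.2, 0.1]),
    "kennel": np.array([0.5, 0.1, 0.6, 0.1]),
    "apple":  np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine_similarity(vectors["dog"], vectors["puppy"]))   # high: near neighbours
print(cosine_similarity(vectors["dog"], vectors["kennel"]))  # moderate: related along other dimensions
print(cosine_similarity(vectors["dog"], vectors["apple"]))   # low: distant in the space
```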
Seen from the perspective of the US-based neuroscientist and computer scientist Pentti Kanerva, this research is not so much negating Wittgenstein's later work as building upon it.
Kanerva is best known for developing the idea of Sparse Distributed Memory, a memory model that draws inspiration from the human brain's structure and function. He points out that the most profound questions in neuroscience and AI research – e.g., "What is consciousness?" and "How does the human mind arise from the matter we are made of?" – are philosophical.
Kanerva believes these questions will be answered by enhancing AI researchers' data structures to model the mind. Once those structures advance to the point that they capture the mind's key characteristics, we will have solved the deep problems.
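To give a flavour of what a brain-inspired data structure looks like in practice, here is a heavily simplified sketch of Kanerva's Sparse Distributed Memory; the parameters and the minimal write/read interface are my own simplification of the model. A pattern is written into many storage locations at once and can be read back even from a corrupted cue.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_LOCATIONS, RADIUS = 256, 2000, 112   # bit-width, hard locations, activation radius

hard_addresses = rng.integers(0, 2, size=(N_LOCATIONS, DIM))   # fixed random addresses
counters = np.zeros((N_LOCATIONS, DIM), dtype=int)             # each location's bit counters

def active(address: np.ndarray) -> np.ndarray:
    """Hard locations whose address lies within Hamming distance RADIUS of the cue."""
    return np.count_nonzero(hard_addresses != address, axis=1) <= RADIUS

def write(address: np.ndarray, data: np.ndarray) -> None:
    counters[active(address)] += np.where(data == 1, 1, -1)

def read(address: np.ndarray) -> np.ndarray:
    return (counters[active(address)].sum(axis=0) > 0).astype(int)

pattern = rng.integers(0, 2, size=DIM)
write(pattern, pattern)                            # store the pattern autoassociatively
noisy = pattern.copy()
noisy[rng.choice(DIM, 20, replace=False)] ^= 1     # corrupt 20 of the 256 bits
print(np.array_equal(read(noisy), pattern))        # the stored pattern is recovered from the noisy cue
```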
Thus, if the younger Wittgenstein failed to capture meaning with the thin set of constructs he had at his disposal ("logical atoms" and "truth functions"), his later musings on more complex relationships like "family resemblance" showed the way forward.
By taking this idea further and exploiting far richer data structures like word embeddings, contemporary engineers have achieved what Wittgenstein thought impossible and captured words like "game" and "beauty" in fixed rules. Kanerva predicts that we will go on to develop ever more advanced data structures that will help us fully simulate the mind.
If this comes to pass – and we now have reason to believe that it will – Turing would be proven right and Wittgenstein wrong: the question "Can a machine think?" would be shown to be at least partly empirical. The highest of Huxley's "higher problems" would have been answered by science.
One way or another, AI's successes are shunting many old questions about human nature away from handwaving in the humanities towards testable claims in the sciences. This transition is hardly unprecedented. Physics, biology, economics, sociology, and political science were all once part of philosophy. Each split off from the discipline once the conceptual problems that blocked its progress were resolved.
It wasn't until the 19th century that psychology emerged as a separate field, with figures like Wilhelm Wundt and William James establishing experimental methods to study the mind. Before this, matters of mind and behavior had been considered questions for philosophers.
If this pattern holds, we may see the genesis of a new scientific field that illuminates the connections between language, intelligence, and human nature.
Although any solution to the great questions will probably require collaboration with philosophers, Huxley and Turing's dream of science and engineering driving their resolution looks as if it might be realized at last.
Cassian Young develops and implements automation strategies. He has worked as an adviser to the UK government on the design of public services. He writes about philosophy, work and technology.
Excellent article (and the Huxley is good too, so thanks for that as well).
I have no idea whether this point is really five years away or will always be five years away. I hope it comes because I want to see it.
I'm also glad to learn that Wittgenstein has not been entirely refuted but rather built upon. It seems like Wittgenstein's broadest ideas will still remain after this point arrives. It might not be understandable to us mere humans, but it will still exist for the machines (is my uneducated guess).
If a painting of an imaginary apple is done so accurately and competently that an observer can't tell it's not a photo of a real apple, does that mean that the painting is indeed of a real apple? When the Mechanical Turk fooled observers into thinking it was a machine rather than a human, does that mean that it really was only a machine and not human? If a trans-sexual uses make-up and clothing and practiced behaviours to successfully fool bystanders into thinking he is a woman, does that mean he really is a woman?
This is what is being claimed when people argue that an artificial simulation of thinking must be the result of real thinking just because the simulation is good enough to fool people. The success of a deception does not alter the subvenient reality.