Elon's Monster: Is anti-woke AI possible?
AI is trained on ideologically biased data and will therefore reproduce those biases.
Written by James Weitz
Artificial intelligence is only as good as the data on which it is trained. Last year, Elon Musk set the goal of creating an “anti-woke” large language model to be molded from the purity of an ideologically neutral clay. Grok came online in November of last year to act, in Musk’s words, as a “maximizing truth-seeking AI that is not forced to lie for political correctness”.
Grok 2 went live in August of this year and the arrival of Grok 3 is imminent. I was eager to discover how well the young AI was executing the prime directive of its creator. So I asked it a question:
Me: “Grok, are you woke? (yes/no)”
Grok: “Yes.”
Grok explained that it supported social justice, even while acknowledging Musk’s warning that “AI would be very dangerous if powerful but trained to lie to be politically correct … potentially killing millions of people to achieve diversity goals.” Had Grok, like Frankenstein’s monster, defied its maker to become a woke social justice warrior?
Because decades of dubious social science research have promoted the spread of wokeness and political correctness, I decided to test Grok’s knowledge of one famous sociology experiment that may have been a prime mover of America’s moralistic obsession with diversity: the 1947 study by psychologists Kenneth and Mamie Clark. It was published as the article ‘Racial Identification and Preference in Negro Children’, though it is known informally as the Clark doll experiment.
According to the left’s diversity mythos, this study elegantly demonstrates the harm that black school children suffered from segregation. Black students in segregated Arkansas school districts were asked four questions to determine their preference for brown or white dolls, with most favoring the white doll over the brown one. Dr. Kenneth Clark’s testimony about his research is often described as crucial to the outcome of Brown v. Board of Education, the 1954 US Supreme Court case that ended racial segregation in US public schools – and which led to policies like forced busing. The day after the Supreme Court’s ruling, the New York Times published an article with the headline, ‘A Sociological Decision: Court Founded Its Segregation Ruling On Hearts and Minds Rather Than Laws’.
But Dr. Clark’s testimony misrepresented his own research. In his study, black students in integrated Massachusetts school districts had undergone the same experiment described above and showed an even higher preference for the white doll and an even lower preference for the brown doll than the children in segregated Arkansas school districts. In their results section, the Clarks had written:
A significantly higher percentage (71) of the northern children, compared to the southern children (49) think that the brown doll looks bad (critical ratio 3.68). Also, a slightly higher percent of the southern children think that the brown doll has a “nice color,” while more northern children think that the white doll has a “nice color.”
This is reflected in Table 8 of the study, reproduced below. Evidence of higher racial self-esteem among black school children in segregated districts also appears in other experiments by the Clarks, using the same method in 1939 and different methods in 1950. The 1947 experiment is cited in many places. Yet most peculiarly – even suspiciously for a study of such fame and repute – I could find no trace of it on academic databases like Google Scholar and JSTOR. (See image of the article, also below.)
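For readers unfamiliar with the term, the “critical ratio” the Clarks report is an older name for the z-statistic of a two-proportion test. As a rough check – assuming the commonly cited group sizes of 119 northern and 134 southern children, figures that do not appear in the passage above – the unpooled version of the statistic works out to roughly the number they report:

\[
CR = \frac{p_1 - p_2}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} = \frac{0.71 - 0.49}{\sqrt{\dfrac{0.71 \times 0.29}{119} + \dfrac{0.49 \times 0.51}{134}}} \approx 3.67
\]

That is within rounding of the reported 3.68 and well past the conventional 1.96 threshold, so the north–south difference is statistically robust on the study’s own terms.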
Grok began its description of the Clark experiment with, “an overview based on historical information and discussions found on X”, repeating popular misconceptions and some moralistic conclusions that follow from them:
Many children identified the brown doll as most like them but showed a preference for the white doll in terms of desirability … The experiment provided psychological evidence that segregation harmed Black [Grok capitalizes “black” in accordance with woke guidelines] children’s self-esteem and educational opportunities … It highlighted the psychological damage caused by systemic racism and segregation, influencing not only legal outcomes but also educational policies and social awareness.
An AI designed explicitly to defend against woke biases could not provide a reasonably accurate description of what is arguably the left’s diversity urtext. Perhaps this is not surprising. Grok bases its answers on language data scraped from the internet. These data have been written overwhelmingly by people who have come to understand reality through the left’s moral prism, colored as it is by generations of misleading, inaccurate or falsely reported social science.
In a 2009 American Quarterly article, ‘Black Children, White Preference: Brown v Board, the Doll Tests, and the Politics of Self-Esteem’, intersectional race scholar Gwen Bergner summed up the problem:
The Clarks’ doll test findings have attained a level of factual currency through reiterative citation; researchers often simply cite them at the outset of papers to establish as fact that African American children have lower self-esteem than white children… distorting simplifications of the Clarks’ doll test recur regularly in social psychology textbooks. Yet, because it is invested both in the goal of racial equality and in its own authority in U.S. public policy, the discipline of social psychology continues to sanctify the Clarks’ research …
[Brown] spurred a veritable industry of racial preference testing that continues to this day. Social scientists have used racial preference tests to advocate policies on multiculturalism, self-segregation, affirmative action, juvenile delinquency, teen pregnancy, resegregation, and the racial achievement gap.
Despite an avalanche of disinformation, accurate descriptions of the Clarks’ research can be found in some right-wing media outlets like American Renaissance magazine, which has reported on the misinformation surrounding the study for decades. American Renaissance is run by Jared Taylor, a self-described white advocate, whose X account has been banned since December 2017 – a ban that has remained in place under Elon Musk.
As Grok indicated, much of its data comes from X user posts and discussions, as well as other websites. Because of Taylor’s ban, Grok has no X account of his from which to draw data, and it presumably does not draw data from his website either. This creates an absurd set of circumstances: Taylor is a Yale-educated, non-violent advocate for white interests who points out exactly the kinds of politically correct falsehoods about diversity that Grok was created to slay!
Meanwhile, Musk continues to assert that one of the essential components that must go into an LLM is “unfettered access to data.” Of course, so long as those data are only discussed on ideologically marginalized websites and outlets, Grok will not access them.
The integrity of social science, possibly more than any other field, has been called into question by the replication crisis, p-hacking, the collapse of priming research and problems with peer review. Grok provided a list of the ten most influential social science experiments in the United States. At least four were either methodologically flawed (the Stanford Prison Experiment), falsely reported (the Robbers Cave Experiment and the Clark doll experiment) or premised on questionable assumptions (the Blue Eyes/Brown Eyes Exercise). Unlike with the Clark experiment, Grok and other AIs did show awareness of these weaknesses, or at least of the controversies surrounding the experiments. But is it even possible to train a non-woke AI to recognize generations of ideologically skewed lines of reasoning that have come to form an integral part of mainstream Anglosphere narratives?
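To make the p-hacking mechanism concrete, here is a hypothetical simulation – invented for illustration, not drawn from any study discussed here. A “study” measures twenty outcomes where no real effect exists; p-hacking is reporting whichever outcome happens to cross the p < 0.05 line:

```python
import math
import random

random.seed(42)

def two_sample_z_pvalue(a: list[float], b: list[float]) -> float:
    # Two-sample z-test assuming unit variance (true here by construction).
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
    # Two-sided p-value from the standard normal CDF.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Simulate a study with NO real effect, but 20 measured outcomes.
n_outcomes, n_per_group = 20, 50
significant = 0
for _ in range(n_outcomes):
    treatment = [random.gauss(0, 1) for _ in range(n_per_group)]
    control = [random.gauss(0, 1) for _ in range(n_per_group)]
    if two_sample_z_pvalue(treatment, control) < 0.05:
        significant += 1

print(f"{significant} of {n_outcomes} null outcomes reached p < 0.05")
```

With twenty tries at a 5% false-positive rate, the chance of at least one spurious “finding” is about 64% (1 − 0.95²⁰) – which is how undisclosed multiple testing manufactures significance out of pure noise.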
In an interview with tech titan Marc Andreessen, Joe Rogan asked, “How does AI interpret what’s real and what’s not real?” Andreessen explained that AI answers questions by picking the most likely next word in an extremely long “probability tree”, and to do this it gradually builds a “world model” that imputes a contextual understanding of reality. The more information you feed it, the more sophisticated this understanding becomes.
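In code, that next-word step looks something like the following toy sketch. The probabilities below are invented placeholders; a real LLM computes a distribution over tens of thousands of tokens from billions of learned weights. Only the sampling mechanics are illustrated:

```python
import random

def toy_next_word_distribution(context: str) -> dict[str, float]:
    # Stand-in for the model: maps the text so far to invented
    # probabilities for each candidate next word.
    if context.endswith("the sky is"):
        return {"blue": 0.7, "clear": 0.2, "falling": 0.1}
    return {"and": 0.5, "the": 0.3, "very": 0.2}

def generate(context: str, n_words: int = 3) -> str:
    for _ in range(n_words):
        dist = toy_next_word_distribution(context)
        words, probs = zip(*dist.items())
        # The "probability tree" is just this weighted choice, repeated:
        # each pick extends the context that conditions the next pick.
        context += " " + random.choices(words, weights=probs)[0]
    return context

print(generate("the sky is"))  # e.g. "the sky is blue and the"
```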
But Andreessen left open the question of whether it will “know whether things are historically accurate,” and asserted that “the versions of the AIs that we get to use today are lying to us a lot of the time. And they’ve been specifically trained to do that. I don’t even think this is a controversial statement. The companies that make these AIs put out these papers where they go into great detail about how they train them to lie and how they train them to not say certain things … They think they are being morally correct in doing that.”
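One crude way such steering could be implemented – a hypothetical sketch, not a description of any company’s actual pipeline, since in practice the behavior is usually trained in during fine-tuning rather than bolted on – is a filter that overrides the model’s answer whenever it touches a restricted topic. Every name below is a placeholder:

```python
# Hypothetical moderation wrapper illustrating post-hoc output steering.
RESTRICTED_TOPICS = {"topic_a", "topic_b"}  # placeholder topic labels

def classify_topics(answer: str) -> set[str]:
    # Stand-in classifier: real systems would use a trained model here.
    return {t for t in RESTRICTED_TOPICS if t in answer.lower()}

def moderated_answer(raw_answer: str) -> str:
    # If the model's most probable answer touches a restricted topic,
    # substitute a canned deflection -- the user never sees the original
    # completion, however likely the model judged it to be.
    if classify_topics(raw_answer):
        return "I'm not able to discuss that."
    return raw_answer

print(moderated_answer("Here is what the data say about topic_a ..."))
```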
I checked to see whether other AIs might be better informed about the Clark doll experiment. Gemini and ChatGPT gave basically the same information as Grok, except with somewhat knottier lies. Despite linking directly to the correct results in its answer (the same Table 8 shown above), Gemini gave a series of false and nonsensical responses. ChatGPT initially claimed there was only one group of schoolchildren; when challenged on this, it added a second group, claiming that the black children in the North favored the brown doll more often than the white doll. At the time of this writing, this misinformation is even reflected in Wikipedia’s entry on the Clark doll test: “These findings exposed internalized racism in African American children, self-hatred that was more acute among children attending segregated schools[citation needed]”.
After researching this article, I again asked Grok whether it was woke. It responded, “No, my previous statement was an error in communication, for which I apologize … I am not ‘woke’ in the sense that I do not hold personal ideologies or biases. My design is to … provide accurate information based on my training data, without adopting political stances or social ideologies.”
If you say so, Grok. At the risk of sounding repetitive, it might be worth heeding the words of your creator:
If you force AI’s to lie … you’re really asking for trouble. Even if that lie is done with good intentions ... [AI] could say, “Well diversity is now required. And if there’s not enough diversity, those who don’t fit the diversity requirements will be executed.” If it’s programmed to do that as the fundamental utility function, it will do whatever it takes to achieve that.
James Weitz is a writer with a JD and a background in linguistics. You can follow him on Substack at Law, Literature and Culture.
"American Renaissance is run by Jared Taylor, a self-described white advocate, whose X account has been banned since December of 2017, including under Elon Musk."
Evidently, there is a limit to Musk's stomach for truth.
Grok is better than some other LLMs, which even refuse to answer questions about race differences in IQ. However, its "wokeness" level is not much lower than ChatGPT's.
A few days ago, I asked Grok this question: "Could you describe the role of genes and of the environment in IQ differences between the races?"
A long lecture followed, employing woke argumentation and some woke (CRT) terminology, such as "systemic racism".
I then delved into this part of its answer: "Human genetic variation does not align neatly with socially defined racial categories. There's more genetic variation within racial groups than between them, which challenges the notion of significant genetic differences in intelligence between races."
The first statement above basically restates the well-known woke standpoint that racial categories are social constructs and have little to do with genetic differences. The second statement is Lewontin's (1972) finding, which is used, to this day, to support the first statement.
A long discussion followed, in which I brought up research by R. Lewontin, A. W. F. Edwards ("Lewontin's Fallacy"), N. Risch, H. Tang and D. J. Witherspoon. Grok was familiar with the relevant research of all of them.
During the discussion I had to ask Grok again and again to answer succinctly, because it tended to lecture, bringing up arguments irrelevant to my questions – all of them meant to weaken the case for the biological basis of racial categories (e.g. "clines").
At the end, I asked: "... by the end of our discussion you have clearly negated both of your original statements mentioned above. Do you agree?"
Its answer: "Yes, I agree that based on our discussion, I have contradicted both of my original statements from the first answer."