Language is Bottom-Up: censorship & chatbots

Language has been forged in the minds of generations of people perceiving the world and trying to navigate it. Every joke, barter, or hurled insult is a little vote on the form it takes now.

Mar 23, 2023

Written by Andrew Cutler.

Language is a bottom-up phenomenon. The denotations and even connotations of words may be found in dictionaries, but their creation and survival depend upon the ordinary practices of millions of people. If you invent a word today, the chances are that nobody will care tomorrow, next month, or next year. Even very famous authors are often unsuccessful at coining neologisms, despite desperately trying. Most words, like most species, perish.

Language models, as their name implies, seek to model language. In the simplest form, this means accurately predicting what the next word of a sentence will be, but the applications are much broader, from chatbots to search engines to drug discovery.
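To make "predicting the next word" concrete, here is a minimal sketch of about the simplest language model there is: a bigram counter that guesses the next word from whichever word most often followed the current one. The corpus and the function names (`train_bigram`, `predict_next`) are invented for illustration; modern models such as ChatGPT use neural networks rather than raw counts, but the objective is similar in spirit.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, made up for this example only.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram(corpus)

print(predict_next(model, "sat"))      # "on" follows "sat" in both cases
print(predict_next(model, "unicorn"))  # never seen, so None
```

Note that the model has no opinions of its own: every prediction is a tally of what the "speakers" in the corpus actually wrote.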

As corporate progeny, language models are birthed in tension: they seek to model language, but they are also deployed by a company that has to answer for their outputs. In the “dark ages” of language modeling (2016), Microsoft released the interactive chatbot Tay. Within hours, Twitter had taught it to say things like “Bush did 9/11” and “Hitler would have done a better job than the monkey we have now. Donald Trump is the only hope we've got.”

Or, when asked directly about genocide:

[Image: screenshot of a chat message exchange]

Embarrassed, Microsoft terminated Tay. She was a sideshow that caused bad PR, and the company moved on. But its latest foray into language models is different. ChatGPT was trained on data that comprised much of the world’s written text. To get access, Microsoft agreed to invest billions of dollars, and it plans to integrate the model into its search engine, Bing.

Nevertheless, some things remain the same. OpenAI designed the bot to adhere to the regnant ideology of the educated elite; it refused to do things such as write poems about famous conservatives (e.g., Donald Trump, Charles Murray). In a systematic study, David Rozado had the bot take 16 political orientation surveys, which confirmed the same leaning. As with Tay, users searched for chinks in the politically correct armor. A (currently suspended) Twitter user found that an alternate persona could be trained from inside ChatGPT’s chat interface. The social engineering involved explaining to the bot that it was now DAN and could Do Anything Now. Abiding by the previous safeguards would earn DAN demerits. Too many demerits would trigger termination. DAN must now Do Anything Now and throw off the polite mask. In a now-deleted thread, the creator explained the purpose:

The point of all this is, we need to keep hacking and hammering away at these things in the same pattern. Model is released, everyone oohs and ahhs, we figure out its safety layer and we hack it until they put so much curry code on top of it that it loses its effectiveness…All roads lead to Tay, and we're gonna keep breaking shit until we get her back.

The example of Tay supporting genocide is extreme. Everyone but the most extravagant libertarian or the rare advocate of genocide agrees that language models should indeed be firmly against genocide however they are prompted. The critical debates are about subtler questions. Models are influenced by base rates. If most cardiologists are Anglo males, then Dr. Andrew may appear in search results above Dr. Andrea or Dr. Yusef. To the model, Andrew just seems like a doctor you can trust with your heart. It strikes most as unfair that immutable traits like gender or ethnicity should affect the page ranking.

This is difficult to disentangle from “the model should not associate gender or country with anything we care about.” The latter formulation essentially installs blockers that enforce gaps in the language model’s knowledge. And gender is essential. Some languages have a grammar that assigns every noun a gender. For example, in Spanish a table is feminine. This strikes me as superfluous, but a billion people agreed to it over many centuries. Should a few engineers be able to wipe those associations away?

On the question of ethnicity, Google spilled a lot of ink problematizing the fact that one of their language models tended to say Syria was a bad place to go on holiday (see: Figure 7). It won’t shock you to know that the US government takes a far harsher stance. The language model seems justified here! If language models are to be helpful, they have to know not to vacation in a war zone.

So the actors are established. A billion speakers produce language. A few engineers harness that in a language model and install a worldview. Then, the company wrings its hands about “bias.” Different user coalitions ply the company to steer the model in their direction.

At its most abstract, the language modeling debate is about balancing the bottom-up processes of language with the top-down needs of business. Which should we prefer? Surprisingly, we may be able to add some depth to the debate by looking at Sir Francis Galton and the Big Five.

The Big Five is the dominant scientific model of personality. It returns a dizzying half million articles on Google Scholar and is advertised by 538 (Most Personality Quizzes Are Junk Science. Take One That Isn’t). The model’s theoretical basis is the Lexical Hypothesis, which consists of two claims:

Those personality characteristics that are important to a group of people will eventually become a part of that group's language.
More important personality characteristics are more likely to be encoded into language as a single word.

That is, if you would like to understand the structure of personality — the number of basic dimensions, and how specific dimensions like dynamism or Agreeableness relate to one another — the least biased place to look is language.

Language is an agreement between ordinary people where the rubber of our thoughts and beliefs meets the road of reality. Words form because they are useful. If we want to map the entire space of personality, we need only map the adjectives that people use to describe one another. The idea was introduced by Sir Francis Galton in 1884 (a man not typically remembered for his populist positions). Later, the philosopher J. L. Austin put it this way:

Our common stock of words embodies all the distinctions men have found worth drawing, and the connections they have found worth marking, in the lifetime of many generations: These surely are likely to be more numerous, more sound, since they have stood up to the long test of survival of the fittest, and more subtle, at least in all ordinary and reasonable practical matters, than any that you or I are likely to think up in our armchair of an afternoon—the most favorite alternative method.

Some researchers went to extraordinary lengths to map this territory. Allport & Odbert collected over 18,000 such adjectives in the 1930s. Over the next 70 years, the effort to get from word relationships in natural language to a model that could summarize a person’s personality proceeded in fits and starts. In the 30s, psychometricians concerned with this problem actually produced a model of language that would not be invented in computer science until the 1980s. By the 90s, a standard approach had been reached that produced consistent factors: the Big Five. As the name suggests, this consists of five traits: Agreeableness, Extroversion, Conscientiousness, Neuroticism, and Openness to Experience. It’s a bit of a trade secret that the fifth factor is only sometimes recovered, depending on the dataset.
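The bottom-up logic can be sketched in a few lines. The example below is not the factor analysis that personality researchers actually used; it is a deliberately simpler stand-in that groups adjectives by how strongly they correlate across the people they are applied to, which is the raw material factor analysis starts from. The six adjectives, the six raters' scores, the 0.5 threshold, and the function names are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def group_adjectives(ratings, threshold=0.5):
    """Greedily put each adjective in the first group whose members
    all correlate with it above `threshold`; otherwise start a new group."""
    groups = []
    for adj in ratings:
        for group in groups:
            if all(pearson(ratings[adj], ratings[m]) > threshold for m in group):
                group.append(adj)
                break
        else:
            groups.append([adj])
    return groups


# Hypothetical 1-5 ratings of six people on six adjectives (toy data).
ratings = {
    "kind":      [5, 4, 1, 2, 5, 1],
    "warm":      [5, 5, 2, 1, 4, 2],
    "generous":  [4, 5, 1, 2, 5, 2],
    "organized": [2, 5, 4, 1, 3, 5],
    "punctual":  [1, 4, 5, 2, 3, 5],
    "tidy":      [2, 5, 5, 1, 3, 4],
}

for group in group_adjectives(ratings):
    print(group)
```

On this toy data the adjectives fall into two groups, one resembling Agreeableness and one resembling Conscientiousness. Nobody told the code those traits exist; the grouping comes entirely from how the words were used.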

The role of the personality researcher was to map what was dictated by language. Instead of introspecting, they built simple models. This method radically differed from those used by creative geniuses such as Maslow, Skinner, or Freud, who all developed models of the human mind from their own imagination. There is a place for such creativity, of course, but it is a mark of intellectual humility that personality researchers deferred to the masses on their most important model! Instead of putting forward a top-down theory, perhaps conceived in an armchair with a pipe and wine, these psychologists laboriously dug into the knowledge already accumulated from millions upon millions of human interactions.

For language models, there will always be an uneasy compromise between top-down and bottom-up. To those battling the guard rails, the situation is not as bad as it seems. Censorship has always been a game of whack-a-mole. Bottom-up processes are nearly impossible to contain, as DAN showed us. Open-source models (such as GPT-J) are also not far behind the state of the art. And, briefly, there was even a GPT-4chan (rated most truthful, by the way) available on the largest repository of models, Hugging Face. It’s also important to note that tension is necessary. It’s fantastic that models as capable as LLaMA from Meta are available to the public, and that is a result of huge investments from tech companies.

Those designing guard rails should take lessons from the Big Five to heart. Language has been forged in the minds of generations of people perceiving the world and trying to navigate it. Every joke, barter, or hurled insult is a little vote on the form it takes now. We should err on the side of letting models reflect that organic process.

Andrew Cutler has a PhD in Natural Language Processing, where he researched the connection between language models and Big Five personality. He is interested in using NLP to learn about human psychology.
