As an NLP neophyte I thought the mechanisms of automated natural language processing, such as ChatGPT, esoteric and arbitrary. At best, the generative pre-trained transformer (GPT) model is structured as a series of mathematical functions that evidently succeed in simulating human language, but bear little relation to the processes by which language is actually spoken, written, read and understood by human beings.
To challenge this thought, I want to recast automated NLP processes in terms of non-automated natural language processes. How does language work outside of the mechanics of automated NLP?
How language works
1. Pattern processing

We favour developing and refining our ability to use language in a learning context that is flexible, fluid, dynamic, distributed, and positioned within a community of other language users. I say this by way of contrast to a rule-based, lexical approach to language acquisition, a route through grammars and vocabularies. Language acquisition and use is better described as a process of absorbing, recalling, and synthesising patterns in speech and writing. The deployment of neural network architectures attempts to address this pattern-processing aspect of language. See post: Just one neuron.
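The basic unit of those architectures can be sketched in a few lines. Here is a single artificial neuron, the building block the post above alludes to: a weighted sum of inputs squashed through an activation function. The weights and inputs below are made up purely for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of inputs passed
    through a sigmoid activation, squashing the result to (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Hypothetical weights: the neuron responds strongly (output near 1)
# when the inputs match the pattern its weights encode.
print(round(neuron([1.0, 0.0, 1.0], [2.0, -1.0, 2.0], -3.0), 3))
```

In a trained network, millions of such units, with weights adjusted during training, collectively pick up the patterns in language described above.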
2. Exposure to examples

Language patterns are acquired through exposure to many examples of spoken and written text in the context of life experiences. NLPs are “trained” on very large corpora of texts to incorporate and deliver language patterns. See post: Nothing beyond the text.
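The crudest picture of “absorbing patterns from exposure” is simply counting which words tend to follow which. This toy sketch tallies adjacent word pairs in a miniature made-up corpus; real models learn vastly richer statistics, but the principle of learning from the text itself is the same.

```python
from collections import Counter

# A tiny invented corpus, standing in for the very large
# collections of text that NLP models are trained on.
corpus = "the cat sat on the mat the cat ate".split()

# Count adjacent word pairs (bigrams): repeated pairs reveal
# patterns that could inform a prediction of the next word.
bigrams = Counter(zip(corpus, corpus[1:]))
print(bigrams[("the", "cat")])  # the pair occurs twice
```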
3. Words and their subunits

A competent language learner develops the ability to incorporate words they have never seen or heard, due in part to the way words are made of smaller units: standard prefixes (un-, re-, pre-, dis-, con-, sub-, etc), word endings (-s, -es, -ed, -ious, -ology, etc), and syllables (arch, tec, bio, etc). Words also form assemblies, recognisable as meaningful blocks (“down town,” “matter of fact,” “living on the edge”). Sometimes they are phrases, idioms, clichés that have recognised meanings without the speaker or reader having to assemble and analyse them from scratch. In fact, it’s fair to say that words are not the main units of language, but their subunits and combinations. NLP models typically deal with such units as tokens, generated via statistical methods from a corpus of texts. See post: By the same token.
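A toy segmenter makes the idea of tokens concrete. The sketch below splits a word greedily into the longest known subunits, loosely in the spirit of subword tokenizers such as BPE or WordPiece; the vocabulary of prefixes, stems and endings is invented for the example, whereas real tokenizers derive theirs statistically from a corpus.

```python
def tokenize(word, vocab):
    """Greedy longest-match segmentation into subword tokens
    (a toy illustration, not a production tokenizer)."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible match first, shrinking rightwards.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

# A tiny hypothetical vocabulary of prefixes, stems and endings.
vocab = {"un", "re", "pre", "dis", "help", "ful", "ness", "ed", "play"}
print(tokenize("unhelpfulness", vocab))  # ['un', 'help', 'ful', 'ness']
print(tokenize("replayed", vocab))       # ['re', 'play', 'ed']
```

Note how a word the “model” has never seen whole still decomposes into familiar units, which is precisely the capability attributed to the competent language learner above.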
4. Semantic relationships
Competent language users exercise the ability to fill in the blanks from context, or at least to hypothesise what’s missing, e.g. “That photo of a sunset is very <blank>.” The missing <blank> could be “beautiful,” “dramatic,” etc. If the speaker uses a non-existent word, “That photo of a sunset is highly evocacious,” then the listener has recourse to the patterns in the malapropism to infer that the speaker meant “evocative.” Context and word units and subunits help establish meaning. Importantly, we accomplish this by incorporating the relationships of words and word units to one another. Language competence involves the ability to substitute one word or token for another, to deal with synonyms, and therefore with analogy and metaphor. NLPs are remarkably good at this, in part due to semantic encoding, which positions tokens in multi-dimensional semantic spaces. See post: Architecture in multidimensional feature space.
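In such a semantic space, “nearness” between tokens is typically measured as the cosine of the angle between their vectors. The three-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: values near 1.0 mean
    the vectors point the same way in semantic space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 3-dimensional "embeddings" for three words.
vectors = {
    "beautiful": [0.9, 0.8, 0.1],
    "dramatic":  [0.8, 0.9, 0.2],
    "potato":    [0.1, 0.0, 0.9],
}

# Near-synonyms sit closer together than unrelated words.
print(cosine(vectors["beautiful"], vectors["dramatic"]) >
      cosine(vectors["beautiful"], vectors["potato"]))  # True
```

Substitutability, synonymy, and even analogy then fall out of geometry: candidates to fill the <blank> are tokens whose vectors lie close to the context.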
5. The context window

In formulating what they are about to say, language users incorporate what they just said. We can observe this capability in sentences. Speech would likely be ungrammatical and incomprehensible without the ability to position each word or subword in an ordered sequence of previous words. That stream constitutes a kind of context window that extends over time through the speaker’s own utterances and those of a companion interlocutor, and beyond, though with diminishing influence on what a person is about to say. Evidence for this capability comes from the use of pronouns. “He wanted him to give it to him” has a different meaning when it follows “John liked Paul’s coat,” than “John saw that Paul’s brother needed the coat.” Confusion will follow if the referents of the pronouns are too far back in the context window. The size of the context window is a major parameter in NLP models: the larger the context window the greater the capability of the model to contribute to the flow of a conversation, though at substantial computational cost. See post: Training your AI.
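The mechanics of a fixed-size context window can be sketched in one line: keep only the most recent tokens, and let everything earlier fall away. The example reuses the pronoun sentence from above to show a referent dropping out of context.

```python
def clip_to_window(tokens, window_size):
    """Keep only the most recent tokens. Anything earlier has fallen
    out of the context window and can no longer influence the model."""
    return tokens[-window_size:]

history = "John liked Paul's coat . He wanted him to give it to him".split()

# With a generous window the pronouns' referents are still in view...
print("John" in clip_to_window(history, 16))  # True
# ...but with a small window 'John' has dropped out of context,
# and the pronouns lose their anchor.
print("John" in clip_to_window(history, 8))   # False
```

The trade-off noted above is visible even here: a larger window retains more referents, but a real model must process attention over every token it keeps, at growing computational cost.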
6. Word order

As well as the extent of the stream of utterance within the context window, the order in which people say things really matters. “I only eat vegetables” carries a different meaning to “Only I eat vegetables.” So the position of a word or phrase within a context window is important in establishing meaning. Positional encoding in NLPs attempts to operationalise this capability. See post: Words in order.
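One common scheme, sinusoidal positional encoding (introduced with the original Transformer architecture), assigns each position in the sequence a distinctive vector of sines and cosines, which is added to the token’s embedding so the model can tell “I only eat” apart from “Only I eat.” A minimal sketch:

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal positional encoding: each position gets a unique
    vector of sine/cosine values at geometrically spaced frequencies."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]

# Position 0 yields a fixed reference vector...
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
# ...and every other position yields something different, so the same
# word at two positions is no longer indistinguishable to the model.
print(positional_encoding(1, 4) != positional_encoding(2, 4))  # True
```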
7. Attention

Not all words, parts of words and phrases carry equal status as we speak or write. We direct attention to what’s important in a sentence. Speakers can indicate these priorities by giving vocal emphasis to certain words, though not all languages support this. In written texts it’s common to italicise words that require particular emphasis. But generally we rely on context to inform the reader where to focus attention. “I didn’t throw the potato across the room. I only eat vegetables” places different emphasis than “Please remove the meat from my plate. I only eat vegetables.” This focus on particular parts of an utterance is dealt with in NLP models by attention mechanisms, including multi-head attention, which tracks several such focuses at once. See post: Attending to more than one thing at a time.
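At the heart of those mechanisms is scaled dot-product attention: a score for how strongly the current token should “attend to” each other token, normalised into weights that sum to one. This single-head, pure-Python sketch uses invented two-dimensional vectors; real models use learned, high-dimensional queries and keys, and many heads in parallel.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention: score the query against each key,
    scale by sqrt(dimension), and softmax into attention weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Hypothetical vectors: the query aligns most with the second key,
# so most of the attention mass lands on token 1.
weights = attention_weights([1.0, 0.0],
                            [[0.1, 0.9], [1.0, 0.1], [0.0, 1.0]])
print(max(range(3), key=lambda i: weights[i]))  # 1
```

Multi-head attention simply runs several such weightings in parallel, each free to focus on a different aspect of the utterance.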
8. Fine tuning

Finally, language users need the capacity to reflect on and adjust their language usage, including by learning from others. This includes not only corrections to grammar and vocabulary, but what we may regard as content: facts, inferences, reasoning, interpretations, appropriateness, etc. NLP developers accommodate this by recruiting human participants to evaluate model outputs, and those evaluations feed back into adjustments of the neural network’s parameters. Predictions and outputs are adjusted to integrate specialised language usages, and to correct or deprecate unwanted language outcomes. See post: Fine tune your AI.
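A drastically simplified sketch of that feedback loop: a human rater prefers one candidate output over another, and the “model” nudges its score for the preferred output upward. This is an invented toy update rule, standing in only for the general idea of feedback-based fine-tuning, not for any real training algorithm.

```python
def update_preference(scores, preferred, lr=0.5):
    """Toy feedback update: nudge the score of the human-preferred
    output towards 1.0 (illustrative only, not a real RLHF step)."""
    scores = dict(scores)  # copy, leaving the original untouched
    scores[preferred] += lr * (1.0 - scores[preferred])
    return scores

# Before feedback, the hypothetical model slightly favours the rude reply.
scores = {"polite reply": 0.4, "rude reply": 0.6}
scores = update_preference(scores, "polite reply")
# After one round of human feedback, the preference has flipped.
print(scores["polite reply"] > scores["rude reply"])  # True
```

Repeated over many raters and many outputs, adjustments of this general kind steer the model towards wanted usages and away from deprecated ones, as described above.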
Featured image generated by Dall-E3 via Bing prompted by “photorealistic AI urban apocalypse at high resolution include a billboard that says ‘Non-NLP MODEL — Richard Coyne.'”