I’m going to describe how autoregressive language models function within the framework of an interactive mystery game. This will give you a perspective on what a large language model can do to enhance the experience of playing a game.
Imagine Paris in the 1920s.
You play the role of a photographer caught up in the mystery of a surrealist artist who has suddenly disappeared. For more on the backstory, check out my video: AI-Crafted Radio Mystery: A Surrealist Artist Vanishes in Paris.
As the player photographer you uncover clues by exploring the streets, nightclubs and art galleries of Paris. You interact with non-player characters (NPCs) to piece together the story of the missing artist. A working title for this game is Never Fear Paris. (An alternate title is Never Fear a Nightclub Dancer.)
A large language model helps bring these characters to life by creating dialogue that feels authentic. Indeed, the language model is able to shape unique conversations each time you play. Replayability is an important factor in game design. Who likes a game that can only be played once, especially one that lasts only two hours?
In this lecture, we’ll explore at a high level how the language model enables realistic NPC dialogue and provides context-aware responses that shape the player's choices and narrative path.
A language model is essentially a complex probability distribution over sequences of words, determining the likelihood that certain words will follow others to form coherent sentences and responses. This probability-based structure allows it to generate text that aligns with the context, character, and setting.
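To make "probability distribution over sequences of words" concrete, here is a toy sketch in Python. It uses simple word-pair counts (a bigram model) rather than a neural network, so it is far cruder than a real LLM, but the underlying idea is the same: given what came before, assign a probability to each candidate next word. The miniature corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for the vast text an LLM trains on.
corpus = (
    "i saw him near the seine . "
    "i saw her near the gallery . "
    "i saw him with the dealer ."
).split()

# Count how often each word follows each preceding word (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Probability distribution over the words that may follow `word`."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# "saw" is followed by "him" twice and "her" once in the corpus,
# so the model assigns "him" probability 2/3 and "her" probability 1/3.
print(next_word_probs("saw"))
```

A real LLM replaces the lookup table with a neural network that conditions on the entire preceding context, not just one word, but it still ends in the same place: a probability for every possible next word.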
Imagine the scene: You’re in a smoky Montmartre bar, seeking clues to the whereabouts of a missing artist. As you speak to the bartender, the language model calculates which words he’s most likely to say based on historical knowledge, the mystery’s tone, and how he might typically respond to questions about an artist.
First, a disclaimer: this interactive mystery is in English. In real life, our bartender would be speaking French. For the examples of this lecture, we are going to have all the dialogue in English.
Example: If our player asks, “Have you seen this artist?” the model calculates the likelihood for the bartender to respond, “I saw him with that peculiar art dealer near the Seine…” over less relevant responses.
But, already, we have a problem. A barebones large language model will respond to that question with the most likely word, which is “no”. A language model is exactly that: a model of the language. It has been trained on vast amounts of text to learn the syntax of language and the probability distributions of words.
Training a large language model has two stages: pre-training and post-training. You will also see these described as training (in other words, pre-training) and fine-tuning (in other words, post-training). I prefer the training and fine-tuning terminology and will use it going forward.
To understand how a large language model can play a performative role in our interactive mystery, let’s examine training and fine-tuning.
Natural language is sequential.
We all learn grammar in elementary school (and middle school and high school). Through studying grammar we learn syntax: the rules for which words go in which order to form a coherent sentence. Then, in college, you learn to write essays, which means learning to craft sentences into paragraphs. A paragraph is a sequence of sentences that present a point. An essay is a sequence of paragraphs that make a persuasive argument. Writing a good essay is challenging. It forces you to think carefully about what you are trying to say.
But, even without an education, as very young children, we learn language by emulating what we hear around us.
We learn intuitively that language is a sequence of utterances that have meaning. To know a language, like English or French, means that we have attached meaning to its sounds, which are represented as letters and words. Those internalized (or learned) meanings of words are known as semantic properties.
Large language models, through neural network architectures, have been trained to learn those same semantic properties.
The types of large language models that are popular today are known as autoregressive language models.
Why autoregressive?
We know that the language model takes a sequence of words as input and generates the next predicted word. It then automatically feeds that newly generated word back into the model to generate the next word in the sequence, and so forth. This feedback loop is the essence of autoregression.
This process allows the model to generate coherent sequences word by word, with each new word prediction informed by the previously generated words, continuously building a sentence or dialogue in context.
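The loop just described can be sketched in a few lines of Python. The `predict_next` function below is a hypothetical stand-in: a hard-coded probability table instead of the neural network a real LLM would use. The point is the feedback loop itself: each generated word is appended to the sequence and handed back to the model as context for the next prediction.

```python
import random

random.seed(7)  # make the sketch reproducible

# Hypothetical stand-in for a trained model: given the context, return
# candidate next words. A real LLM computes these probabilities with a
# neural network over the full context window.
def predict_next(context):
    table = {
        "i":    [("saw", 1.0)],
        "saw":  [("him", 0.7), ("her", 0.3)],
        "him":  [("near", 1.0)],
        "near": [("the", 1.0)],
        "the":  [("seine", 0.6), ("gallery", 0.4)],
    }
    candidates = table.get(context[-1], [(".", 1.0)])
    words, probs = zip(*candidates)
    return random.choices(words, weights=probs)[0]

# The autoregressive loop: each new word is fed back into the model.
def generate(prompt, max_words=6):
    sequence = prompt.split()
    for _ in range(max_words):
        word = predict_next(sequence)
        sequence.append(word)  # feedback: the output becomes new input
        if word == ".":
            break
    return " ".join(sequence)

print(generate("i"))
```

Because `predict_next` samples from a distribution rather than always picking the single most likely word, the same prompt can yield different continuations on different runs, which is exactly the property that gives our game its replayability.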
But, we still haven’t explained why the model would respond to the question, “Have you seen this artist?” with the response, “I saw him with that peculiar art dealer near the Seine.”
Context-specific responses, like those in my interactive mystery, are achieved through fine-tuning. Fine-tuning is an additional training step where the model is adapted to a specific domain (in this case, a 1920s Parisian mystery).
Here’s how it fits into the overall training process:
Initial Training: Large Language Models are first trained on a broad corpus of diverse text, often scraped from the internet. This training process teaches the model general language patterns, syntax, and broad contextual associations.
Fine-Tuning: After this general training, the model undergoes fine-tuning to adapt its responses for a specific application, such as my interactive mystery set in 1920s Paris. During fine-tuning, the model is provided with examples of context-appropriate responses, such as conversational snippets and phrases that fit the storyline, setting, and character roles.
Fine-tuning helps the model associate questions like, “Have you seen this artist?” with historically and contextually plausible responses, such as “I saw him with that peculiar art dealer near the Seine,” instead of irrelevant or anachronistic replies. Fine-tuning helps establish the narrative tone, historical setting, and character-specific language.
During training, the LLM learns probabilities for word sequences based on vast amounts of text data. In fine-tuning, we refine this process by exposing the model to targeted examples. By fine-tuning it on curated text relevant to our interactive mystery—dialogues, scene descriptions, and possible interactions—the model learns patterns specific to our setting, enhancing its ability to generate responses that feel authentic to 1920s Paris.
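As a rough sketch of what that curated text might look like in practice: many fine-tuning services accept training examples as JSON Lines, one conversation per line. The schema below is a common chat-style shape, not any particular provider's exact format, and the filename and dialogue are drawn from our bartender example purely for illustration.

```python
import json

# Hypothetical curated fine-tuning examples for the 1920s-Paris mystery.
examples = [
    {
        "messages": [
            {"role": "system",
             "content": "You are a bartender in a smoky Montmartre bar, 1920s Paris."},
            {"role": "user",
             "content": "Have you seen this artist?"},
            {"role": "assistant",
             "content": "I saw him with that peculiar art dealer near the Seine..."},
        ]
    },
    # ...many more snippets covering the setting, tone, and characters...
]

# Fine-tuning services typically expect one JSON object per line (JSONL).
with open("never_fear_paris.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

A real fine-tuning dataset would contain hundreds or thousands of such snippets; the model learns the pattern across them, which is why the bartender answers in character rather than with a flat "no".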
Fine-tuning gives the LLM the nuance it needs to deliver responses fitting the game's unique world.
Fine-tuning is a step done by the developers of a software application that is enabled by AI. For instance, ChatGPT is a software application enabled by a large language model. OpenAI, the makers of ChatGPT, fine-tuned their LLM with a great deal of additional instructions and information so that ChatGPT would function as a robust conversational assistant.
Likewise, if you are developing an AI-enabled app, you would fine-tune the LLM you are using with specific information about the context of your app.
As for my interactive mystery, remember that a video game is a software application. An LLM can be fine-tuned to function as a conversational agent within a game.
One important takeaway: anyone can fine-tune an LLM to serve a specific purpose, yet most people do not know this.
Think about that deeply, and you will understand the possibilities of large language models.