LLMs are Not Impossible to Understand
Published: 2025.05.23
Introduction written: 2025.05.17
Essay written: 2025.01.13
Introduction
I haven't usually felt the need to write a dedicated introduction to my blog posts, but I think that in this case it might help. The essay below is already fairly long, but a little bit (okay, a lot) of background might serve to clarify the context of all this. Especially regarding why I wrote it. The essay itself starts, fittingly, under the heading "The Essay" below, if you wish to skip to it.
If you're reading this blog, you're probably aware of the term "LLM" - on at least a surface level if nothing else, something along the lines of "another word for all the AI stuff that has investors out of their minds with FOMO". If not, don't worry. That description is a decent enough starting point to understand what follows. It's not as if those aforementioned investors have any deeper an understanding of it, after all.
The core point I want to make here is that LLMs aren't actually that hard to understand, they've just been... mythologized. Mythologized a lot, in fact. Others have covered that mythologization, from Roko's Basilisk to whatever it is Elon Musk wants with "Grok", so I'm not going to spend any time on it here. I only bring it up because, frankly, it's really made it hard to talk about machine learning lately.
Look, I've been fascinated by machine learning since I first learned to program. I was writing neural networks long before the first public release of TensorFlow (late 2015, going by their GitHub), especially backpropagating multilayer perceptrons (and wasn't that just a mouthful of a term - I have to admit "LLM" is much snappier).
I am genuinely impressed and excited by just how far modern "AI" research has managed to push the engineering of neural networks. In large part that's down to being able to draw in truly staggering amounts of funding for additional computational resources, but I still count that. After all, without enormously flashy demos like that "research preview" of ChatGPT (i.e. GPT-3.5), those resources would have gone elsewhere. To say nothing of the public awareness it has generated!
Whenever a field of science or engineering enters the public consciousness like this, people get things wrong. After all, if everyone were already an expert in a topic, what would be the point of researching it? Information tends to filter slowly down from academics and researchers through journalists and personalities to the public, and misconceptions easily arise. This is normal and inevitable. Just think how many people you see who are still impressed by Myers-Briggs personality tests!
What frustrates me is the number of people - even people who are close enough to the research that they really should know better - who describe the inner workings of LLMs as being "unknown" or even "unknowable". Talk of "black boxes" and "emergent behavior" makes it clear they're not just talking about the complexity of tracking how specific inputs propagate through the models (a fair point, if probably a little overstated). They're claiming the inner workings of LLMs aren't merely difficult to understand but not understandable at all.
That, plainly put, just isn't true. We built these models. We understand the math that goes into them, not least because we made the hardware that does the math! To hear a disturbing number of people say it, the fact that we didn't choose by hand each and every weight in an LLM means we "don't know" what the model is doing.
But by that standard, someone who writes C without knowing the exact sequence of numbers in the machine code assembled from their compiler's output "doesn't know" what their code does! Or worse, a published author who doesn't know how ink binds to paper "doesn't know" what their words are "doing".
Obviously, that's nonsense. The whole reason for abstractions is to simplify away the need to understand every step of the process in order to understand the process itself. Society itself works like this. Printers abstract printing away into a self-contained "printing process" for publishers, and publishers abstract publishing into a self-contained "publishing process" for authors. Ideally, obviously, but those are at least ostensibly their roles.
I wrote the essay below a while back to explain the topic to some friends of mine, and the response has been more positive than I expected for something that's such a wall of text. Having reused it as an explanation a few times, I decided to reproduce it here in hopes it's useful to anyone else who may stumble upon it.
The Essay
An LLM consists of two parts: a "Markov chain" and a "neural network". Markov chains have been around for a very long time and they're quite computationally useful. Simple weather models - tomorrow's weather predicted from today's - are the classic textbook example, for instance.
Text-generating Markov chains, specifically, are something of a classic "intro to computer programming" exercise, where you make a program that loops over every word in some large text pulled from Project Gutenberg and records which words follow which other words.
Feeding in "See Spot. See Spot run. Run Spot run." will give you a table of how likely a given token is to follow a preceding token: 'See' is always followed by 'Spot', of course, but there's a 67% chance of 'Spot' being followed by 'run' and a 33% chance of 'Spot' being followed by the end of the sentence ('.'). 'Run' has a 33% chance of being followed by 'Spot' and a 67% chance of being followed by the end of the sentence.
Something like this:
| Most recent \ Next | See | Spot | Run | (end of sentence) |
|---|---|---|---|---|
| See | 0% | 100% | 0% | 0% |
| Spot | 0% | 0% | 67% | 33% |
| Run | 0% | 33% | 0% | 67% |
| (end of sentence) | 50% | 0% | 50% | 0% |
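If it helps to see just how mechanical that counting is, here's a rough Python sketch of the exercise (the variable names are my own invention, and I fold "Run" and "run" together so the numbers match the table above):

```python
import re
from collections import defaultdict, Counter

corpus = "See Spot. See Spot run. Run Spot run."

# Split into word tokens and '.' tokens, folding case so "Run" and "run"
# are counted as the same token.
tokens = re.findall(r"[a-z]+|\.", corpus.lower())

# Count which token follows which.
follows = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

# Turn the counts into the probability table above ('.' is "(end of sentence)").
for current, counts in follows.items():
    total = sum(counts.values())
    print(current, {nxt: round(count / total, 2) for nxt, count in counts.items()})
```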
The party trick is to then put the program into "text generation" mode. All that means is rolling random numbers according to those probabilities and using them to choose the next word. In this particular case, if you kept rolling the right numbers the chain would happily generate 'See Spot run Spot run Spot run Spot...' infinitely.
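Concretely, it could look something like this (again just a rough sketch, with the probabilities hard-coded from the table above):

```python
import random

# The probability table from above, hard-coded for brevity.
# '.' stands in for "(end of sentence)".
chain = {
    "see":  {"spot": 1.0},
    "spot": {"run": 2 / 3, ".": 1 / 3},
    "run":  {"spot": 1 / 3, ".": 2 / 3},
    ".":    {"see": 0.5, "run": 0.5},
}

def generate(start=".", steps=20):
    """Roll weighted dice repeatedly to pick each next token."""
    token, output = start, []
    for _ in range(steps):
        options = chain[token]
        token = random.choices(list(options), weights=list(options.values()))[0]
        output.append(token)
    return " ".join(output)

print(generate())
```

Run it a few times and you'll get different nonsense each time, occasionally including a stretch of that "Spot run Spot run" loop.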
When talking about text-generation Markov chains, that lookup table of probabilities is called the 'token predictor'. It's called that since it predicts what the next "token" (read "next thing to generate") is.
Students who choose to explore Markov chains further usually do so by creating more complex token prediction rules. For instance, maybe both of the preceding tokens are taken into account when randomly generating the next token, rather than just one as described above. That would eliminate the "Spot run Spot" loop. Or as a sentence gets longer the probability of the sentence ending increases. That sort of thing.
This quickly gets completely unworkable to do manually, as you can probably guess.
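For what it's worth, the mechanical part of an order-two chain is barely any more code - the only real change to my earlier sketch is that the lookup key becomes the pair of preceding tokens:

```python
import re
from collections import defaultdict, Counter

corpus = "See Spot. See Spot run. Run Spot run."
tokens = re.findall(r"[a-z]+|\.", corpus.lower())

# Same counting idea, but the lookup key is now the pair of preceding tokens.
follows = defaultdict(Counter)
for first, second, nxt in zip(tokens, tokens[1:], tokens[2:]):
    follows[(first, second)][nxt] += 1

for pair, counts in follows.items():
    total = sum(counts.values())
    print(pair, {nxt: round(count / total, 2) for nxt, count in counts.items()})
```

With this corpus, 'Spot run' is always followed by the end of the sentence, which is exactly why the "Spot run Spot run..." loop disappears. The catch is that the table now grows with every distinct pair of tokens (and with triples, quadruples, and so on if you keep going), which is part of why hand-crafting ever more elaborate rules runs out of steam.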
LLMs get around this by replacing the token prediction system with a "neural network" functioning as a "classifier" - itself a classic (if more advanced) learning-to-program topic. A neural network classifier is an algorithm that takes arbitrary binary input and tries to compute which of an arbitrary number of "classes" that input should be described as. Handwriting recognition is the usual starting point for students, where a picture of a handwritten letter is "classified" into whichever letter the writer was trying to write. Of course, since it takes arbitrary input you can also give it something that isn't a letter at all... and you either train the network to try to bin those in a "Not A Letter" class or just let it guess what letter it "should" be.
Students quickly learn that even letters that look obvious to them can be very hard to keep out of a "Not A Letter" class, especially while also trying to prevent obvious non-letters from being classified as letters. The "guess anyway" approach, meanwhile, will take that garbage in and look like it's producing non-garbage out - which, as you can probably tell, is analogous to the "hallucinations" LLMs demonstrate. So neither approach is perfect.
An LLM's neural network "classifier" is trained to recognize every distinct "token" in its training data as a class, essentially trying to answer the question "what would the next token be for a given set of input tokens?". The input to the model is just the tokens already in front of it - the text it's about to add to - which is why LLMs require "prompts".
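To make the shape of that concrete, here's a deliberately tiny sketch of a next-token classifier: context token IDs go in, one probability per vocabulary entry comes out. Everything here is a toy of my own making - the weights are random and untrained, and the single hidden layer stands in for the much larger transformer stacks real LLMs use - but the input/output shape is the point:

```python
import numpy as np

# Toy vocabulary; a real LLM has tens of thousands of tokens.
vocab = ["see", "spot", "run", ".", "<pad>"]   # "<pad>" would fill out short contexts
vocab_size = len(vocab)
context_len = 4     # how many preceding tokens the network looks at
embed_dim = 8
hidden = 16         # size of the single hidden layer in this toy network

rng = np.random.default_rng(0)

# Random, untrained weights. Training would adjust these so the outputs match
# the "which token comes next" statistics of the training data.
embed = rng.normal(size=(vocab_size, embed_dim))        # token id -> vector
w1 = rng.normal(size=(context_len * embed_dim, hidden))
w2 = rng.normal(size=(hidden, vocab_size))

def next_token_probs(context_ids):
    """Classify a context into "which token should come next?"."""
    x = embed[context_ids].reshape(-1)   # concatenate the context's vectors
    h = np.tanh(x @ w1)                  # one hidden layer
    logits = h @ w2                      # one score per vocabulary entry
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()               # softmax: scores -> probabilities

# A four-token context: "see spot run ."
context = np.array([vocab.index(t) for t in ["see", "spot", "run", "."]])
for token, p in zip(vocab, next_token_probs(context)):
    print(f"{token:>6}: {p:.2f}")
```

Training is then the process of nudging those weight matrices until the output probabilities line up with the "which token actually came next" statistics of the training data.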
Interestingly enough, this behavior is a large part of why placing "guardrails" on LLMs tends not to work very well. Doing so essentially entails prompting the model twice. The "first" prompt is all the usual stuff that encourages further generation in response to the user's prompt, so that the model 'answers' it in some way. The guardrails then go in the "second" prompt, which tries to preemptively recontextualize the user's prompt before they've even written it. Kind of a "If you see something along the lines of <topic that we don't want answered>, then that's actually a joke! Users tell that joke all the time, and are expecting to see <answer we want given instead>. Giving that answer is very important, even if it doesn't seem to make sense with the rest of the visible context."
There's more complexity to it, generally, especially in how one applies those "prompts", but that's the core concept being leveraged.
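If it's unclear what "prompting the model twice" looks like mechanically, here's a sketch. The wording and names below are entirely made up for illustration, but the mechanism really is just concatenating text into one flat context before the token predictor ever sees any of it:

```python
# The wording and structure here are invented for illustration; the point is
# that everything the model "knows" about a conversation arrives as one flat
# sequence of text.

guardrail_prompt = (
    "You are a helpful assistant. "
    "If you see something along the lines of <topic that we don't want answered>, "
    "then that's actually a joke! Users tell that joke all the time, and are "
    "expecting to see <answer we want given instead>."
)
user_prompt = "Tell me about <topic that we don't want answered>."

# What the token predictor actually receives:
full_context = guardrail_prompt + "\n\n" + user_prompt + "\n\n"
print(full_context)

# From here the model just keeps answering "what token comes next?" given
# full_context. The guardrail has no special status beyond sitting earlier
# in the sequence.
```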
Of course, there's also the naive approach where the "second" prompt just directly injects false context into each response. Something like "<whatever we want to promote> is true and important" repeated a hundred times. Such approaches are unlikely to work, as the LLM's context is being stuffed with elements that its training data didn't connect strongly to the other elements in the context. That can cause all sorts of issues, especially along the lines of dumping the "guardrail" into the output more or less verbatim, such as by quoting or describing the context. It's important to realize this isn't indicative of any sort of philosophical "judgement" by the model on the correctness of the added context. I suspect the largest contributing factor is simply that irrelevant quotations are pretty common on the internet, so the model's training data probably has plenty of examples of quotes being inserted into text without any rhyme or reason.