Executable Language

The Runtime Mechanics of Large Language Models

May 16, 2026

For most of history, language lived in human minds. Writing moved it onto paper. Digital systems pushed it into memory and across networks. But every prior medium just stored language. Running it still needed a brain.

That changed when language itself got encoded in the parameters of a large language model and became executable.

Most explanations of LLMs tend to drift in one of two directions. In the first, the model gets treated like a hidden mind — a system with beliefs, intentions, goals, perhaps even an inner self waiting to be uncovered. In the second, the discussion collapses into technical fragments: embeddings, attention heads, gradient descent, scaling laws; useful pieces, but rarely assembled into a clear operational picture.

What I want to do here is simpler: describe the runtime mechanics plainly, in the order they actually happen. Not to reduce the strangeness of the system, but to make the strangeness easier to see.

LLMs aren’t minds that use language. They’re functions that operate on it.

The Function

Suppose you have a large language model. Maybe it’s running on your laptop, maybe in a data center somewhere. Skip the question of where it came from and how it was trained for now. You have it, you’re about to run it. What happens next?

Something surprisingly simple.

At its core, a large language model is a mathematical function. You give it text, and it returns a list of probabilities for what the next token should be. A token is roughly a word or piece of a word. The distinction matters technically, but for this discussion we can use the terms interchangeably.

So: a prompt goes in, and a list of probabilities for possible next words comes out. Not a sentence, not an answer, just probabilities assigned to possible continuations. Most candidate words receive effectively zero probability; only a handful receive meaningful probability.

That is the whole job of the core function.

Think of sin(x). You give it an angle, it returns a number. The formula is fixed. No state, no interpretation, no memory of previous calls. Run it a thousand times with the same input and you’ll get the same number a thousand times.

The core of an LLM works the same way in kind. You give it text, it returns a probability list. The output is fully determined by the input. Same text in, same probability list out. There’s no interpretation, no intention, no inferring what the user “really meant.” The function just evaluates.
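To make the "text in, probability list out" idea concrete, here is a minimal sketch. It is a toy, not a real model: the tiny `PARAMS` table, the `DEFAULT_SCORES` fallback, and the name `next_token_probs` are all hypothetical stand-ins, and a real LLM conditions on the whole input rather than just the last word. The only point is the shape of the thing: fixed numbers plus a fixed algorithm, evaluated on text.

```python
import math

# Hypothetical toy "parameters": fixed scores for the next word,
# keyed by the last word of the input. A real model has billions
# of parameters and looks at the entire input, not one word.
PARAMS = {
    "blue?": {"Because": 3.0, "The": 1.0, "Why": -1.0},
    "Because": {"sunlight": 2.5, "the": 1.5, "Because": -2.0},
}
DEFAULT_SCORES = {"the": 0.0, "a": 0.0}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def next_token_probs(text):
    """Pure function: the output depends only on `text` and PARAMS."""
    words = text.split()
    last = words[-1] if words else ""
    return softmax(PARAMS.get(last, DEFAULT_SCORES))


first = next_token_probs("Why is the sky blue?")
second = next_token_probs("Why is the sky blue?")
print(first == second)  # same text in, same probability list out
```

Nothing is stored between the two calls. The second call does not know the first one happened.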

The difference between sin(x) and an LLM isn’t structural. It’s scale. Sin(x) can be implemented with a compact algorithm and a handful of constants. An LLM is defined by billions of numerical parameters. Those parameters specify the function, and they don’t change as you call it, so the function itself doesn’t change either. It needs no internet connection, no live data, nothing external. Just numerical parameters and an algorithm for using them. Self-contained and static.

Inside, the function is highly nonlinear. Evaluating it requires massive parallel computation. Nonlinear systems are hard to predict in much simpler cases than this. Here, it’s a nonlinear function with billions of interacting degrees of freedom. And though the internal algorithm is well-defined, the capabilities that emerge from running it are far from obvious. There’s also a strange asymmetry that reminds me of how brains work: the function produces language one token at a time, but the computation producing each token is deeply parallel.

The function has no memory. Each call is independent of every previous call. Nothing carries over. The function doesn’t know it has been called before, doesn’t accumulate experience, doesn’t update itself during use.

Sin(x) doesn’t remember the last angle it processed. The core LLM function doesn’t remember the last prompt it ran on. The state lives in the input, not in the function.

What’s actually in those parameters? They encode statistical regularities of the language the model was trained on. They don’t contain sentences or documents in any recoverable form. The model isn’t pulling output from a database. It is not retrieving. It is generating.

Treated as a black box, the core function has the same character as sin(x). Input goes in. A deterministic output comes out.

A quick note. What I’ve described so far is what I’ll call the core function of an LLM — the math object that takes text in and returns a probability list. In casual usage people say “the LLM” to mean the whole runtime system: the function plus the machinery around it that produces output language — and that machinery is what the rest of this essay describes.

The Token

So the function has run. A list of next-token probabilities is sitting at the output. The system now has to choose a token from that list. The selection process is called sampling, and it includes a parameter called temperature.

At temperature zero, the system always selects the most probable next token, a rule often called greedy decoding. Identical inputs then yield identical outputs, word for word. That stability is often useful for analytical tasks, factual reasoning, or situations where minimizing ambiguity matters.

Raising the temperature flattens the probability list before a token is drawn, so lower-probability tokens occasionally enter the sequence. Randomness enters the process during this sampling step, not in the core function itself. The output becomes less deterministic and more varied. Higher temperatures are often better suited for conversational language, creative writing, brainstorming, or situations where flexibility and exploration matter more than strict reproducibility.

The important distinction is that the function itself never changes. The function computes probabilities. Sampling, which sits outside the core function, determines how those probabilities are converted into actual text.
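The sampling step can be sketched in a few lines. This is an illustration under stated assumptions, not any particular system's implementation: the names `apply_temperature`, `sample_token`, and `EXAMPLE_PROBS` are hypothetical, and the probability list is made up. Real systems usually divide the model's raw scores (logits) by the temperature; dividing log-probabilities and renormalizing, as done here, gives the same result.

```python
import math
import random

# Hypothetical probability list, as if returned by the core function.
EXAMPLE_PROBS = {"Because": 0.7, "The": 0.2, "Sky": 0.1}


def apply_temperature(probs, temperature):
    """Rescale a probability list. Requires temperature > 0.

    temperature > 1 flattens the list, temperature < 1 sharpens it,
    and temperature == 1 leaves it unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {t: math.log(p) / temperature for t, p in probs.items() if p > 0}
    top = max(scaled.values())
    exps = {t: math.exp(s - top) for t, s in scaled.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def sample_token(probs, temperature, rng):
    """Pick one token. Temperature 0 is treated as greedy selection."""
    if temperature == 0:
        return max(probs, key=probs.get)
    scaled = apply_temperature(probs, temperature)
    tokens = list(scaled)
    weights = [scaled[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the sketch is repeatable
print(sample_token(EXAMPLE_PROBS, 0, rng))    # always the top token
print(sample_token(EXAMPLE_PROBS, 1.5, rng))  # may vary with the seed
```

Note that `EXAMPLE_PROBS` is never modified. The randomness lives entirely in `rng`, outside the function that produced the probabilities.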

The Sentence

You have a token. To get a sentence, you repeat.

Input goes in, a probability list comes out, a word is sampled. The word is appended to the input, and the whole extended text goes back into the function. A new list comes out. Another word is sampled. The process keeps going until a sentence emerges and the system stops, typically because a special end-of-text token was sampled or a length limit was reached.

Take “Why is the sky blue?” The function evaluates that text and assigns probabilities to possible next words. “Because” receives a high probability. It gets selected and appended. The input is now “Why is the sky blue? Because”. The function evaluates the extended text again. “Sunlight” now receives a high probability. It gets selected and appended. The input becomes “Why is the sky blue? Because sunlight”. And the loop continues.

Every new word is conditioned on the full accumulated text: the original prompt, plus everything generated so far. Coherence comes out of that conditioning and the algorithm itself. The model doesn’t hold the complete answer at the start and then transcribe it. The answer doesn’t exist until the loop builds it.
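The loop itself is short enough to write out. Again, this is a toy sketch: `TOY_TABLE`, `next_token_probs`, and `generate` are hypothetical names, the table only looks at the last word, and selection is greedy to keep the trace predictable. What it does show faithfully is the control flow: evaluate, pick, append, repeat.

```python
# Hypothetical lookup table standing in for the core function.
# A real model conditions on the full text, not just the last word.
TOY_TABLE = {
    "blue?": {"Because": 0.9, "The": 0.1},
    "Because": {"sunlight": 0.8, "air": 0.2},
    "sunlight": {"scatters": 0.7, "is": 0.3},
    "scatters": {".": 1.0},
}


def next_token_probs(text):
    """Stateless: everything it knows arrives in `text`."""
    words = text.split()
    last = words[-1] if words else ""
    return TOY_TABLE.get(last, {".": 1.0})


def generate(prompt, max_tokens=10):
    """Evaluate, pick a token, append it, and feed the longer text back in."""
    text = prompt
    for _ in range(max_tokens):
        probs = next_token_probs(text)
        token = max(probs, key=probs.get)  # greedy pick, for simplicity
        if token == ".":  # the period acts as the toy stop token
            text += "."
            break
        text += " " + token
    return text


print(generate("Why is the sky blue?"))
# Why is the sky blue? Because sunlight scatters.
```

The only thing that changes from one iteration to the next is `text`. The answer is built inside the loop, one appended word at a time.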

Linguistic thought feels serialized because language itself unfolds serially, step by step, each word constraining what comes next. The model’s generation has the same sequential structure. Not because the model thinks, but because both processes operate in the same medium.

The Conversation

The core function is stateless. It doesn’t remember the token it just generated, and it doesn’t remember the sentence it just produced. For the model to “chat” with you, the entire conversation, your prompts and its replies, has to be re-fed at every turn.

The loop is the same as the sentence loop, just at the message level. User messages and model replies get appended into a running transcript. At each new turn, the function receives the whole accumulated transcript as its input. That’s where the appearance of memory and continuity comes from. The model isn’t storing anything between calls. It reads the full history fresh each time, as text, and generates the next token from it.
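A chat wrapper around a stateless function can be sketched the same way. Everything here is a hypothetical illustration: `reply` stands in for the whole token-by-token generation loop, and the "User:" and "Assistant:" labels are an assumed transcript format, not any real system's. The toy reply simply reports how much history it can see, which makes the re-reading visible.

```python
def reply(transcript):
    """Hypothetical stand-in for the generation loop.

    It is a pure function of the transcript text and keeps no state
    of its own between calls.
    """
    user_turns = transcript.count("User:")
    return f"I can see {user_turns} user message(s) in the text I was given."


def chat(user_messages):
    """Grow one transcript and re-feed all of it at every turn."""
    transcript = ""
    for message in user_messages:
        transcript += f"User: {message}\nAssistant:"
        answer = reply(transcript)  # the full history goes in, every time
        transcript += f" {answer}\n"
    return transcript


print(chat(["Why is the sky blue?", "And at sunset?"]))
```

The second reply mentions two user messages only because both are present in the text it was handed. Drop the first turn from the transcript and the "memory" of it is gone.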

A conversation with an LLM isn’t a dialogue in the way two people have a dialogue. It’s a continuously growing text document, evaluated one token at a time by the same stateless function. Coherence comes from the structure of the accumulated text and the algorithm. Not from any internal state the model is maintaining.

The model doesn’t remember. It re-reads.

That’s the whole loop. A function with no memory. A sampling process with randomness. Parameters that do not store sentences or facts in any recoverable form. None of those pieces contains an answer on its own.

And yet you give the LLM a prompt, and an answer appears. Sometimes a hallucination.

Humans probably hallucinate more. Put a random person in front of a camera and ask a confident question about an unfamiliar topic. Under that kind of pressure, many generate explanations that sound remarkably coherent while being completely detached from reality.

Still, think about what is happening here. A mathematical function assigns probabilities to possible next words. Sampling, with some randomness, selects one. That word gets appended to the input text. The process repeats. Word after word, the loop unfolds until eventually an explanation appears, or a poem, or a computer program, or an answer to a question.

At first glance, it is not obvious why this process should work at all. The deeper question is not why hallucinations occur, but why coherent reasoning emerges in the first place.

But it does.

The structure of language became encoded into a static nonlinear mathematical function with billions of parameters. Not as a database of stored sentences or retrievable documents, but as the function itself.

How that transfer happens — how language becomes executable — is the next piece of the story.

Audrius Berzanskis
