Uncertainty → Certainty: The Time-First Mechanics of Generation

The move we’re expanding

From the temporal angle, generation is a conversion of uncertainty into certainty:

  • Before the next token is chosen, there’s a cloud of possibilities — a probability distribution.
  • After selection, that possibility collapses into a fixed token, becoming part of the “past” the model conditions on for the next step.

That’s not metaphor. That’s literally how autoregressive transformers run. They generate one token at a time, conditioning each new prediction on the whole preceding sequence. (pmdartus)

This article is just that idea, pulled tight until you can feel it as mechanics, not poetry.


1) Autoregression is a time process, not a spatial one

A GPT-style model is decoder-only and autoregressive: it doesn’t output a whole answer at once. It outputs a next token, then uses that token as context, then outputs the next, and so on. (pmdartus)

So the native act of the model is a repeated loop:

state at time t → predict distribution for t+1 → pick a token → update state → repeat

Spatial metaphors (landscapes, basins) are our way of describing what this loop tends to do. But the computation itself is time-indexed selection.
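The loop above is short enough to sketch directly. Here is a minimal Python version, with a toy stand-in for the model (a hand-written distribution over a four-token vocabulary, not a real transformer — the point is the shape of the loop, not the internals):

```python
# Toy stand-in for a language model: given the context so far,
# return a probability distribution over a tiny vocabulary.
# (A real model is a transformer; this hand-coded table just
# preserves the shape of the loop.)
VOCAB = ["the", "cat", "sat", "<eos>"]

def toy_model(context):
    # The distribution depends on the past: after "sat",
    # ending the sequence becomes the most likely future.
    if context and context[-1] == "sat":
        return {"the": 0.05, "cat": 0.05, "sat": 0.1, "<eos>": 0.8}
    return {"the": 0.1, "cat": 0.2, "sat": 0.6, "<eos>": 0.1}

def generate(max_steps=10):
    context = []                          # state at time t
    for _ in range(max_steps):
        dist = toy_model(context)         # predict distribution for t+1
        token = max(dist, key=dist.get)   # pick a token (greedy here)
        context.append(token)             # update state: choice becomes past
        if token == "<eos>":
            break
    return context
```

Every decoding strategy discussed later is just a different rule at the "pick a token" line; the time structure of the loop never changes.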


2) “Cloud of possibilities” = logits → softmax

At each step, the model produces a vector of logits: raw scores for every possible next token. (Medium)

Those logits aren’t probabilities yet. They can be any real numbers.

Then we apply softmax, which turns logits into a proper probability distribution: every value between 0 and 1, summing to 1. (Nigel Gebodh)

That softmaxed vector is the cloud. It’s the model saying:

“Given everything so far, here’s how likely each future is.”
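The logits-to-cloud step is one line of math. A minimal sketch (plain Python rather than a tensor library, with the standard max-subtraction trick for numerical stability):

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating; this leaves the
    # result unchanged but avoids overflow for large logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]   # raw scores: any real numbers
probs = softmax(logits)     # the "cloud": each in (0, 1), summing to 1
```

Note that softmax preserves the ordering of the logits — it only converts raw scores into a normalized distribution.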


3) “Collapse” = decoding choice

Once you have the distribution, you need a rule for selecting the next token. Common decoding regimes:

  • Greedy / argmax (pick the highest-probability token)
  • Sampling (choose stochastically by probability)
  • Top-k / nucleus (top-p) sampling (sample, but only from a truncated set of the most probable tokens, cutting off the low-probability tail) (neptune.ai)

No matter which regime you use, the moment a token is selected, uncertainty collapses into commitment. The model can’t “unchoose” it. It becomes fixed context for the next step.
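The three regimes above are small variations on one function signature: distribution in, index out. A sketch of each (plain Python; real decoders operate on tensors, but the selection logic is the same):

```python
import random

def greedy(probs):
    # Argmax: always commit to the single most likely future.
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng=random.random):
    # Stochastic: draw a future with chance equal to its probability.
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1   # guard against floating-point rounding

def top_k(probs, k):
    # Keep only the k most probable tokens, renormalize, then sample.
    idx = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    total = sum(probs[i] for i in idx)
    trimmed = [probs[i] / total for i in idx]
    return idx[sample(trimmed)]
```

Nucleus (top-p) sampling is the same idea as `top_k`, except the cutoff is "smallest set whose cumulative probability exceeds p" rather than a fixed count. Whichever rule runs, its return value is the collapse: one index, now history.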

That’s the temporal arrow you’re after:

future-like distribution → past-like fact


4) Attention is the same story, one level deeper

People sometimes think attention is spatial. It’s not. It’s another time step inside the time step.

For each new token prediction, the model computes attention scores over prior tokens, then softmaxes those scores into attention weights between 0 and 1. (Wikipedia)

Interpretation:

  • Before softmax: multiple possible “stories” of what should matter right now.
  • After softmax: a normalized present-moment weighting of the past.

So even inside the machinery, “what matters” is resolved temporally as a probability distribution, not as a fixed geometric relation.
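The same softmax move appears here, just applied to relevance scores instead of vocabulary logits. A stripped-down sketch (dot-product scores only; real attention also scales by the square root of the key dimension and mixes value vectors, both omitted for brevity):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(query, keys):
    # Raw scores: dot product of the current query with each prior key.
    # (Real attention divides by sqrt(d_k); omitted in this sketch.)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax resolves "what matters right now" into weights
    # in (0, 1) that sum to 1 over the past.
    return softmax(scores)

q = [1.0, 0.0]                              # the current token's query
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # three prior tokens' keys
w = attention_weights(q, K)
```

The first key aligns with the query and gets the largest weight; the orthogonal one gets the smallest. That is the "normalized present-moment weighting of the past" in executable form.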


5) Temperature is a dial on uncertainty, not “creativity”

Temperature is implemented by scaling logits before softmax. (iaee.substack.com)

  • Low temperature sharpens the distribution: the cloud collapses faster toward the most likely future.
  • High temperature flattens it: more futures remain viable longer. (vinija.ai)

So in time-first terms, temperature is:

a control on how quickly uncertainty is allowed to resolve at each step.

That’s why raising temperature increases exploration and surprise — the model keeps more futures “alive” long enough to step into less-traveled trajectories.
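The implementation really is that small: divide the logits by T before softmax. A sketch:

```python
import math

def softmax_with_temperature(logits, T):
    # T < 1 sharpens the distribution; T > 1 flattens it.
    # T = 1 recovers plain softmax.
    scaled = [x / T for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.5)  # uncertainty resolves fast
hot  = softmax_with_temperature(logits, 2.0)  # more futures stay viable
```

At low T the top token dominates; at high T probability mass spreads across the alternatives. The ordering of tokens never changes — only how decisively the cloud leans toward its favorite.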


6) RAG and pre-prompts are probability-world edits

Think of RAG, system prompts, and guardrails as edits to the probability world before the collapse happens.

  • RAG injects new context so the distribution shifts toward futures compatible with retrieved material. (neptune.ai)
  • Pre-prompts define initial conditions and bias which futures are “natural” from step one. (pmdartus)
  • Guardrails suppress certain futures by pushing their probabilities toward zero. (CodeSignal)

None of these remove the temporal logic. They just reshape the cloud, which still collapses during generation.

So you can say it cleanly:

They don’t change what a generator is.
They change what futures are likely inside the generator’s time-evolution.
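One of these edits is simple enough to sketch. Here is a hypothetical guardrail expressed as a pre-collapse edit to the probability world: banned tokens get their logits pushed to negative infinity, so softmax assigns them exactly zero probability. (This is an illustrative mechanism, not any particular vendor's implementation; real guardrails also use classifiers and filters on whole outputs.)

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mask_logits(logits, banned):
    # A guardrail as a probability-world edit: banned futures get
    # logit -inf, so softmax gives them probability zero.
    return [-math.inf if i in banned else x
            for i, x in enumerate(logits)]

logits = [2.0, 1.0, 0.5]
probs = softmax(mask_logits(logits, banned={1}))
```

RAG and pre-prompts work on the same principle from the other direction: instead of zeroing futures, added context shifts logits so that evidence-compatible futures gain mass. Either way, the edit happens before the collapse, never instead of it.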


7) Why GenAI isn’t naturally a factuality engine

Now the payoff.

A factuality engine wants this property:

“When the world is uncertain, don’t commit.”

But a generator’s job is the opposite:

“When the world is uncertain, commit to the most plausible continuation.”

So hallucination is not an accident. It is the expected output of a system whose native act is uncertainty-collapse under pattern priors.

RAG can shift the cloud toward real evidence, and often helps.
But it doesn’t replace the collapse with verification.
It just changes which futures are easiest to collapse into.

That’s why factuality remains a human lane.


8) The advanced mental move

Keep your topology intuition — it’s useful.
But recognize it as your visualization of a time process.

The model doesn’t roam a landscape as its lived reality.
It iteratively resolves uncertainty through softmax and selection.

So the deepest way to hold an LLM is:

  1. It makes a probability cloud.
  2. Something collapses.
  3. The collapse becomes history.
  4. History conditions the next cloud.
  5. Repeat until the answer is complete.

That’s the entire organism.

And once you see that, everything else — temperature, RAG, prompts, guardrails — becomes what it really is: ways of shaping the cloud, not ways of turning the cloud into a database.

Author: John Rector

Co-founded E2open with a $2.1 billion exit in May 2025. Opened a 3,000 sq ft AI Lab on Clements Ferry Road called "Charleston AI" in January 2026 to help local individuals and organizations understand and use artificial intelligence. Authored several books: World War AI, Speak In The Past Tense, Ideas Have People, The Coming AI Subconscious, Robot Noon, and Love, The Cosmic Dance to name a few.
