TL;DR
A researcher ran Meta's LLaMA locally at a negative sampling temperature (T = -0.001) by modifying llama.cpp to bypass a greedy-sampling guard. Negative temperature inverts the token-probability ordering, deterministically favoring tokens that are normally least likely and yielding odd, repetitive, centroid-like sequences.
What happened
The experiment applied the statistical-mechanics idea of negative absolute temperature to neural-language-model sampling. Because the final softmax in a language model is mathematically equivalent to a Boltzmann distribution, setting the temperature below zero flips the sign of the exponent and makes previously unlikely tokens more probable; as T approaches zero from the negative side, the model tends toward a deterministic sequence of the least-likely tokens. OpenAI-hosted models block temperatures outside [0.0, 2.0], so the author ran Meta's LLaMA locally via llama.cpp. They altered a source check that forced greedy sampling at nonpositive temperatures, recompiled, and disabled the repetition penalty, top-k, and top-p. With T = -0.001 the models produced unusual outputs: a short, odd completion for the 7B run and long, looping streams of anomalous tokens for 13B. Many of the repeated tokens (for example, Хронологија and entferne variants) are reported to lie near the centroid of LLaMA's embedding space, which may explain why they surface under negative-temperature sampling.
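To make the sign flip concrete, here is a minimal Python sketch of temperature-scaled softmax on toy logits (an illustration of the math only, not the author's llama.cpp patch; the logits and vocabulary size are invented for the example):

```python
import numpy as np

def most_probable_token(logits, temperature):
    """Divide logits by T, apply softmax, and return the index of the highest-probability token."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()              # numerical stability; does not change the ranking
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.argmax(probs)), probs.round(3)

logits = [4.0, 1.0, -2.0]               # toy vocabulary: token 0 is "likely", token 2 is "unlikely"

print(most_probable_token(logits, 0.001))    # near-greedy: token 0 wins
print(most_probable_token(logits, -0.001))   # ranking inverted: token 2 wins deterministically
```

At T = 0.001 the distribution collapses onto the highest-logit token; at T = -0.001 it collapses onto the lowest-logit token, which is the regime the experiment exploited.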
Why it matters
- It exposes a mathematical symmetry between neural softmax sampling and Boltzmann statistics and demonstrates an unconventional sampling regime.
- Negative-temperature sampling can force a model to output tokens it would normally avoid, revealing distributional blind spots and anomalous tokens.
- The behavior highlights how implementation guards (like temperature checks) and decoding heuristics shape practical model outputs.
- Observing repeated centroid-like tokens suggests embedding-space geometry influences pathological generation under extreme sampling settings.
Key facts
- Temperature in softmax sampling mirrors the Boltzmann distribution used in statistical mechanics.
- If T < 0 the sign of the exponent in the softmax flips, so least-likely tokens at positive T become most likely at negative T.
- OpenAI models enforce temperatures between 0.0 and 2.0, preventing negative-temperature experiments on those hosted APIs.
- The author used llama.cpp with Meta's LLaMA models run locally and changed a conditional that previously forced greedy sampling for temp <= 0 to allow negative values.
- Command-line flags used included --temp -0.001, --repeat-penalty 1.0, --top-k 0, --top-p 1.0, and a short prompt: "Temperature is a concept".
- With LLaMA-7B at T=0.001 the model produced expected, coherent completions; at T=-0.001 it produced a brief token (Хронологија) then stalled.
- With LLaMA-13B at T=-0.001 the model generated long, repetitive streams containing tokens like Хронологија and Entferne variants, rather than typical coherent text.
- A commenter (scottviteri) noted that many of the repeated anomalous tokens lie near the centroid of LLaMA's embedding space, suggesting those tokens carry little distinct semantic content.
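That centroid observation can be checked with a short script against the model's token-embedding matrix. The sketch below uses a random placeholder matrix (LLaMA-7B's real matrix is 32,000 × 4,096 and would be loaded from the checkpoint); only the distance computation is the point:

```python
import numpy as np

# Placeholder embedding matrix; in practice, load LLaMA's token-embedding weights instead.
vocab_size, dim = 32000, 128            # LLaMA-7B actually uses dim = 4096
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(vocab_size, dim))

centroid = embeddings.mean(axis=0)                        # centre of the embedding cloud
dists = np.linalg.norm(embeddings - centroid, axis=1)     # each token's distance to it

nearest_ids = np.argsort(dists)[:10]                      # token ids closest to the centroid
print(nearest_ids, dists[nearest_ids].round(3))
```

Mapping the resulting ids back through the tokenizer would show whether anomalous tokens such as Хронологија sit in that near-centroid set, as the commenter reported.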
What to watch next
- Whether similar negative-temperature effects occur across other architectures and tokenizers: not confirmed in the source.
- How tokens near the embedding centroid behave under other extreme decoding schemes: the centroid observation itself is reported in this experiment, but other decoding schemes are not tested in the source.
- Whether model providers will change API restrictions or add explicit guards for negative-temperature sampling: not confirmed in the source.
Quick glossary
- Softmax (inference): A function that converts model logits into a probability distribution over possible next tokens by exponentiating and normalizing scores.
- Temperature (sampling): A scalar that logits are divided by before the softmax, controlling randomness: lower positive values sharpen the distribution, higher values flatten it (see the numeric sketch after this glossary).
- Negative temperature: In this modeling context, a temperature value below zero inverts the softmax exponent, favoring tokens that are least likely at positive temperatures.
- Embedding centroid: The central point in a model's token-embedding space; tokens near the centroid tend to have less distinct semantic representation.
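To illustrate the sharpening and flattening described above, and the mirror-image effect of a negative value, here is a small numeric demo with invented logits:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Return the full next-token distribution after dividing logits by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()              # stability shift; cancels out after normalisation
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.0]                # toy logits for a three-token vocabulary

for T in (0.5, 1.0, 2.0, -1.0):
    print(T, softmax_with_temperature(logits, T).round(3))
# T = 0.5 sharpens toward the top token, T = 2.0 flattens the distribution,
# and T = -1.0 mirrors it so the lowest-logit token gets the most mass.
```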
Reader FAQ
What does negative temperature do to a language model's output?
It inverts the usual ranking of token probabilities, so tokens that are unlikely under normal sampling become favored; as T approaches zero from below, the model tends toward deterministic selection of those least-likely tokens.
Was llama.cpp modified to allow negative temperatures?
Yes. The author changed a condition that previously forced greedy sampling for nonpositive temperatures so negative values could be passed through.
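The kind of check involved can be rendered roughly in Python (a hedged sketch of the idea only; the real change was to llama.cpp's C++ sampling code, whose exact condition is not quoted in the source):

```python
import numpy as np

def sample_next_token(logits, temperature, bypass_guard=False):
    """Toy sampler: the stock behaviour falls back to greedy argmax for temp <= 0;
    relaxing that guard lets a negative temperature reach the softmax."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0 or (temperature < 0 and not bypass_guard):
        return int(np.argmax(logits))            # greedy fallback the experiment bypassed
    scaled = logits / temperature
    scaled -= scaled.max()
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.random.default_rng().choice(len(probs), p=probs))
```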
Can this be done on OpenAI's hosted models?
No. The source states OpenAI's models only accept temperatures between 0.0 and 2.0.
Does this reveal safety or security risks?
Not confirmed in the source.
Summary: Inspired by the definition of temperature in statistical mechanics and the possibility for it to be below zero, we try sampling LLaMA at T = -0.001. The…
Sources
- Sampling at negative temperature
- Exploring the Impact of Temperature on Large Language …
- Min-p Sampling for Creative and Coherent LLM Outputs
- The Effect of Sampling Temperature on Problem Solving in …