TL;DR
A review of Rosetta Code solutions using a GPT-4 tokenizer found large differences in token counts between languages, with a roughly 2.6x gap between the least and most token-efficient languages in the initial comparison. Dynamic and some functional languages tended to be more token-efficient; follow-up checks showed array languages using ASCII can be especially compact while symbol-heavy languages can inflate token counts.
What happened
The author compared code samples across multiple languages from a Rosetta Code mirror to see how many GPT-style tokens each language's solutions produced. Using Claude Code to orchestrate the work and the Xenova/gpt-4 tokenizer from Hugging Face, they selected 19 popular languages and measured token counts for tasks that had solutions in every language in that selection; TypeScript was excluded because it had few entries in the dataset. The analysis found a meaningful spread: Clojure averaged the fewest tokens and C the most, about a 2.6x difference. Dynamic languages were generally more token-efficient, and some typed functional languages (Haskell, F#) were nearly as compact. Two follow-up checks found that APL's special glyphs hurt tokenization (≈110 tokens on average), while J, an ASCII array language, averaged roughly 70 tokens and outperformed every language in the original ranking.
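The comparison described above can be sketched as a small harness: tokenize every solution, then average per language. This is a hypothetical reconstruction, not the author's actual script; the article used the Xenova/gpt-4 tokenizer, but here a trivial regex tokenizer stands in so the sketch runs without any model downloads, and the two-language "dataset" is illustrative.

```python
# Minimal sketch of a per-language token-count comparison.
# toy_tokenize is a stand-in for a real GPT-4 BPE tokenizer; real token
# counts will differ, but the shape of the pipeline is the same.
import re
from statistics import mean

def toy_tokenize(code: str) -> list[str]:
    # Stand-in tokenizer: runs of word characters, or single symbols.
    return re.findall(r"\w+|\S", code)

# Illustrative "dataset": one task, solutions in two languages.
solutions = {
    "Python": ["print(sum(range(1, 101)))"],
    "C": ('#include <stdio.h>\n'
          'int main(void){int s=0;for(int i=1;i<=100;i++)s+=i;'
          'printf("%d\\n",s);return 0;}',),
}

averages = {
    lang: mean(len(toy_tokenize(src)) for src in srcs)
    for lang, srcs in solutions.items()
}
for lang, avg in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{lang}: {avg:.1f} tokens on average")
```

Even this crude tokenizer reproduces the direction of the article's finding: the C solution splits into far more tokens than the equivalent Python one.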
Why it matters
- LLMs and agent-driven coding are constrained by context window size; more token-efficient languages let agents fit more code and history into the same window.
- Lower token counts can reduce compute and memory demand per editing or synthesis session, affecting latency and cost for automated development workflows.
- Language design choices (symbol sets, verbosity, typing) can influence how effectively LLMs process code, potentially shifting toolchain and language priorities if agents take a larger role.
- Typed languages that still produce compact token footprints can combine the verification benefits of compilation with longer agent sessions.
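The context-window point above is easy to make concrete with back-of-envelope arithmetic. The window size, overhead, and tokens-per-line figures below are illustrative assumptions, not measurements from the article; only the ~2.6x spread comes from its results.

```python
# How a 2.6x token-efficiency gap translates into lines of code in context.
# All constants here are assumed for illustration.
CONTEXT_WINDOW = 200_000   # assumed agent context budget, in tokens
OVERHEAD = 20_000          # assumed tokens reserved for prompts and history

budget = CONTEXT_WINDOW - OVERHEAD
TOKENS_PER_LINE = {
    "compact_lang": 5.0,        # hypothetical efficient language
    "verbose_lang": 5.0 * 2.6,  # same code at the article's 2.6x spread
}

for lang, tpl in TOKENS_PER_LINE.items():
    print(f"{lang}: ~{int(budget / tpl):,} lines of code fit in context")
```

Under these assumptions the compact language fits roughly 36,000 lines into the same window that holds only about 13,800 lines of the verbose one, which is the whole argument for token efficiency in agent workflows.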
Key facts
- Dataset: solutions were drawn from a GitHub mirror of Rosetta Code, a multi-language programming chrestomathy.
- Tokenizer: the Xenova/gpt-4 tokenizer (a community port) from Hugging Face was used to count tokens.
- Orchestration: Claude Code was used to select tasks and run comparisons across languages.
- Selection: the main comparison used 19 popular languages; TypeScript was left out due to sparse coverage in the dataset.
- Observed spread: there was approximately a 2.6x difference in average token counts between the least and most token-efficient languages in the initial run.
- Language patterns: dynamic languages tended to be more token-efficient; JavaScript was an outlier as a relatively verbose dynamic language in this sample.
- Functional languages: Haskell and F# produced token counts close to the top dynamic languages, likely helped by type inference reducing explicit declarations.
- APL update: a rerun on like-for-like tasks showed APL averaged about 110 tokens because its glyphs tokenized poorly.
- J update: the ASCII-based array language J averaged roughly 70 tokens, outperforming previously top-ranked languages in the follow-up.
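The APL-versus-J result in the facts above has a simple mechanical explanation: GPT-style tokenizers operate on UTF-8 bytes, and a rare non-ASCII glyph costs up to one token per byte because it rarely benefits from learned BPE merges, while common ASCII sequences merge into single tokens. A quick check of the byte cost (the specific glyph pairing is illustrative; actual token counts depend on the tokenizer's merge table):

```python
# Byte-level cost of an APL glyph vs. its ASCII J counterpart.
apl_iota = "\u2373"  # APL index generator "⍳"
j_iota = "i."        # J's ASCII spelling of the same primitive

print(len(apl_iota.encode("utf-8")))  # 3 bytes: up to 3 tokens before merges
print(len(j_iota.encode("utf-8")))    # 2 bytes: ASCII pairs merge readily
```

So a single APL character can cost as much as three tokens before any merge applies, while J's two-character ASCII spellings start cheaper and are far more likely to collapse into one token.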
What to watch next
- Improvements to tokenizers to better handle non-ASCII symbol sets (could change APL-style language rankings).
- Whether language creators or tooling communities optimize syntax or encodings specifically to reduce token counts for LLM consumption.
- Adoption of array/concise language idioms (like J) or compact encoding layers for code to extend agent context windows.
Quick glossary
- Token: A unit of text used by language models; a word, part of a word, or symbol may map to one or more tokens depending on the tokenizer.
- Tokenizer: A component that translates raw text into tokens for a language model; different tokenizers split text differently and affect token counts.
- Context window: The maximum number of tokens a model can attend to at once; it limits how much code and dialogue an agent can process in a single pass.
- Dynamic language: A programming language that typically requires fewer explicit type declarations and often resolves types at runtime.
- Type inference: A language feature where the compiler deduces types automatically, reducing the need for explicit type annotations in source code.
Reader FAQ
Which language was the most token-efficient in the original comparison?
In the initial 19-language comparison Clojure was the most token-efficient among that set; subsequent tests found J (an ASCII array language) to be even more compact.
How were tokens measured?
Token counts were produced by running Rosetta Code solutions through the Xenova/gpt-4 tokenizer from Hugging Face, with Claude Code used to coordinate the tests.
Is this a definitive, scientific ranking?
No — the author notes many biases and limits in the dataset and approach and frames the work as an exploratory look rather than a formal scientific study.
Did the analysis include TypeScript?
No — TypeScript was excluded because there were very few TypeScript tasks in the Rosetta Code mirror used.

Which programming languages are most token-efficient? — Martin Alderson, January 8, 2026