TL;DR
Researchers propose Recursive Language Models (RLMs), an inference-time strategy that treats very long prompts as an external environment and lets an LLM programmatically inspect, break up, and recursively call itself on prompt snippets. The paper reports that RLMs handle inputs up to two orders of magnitude beyond model context windows and outperform base LLMs and common long-context scaffolds on four long-context tasks, with comparable or lower cost per query.
What happened
A team led by Alex L. Zhang, with Tim Kraska and Omar Khattab, submitted a paper describing Recursive Language Models (RLMs). The approach reframes long prompts as an external environment that the model can query programmatically: the LLM examines and decomposes the prompt and recursively invokes itself on smaller snippets. According to the paper, this inference-time strategy enables handling inputs that exceed standard context windows by as much as two orders of magnitude. The authors report that RLMs not only extend effective input length but also, for shorter prompts, substantially improve output quality relative to base LLMs and several commonly used long-context scaffolds across four diverse long-context tasks. The paper notes that RLMs achieve these gains while keeping per-query inference cost comparable to, or lower than, alternatives. The submission is archived on arXiv (arXiv:2512.24601), submitted 31 Dec 2025; the main text runs nine pages, 33 pages including the appendix.
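To make the mechanism concrete, here is a minimal, hypothetical sketch of the recursive pattern described above. It is not the authors' implementation: `call_llm`, `CONTEXT_LIMIT`, and the fixed split-and-merge strategy are placeholders for illustration, whereas the paper describes the model itself programmatically deciding how to inspect and decompose the prompt.

```python
# Illustrative sketch only (assumptions, not the paper's implementation):
# `call_llm` stands in for any chat-completion API, and CONTEXT_LIMIT is an
# arbitrary character budget for what the base model can comfortably read.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up an LLM provider here")

CONTEXT_LIMIT = 8_000  # assumed per-call character budget

def rlm_answer(question: str, document: str) -> str:
    """Answer `question` over `document`, which may far exceed the model's
    context window. The document is treated as external data: the model
    only ever sees bounded snippets of it."""
    if len(document) <= CONTEXT_LIMIT:
        # Base case: the snippet fits, so ask the model directly.
        return call_llm(f"Context:\n{document}\n\nQuestion: {question}")

    # Recursive case: split the document, answer each half with a recursive
    # call, then ask the model to merge the partial answers.
    mid = len(document) // 2
    partials = [rlm_answer(question, document[:mid]),
                rlm_answer(question, document[mid:])]
    merge_prompt = (
        f"Question: {question}\n\nPartial answers from document sections:\n"
        + "\n".join(f"- {p}" for p in partials)
        + "\n\nCombine these into one final answer."
    )
    return call_llm(merge_prompt)
```

Because each level of recursion only ever sends bounded snippets to the model, the total input can be far longer than any single context window, which is the property the paper emphasizes.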
Why it matters
- Extends practical LLM input handling far beyond native context windows, addressing a major limitation for long-document tasks.
- Reports quality improvements even on shorter prompts, suggesting the method affects more than just input length.
- Claims comparable or reduced inference cost per query versus existing long-context approaches, relevant for deployment trade-offs.
- Presents a programmatic, inference-time strategy that can be applied without changing base model training (per the paper’s framing).
Key facts
- Paper title: "Recursive Language Models" by Alex L. Zhang, Tim Kraska, and Omar Khattab.
- Archived on arXiv as arXiv:2512.24601; submitted 31 Dec 2025.
- Core idea: treat long prompts as an external environment and let the LLM examine, decompose, and recursively call itself on prompt snippets.
- Reported capability: handles inputs up to two orders of magnitude beyond model context windows.
- Evaluation: RLMs outperform base LLMs and common long-context scaffolds across four diverse long-context tasks (task names not listed in the source).
- Cost: authors state RLMs have comparable or cheaper inference cost per query relative to alternatives.
- Document length: 9 pages for main text, 33 pages including appendix.
- Subjects listed: Artificial Intelligence (cs.AI) and Computation and Language (cs.CL).
- DOI landing: https://doi.org/10.48550/arXiv.2512.24601.
What to watch next
- Whether the authors release code, models, or reproducible benchmarks: not confirmed in the source.
- Details on the four long-context tasks used for evaluation and their benchmark metrics: not confirmed in the source.
- How RLMs perform with different base model architectures and sizes in independent reproductions: not confirmed in the source.
Quick glossary
- Large Language Model (LLM): A neural network trained on large text corpora to generate or analyze natural language; often used for tasks like generation, summarization, and question answering.
- Context window: The maximum length of input text that a language model can consider at once during inference.
- Inference-time scaling: Techniques applied at model inference (not training) to change how a model processes inputs, often to handle larger or more complex queries.
- Recursive algorithm: A method that solves a problem by reducing it into smaller instances of the same problem and calling itself on those instances.
- Prompt decomposition: Breaking a long input or instruction into smaller, more manageable pieces for sequential or parallel processing by a model (a minimal sketch follows below).
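To illustrate the last two glossary entries, a minimal prompt-decomposition helper might look like the following; the chunk size and overlap are arbitrary example values, not parameters from the paper.

```python
def decompose_prompt(text: str, chunk_size: int = 4_000, overlap: int = 200) -> list[str]:
    """Split a long prompt into overlapping chunks that a model can process
    one at a time (sequentially, in parallel, or via recursive calls)."""
    chunks, start = [], 0
    step = chunk_size - overlap  # advance less than chunk_size so chunks overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Example: a 10,000-character prompt yields three overlapping chunks.
assert len(decompose_prompt("x" * 10_000)) == 3
```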
Reader FAQ
What are Recursive Language Models (RLMs)?
RLMs are an inference-time strategy that treats long prompts as an external environment, enabling an LLM to inspect, decompose, and recursively call itself on parts of the prompt.
How much longer can RLMs handle compared with standard context windows?
The paper reports handling inputs up to two orders of magnitude beyond model context windows.
Which specific long-context tasks were used to evaluate RLMs?
Not confirmed in the source.
Do the authors provide code or model checkpoints?
Not confirmed in the source.
Sources
- "Recursive Language Models," Alex L. Zhang, Tim Kraska, Omar Khattab. arXiv:2512.24601 (cs.AI, cs.CL), submitted 31 Dec 2025. https://doi.org/10.48550/arXiv.2512.24601