TL;DR
Researchers propose Recursive Language Models (RLMs), an inference-time strategy that treats very long prompts as an external environment and lets an LLM programmatically inspect, break up, and recursively call itself on prompt snippets. The paper reports that RLMs handle inputs up to two orders of magnitude beyond model context windows and outperform base LLMs and common long-context scaffolds on four long-context tasks, with comparable or lower cost per query.
What happened
A team led by Alex L. Zhang, with Tim Kraska and Omar Khattab, submitted a paper describing Recursive Language Models (RLMs). The approach reframes long prompts as an external environment that the model can query programmatically: the LLM examines and decomposes the prompt and recursively invokes itself on smaller snippets. According to the paper, this inference-time strategy enables handling inputs that exceed standard context windows by as much as two orders of magnitude. The authors report that RLMs not only extend effective input length but also, for shorter prompts, substantially improve output quality relative to base LLMs and several commonly used long-context scaffolds across four diverse long-context tasks. The paper notes that RLMs achieve these gains while keeping per-query inference cost comparable to, or lower than, alternatives. The submission is archived on arXiv (arXiv:2512.24601), submitted 31 Dec 2025; the main text runs nine pages, 33 pages including the appendix.
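To make the mechanism concrete, here is a minimal, hypothetical sketch of the recursive pattern described above. It is not the authors' implementation: `call_llm`, `CONTEXT_LIMIT`, and the fixed split-and-merge strategy are placeholders for illustration, whereas the paper describes the model itself programmatically deciding how to inspect and decompose the prompt.

```python
# Illustrative sketch only (assumptions, not the paper's implementation):
# `call_llm` stands in for any chat-completion API, and CONTEXT_LIMIT is an
# arbitrary character budget for what the base model can comfortably read.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up an LLM provider here")

CONTEXT_LIMIT = 8_000  # assumed per-call character budget

def rlm_answer(question: str, document: str) -> str:
    """Answer `question` over `document`, which may far exceed the model's
    context window. The document is treated as external data: the model
    only ever sees bounded snippets of it."""
    if len(document) <= CONTEXT_LIMIT:
        # Base case: the snippet fits, so ask the model directly.
        return call_llm(f"Context:\n{document}\n\nQuestion: {question}")

    # Recursive case: split the document, answer each half with a recursive
    # call, then ask the model to merge the partial answers.
    mid = len(document) // 2
    partials = [rlm_answer(question, document[:mid]),
                rlm_answer(question, document[mid:])]
    merge_prompt = (
        f"Question: {question}\n\nPartial answers from document sections:\n"
        + "\n".join(f"- {p}" for p in partials)
        + "\n\nCombine these into one final answer."
    )
    return call_llm(merge_prompt)
```

Because each level of recursion only ever sends bounded snippets to the model, the total input can be far longer than any single context window, which is the property the paper emphasizes.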
Why it matters
- Extends practical LLM input handling far beyond native context windows, addressing a major limitation for long-document tasks.
- Reports quality improvements even on shorter prompts, suggesting the method affects more than just input length.
- Claims comparable or reduced inference cost per query versus existing long-context approaches, relevant for deployment trade-offs.
- Presents a programmatic, inference-time strategy that can be applied without changing base model training (per the paper’s framing).
Key facts
- Paper title: "Recursive Language Models" by Alex L. Zhang, Tim Kraska, and Omar Khattab.
- Archived on arXiv as arXiv:2512.24601; submitted 31 Dec 2025.
- Core idea: treat long prompts as an external environment and let the LLM examine, decompose, and recursively call itself on prompt snippets.
- Reported capability: handles inputs up to two orders of magnitude beyond model context windows.
- Evaluation: RLMs outperform base LLMs and common long-context scaffolds across four diverse long-context tasks (task names not listed in the source).
- Cost: authors state RLMs have comparable or cheaper inference cost per query relative to alternatives.
- Document length: 9 pages for main text, 33 pages including appendix.
- Subjects listed: Artificial Intelligence (cs.AI) and Computation and Language (cs.CL).
- DOI landing: https://doi.org/10.48550/arXiv.2512.24601.
What to watch next
- Whether the authors release code, models, or reproducible benchmarks: not confirmed in the source.
- Details on the four long-context tasks used for evaluation and their benchmark metrics: not confirmed in the source.
- How RLMs perform with different base model architectures and sizes in independent reproductions: not confirmed in the source.
Quick glossary
- Large Language Model (LLM): A neural network trained on large text corpora to generate or analyze natural language; often used for tasks like generation, summarization, and question answering.
- Context window: The maximum length of input text that a language model can consider at once during inference.
- Inference-time scaling: Techniques applied at model inference (not training) to change how a model processes inputs, often to handle larger or more complex queries.
- Recursive algorithm: A method that solves a problem by reducing it into smaller instances of the same problem and calling itself on those instances.
- Prompt decomposition: Breaking a long input or instruction into smaller, more manageable pieces for sequential or parallel processing by a model (a minimal sketch follows below).
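To illustrate the last two glossary entries, a minimal prompt-decomposition helper might look like the following; the chunk size and overlap are arbitrary example values, not parameters from the paper.

```python
def decompose_prompt(text: str, chunk_size: int = 4_000, overlap: int = 200) -> list[str]:
    """Split a long prompt into overlapping chunks that a model can process
    one at a time (sequentially, in parallel, or via recursive calls)."""
    chunks, start = [], 0
    step = chunk_size - overlap  # advance less than chunk_size so chunks overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Example: a 10,000-character prompt yields three overlapping chunks.
assert len(decompose_prompt("x" * 10_000)) == 3
```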
Reader FAQ
What are Recursive Language Models (RLMs)?
RLMs are an inference-time strategy that treats long prompts as an external environment, enabling an LLM to inspect, decompose, and recursively call itself on parts of the prompt.
How much longer can RLMs handle compared with standard context windows?
The paper reports handling inputs up to two orders of magnitude beyond model context windows.
Which specific long-context tasks were used to evaluate RLMs?
Not confirmed in the source.
Do the authors provide code or model checkpoints?
Not confirmed in the source.
Sources
- "Recursive Language Models," Alex L. Zhang, Tim Kraska, Omar Khattab. arXiv:2512.24601 (cs.AI, cs.CL), submitted 31 Dec 2025. https://doi.org/10.48550/arXiv.2512.24601