>_TheQuery

Recursive Language Model (RLM)

NLP

An inference approach that lets an LLM programmatically examine, decompose, and recursively call itself over snippets of extremely long input, handling contexts up to 100x beyond native window limits.

A Recursive Language Model (RLM) is an inference strategy introduced by Alex Zhang, Tim Kraska, and Omar Khattab in a December 2025 paper. Instead of forcing a language model to ingest an entire long document in one pass, RLMs treat the input as an external environment accessible through a persistent Python REPL. The model can view, slice, search, and filter the data, then recursively call itself on smaller pieces to build up an answer.

The architecture works by loading the full input into a variable within the REPL. A root LLM inspects the data programmatically, decides how to partition it, and launches sub-LLM calls on each partition. These sub-calls can themselves recurse further, creating a tree of focused queries. An answer variable is iteratively refined until the model marks it as ready. Sub-LLM calls can be parallelized for efficiency.
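The recursive partitioning described above can be sketched in a few lines. This is a minimal, runnable illustration, not the paper's implementation: `call_llm` is a hypothetical stand-in for a real sub-LLM call, stubbed here to count occurrences of a query string so the sketch executes without a model, and the persistent REPL and iteratively refined answer variable are omitted.

```python
MAX_CHARS = 100  # stand-in for the model's native context window (assumption)

def call_llm(query: str, context: str) -> int:
    """Stubbed sub-LLM: answers a focused query over a small snippet."""
    return context.count(query)

def rlm(query: str, context: str) -> int:
    # If the context fits the window, answer directly (a leaf of the tree).
    if len(context) <= MAX_CHARS:
        return call_llm(query, context)
    # Otherwise partition at a word boundary and recurse on each half;
    # in a real system these sub-calls could run in parallel.
    half = len(context) // 2
    cut = context.rfind(" ", 0, half) + 1 or half  # fall back to a hard split
    return rlm(query, context[:cut]) + rlm(query, context[cut:])

long_doc = "needle haystack " * 400  # far larger than MAX_CHARS
print(rlm("needle", long_doc))  # aggregates counts from all leaf calls → 400
```

Here the root call never sees the full 6,400-character input at once; it only ever hands window-sized snippets to the (stubbed) model and combines the sub-answers, which is the essence of the tree of focused queries.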

RLMs enable processing of inputs up to two orders of magnitude beyond a model's native context window, covering entire codebases, multi-year document archives, and book-length texts. The fine-tuned RLM-Qwen3-8B model outperforms its base Qwen3-8B by 28.3% on average across long-context benchmarks and approaches the quality of vanilla GPT-5 on three long-context tasks at comparable cost.

Last updated: February 24, 2026