Breaking the Context Ceiling: Implementing Recursive Language Models with LangGraph and TypeScript

MIT's recent "Recursive Language Models" paper demonstrated that LLMs can process inputs 100x beyond their context windows — not by expanding the window, but by treating prompts as external environments the model explores programmatically. The results are striking: GPT-5-mini outperformed GPT-5 on long-context tasks while using comparable compute. This talk demonstrates how to build the same architecture in TypeScript using LangGraph and Node.js.

We'll implement an RLM system in which a root agent orchestrates recursive sub-agents, each operating on a focused context slice without suffering "context rot." We'll see how to leverage LangGraph's cyclic graph execution to spawn child agents, aggregate their findings into a shared state, and let the orchestrator synthesize results — all while keeping individual context windows small and fresh.

By the end, you'll have a working pattern for processing massive documents, codebases, or datasets that would choke a single LLM call, using tools you can deploy today.
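
A minimal sketch of the shared state such a system implies, using LangGraph JS's Annotation API (the field names here are illustrative, not taken from the talk):

```ts
import { Annotation } from "@langchain/langgraph";

// Shared state for the recursive agent graph. Sub-agent findings are
// appended through the reducer, so each child contributes only a small
// summary while the orchestrator's own context window stays fresh.
const RlmState = Annotation.Root({
  // The full document lives in state (the "environment"), never in a prompt.
  document: Annotation<string>,
  // The question the orchestrator is trying to answer.
  question: Annotation<string>,
  // Condensed findings returned by spawned sub-agents.
  findings: Annotation<string[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
  // The final synthesized answer.
  answer: Annotation<string>,
});
```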

Key takeaways:

  • Why bigger context windows don't solve context rot
  • Architecting recursive agent graphs in LangGraph
  • Managing state and tool execution across agent hierarchies
  • Cost and latency tradeoffs in production

This talk was presented at Node Congress 2026. Check out the latest edition of this JavaScript conference.

FAQ

The speaker is Jamal Sinclair O'Garro, a senior software engineer at Netflix.

The main topic is 'Breaking the Context Ceiling: Recursive Language Models in TypeScript', focusing on processing documents larger than a language model's context window using recursive language models.

The problem is that performance degrades as the context window fills up, leading to a lost-in-the-middle effect where information in the middle of the context is often forgotten or lost.

A recursive language model is an approach that decomposes text into smaller pieces and uses an orchestrator model to process these pieces, improving efficiency and context handling over traditional methods.

The RLM approach keeps the context window small by passing back summaries and metadata, whereas the standard scaffold approach fills the context window with the entire prompt or document, leading to potential performance loss.
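
As a rough TypeScript sketch, the payload a sub-agent hands back under the RLM approach might look like this (the type and field names are hypothetical, not from the paper):

```ts
// What a sub-agent returns to the orchestrator instead of raw text: a
// condensed summary plus just enough metadata to locate the source slice.
interface SubAgentResult {
  chunkIndex: number;          // which slice of the document was examined
  charRange: [number, number]; // offsets into the original document
  relevant: boolean;           // whether the slice contained anything useful
  summary: string;             // condensed version of the relevant content
}
```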

Applications include processing large API documentation, debugging large code bases, and analyzing multiple technical specifications or incident reports.

It is considered more efficient because it reduces the context window size and computational cost by processing smaller text chunks and returning summaries, avoiding the need to handle large documents all at once.

LangGraph is a library that helps create execution graphs with nodes and edges, facilitating the implementation of recursive language models by managing state and execution flow.
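
A minimal sketch of that wiring with LangGraph JS; the node bodies are stubs (a real implementation would call an LLM), and the node names and threshold are illustrative:

```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  question: Annotation<string>,
  findings: Annotation<string[]>({
    reducer: (a, b) => a.concat(b),
    default: () => [],
  }),
  answer: Annotation<string>,
});

// Stub: a real sub-agent would summarize one context slice with an LLM call.
const subAgentNode = async (state: typeof State.State) => ({
  findings: [`summary of one slice relevant to: ${state.question}`],
});

// Stub: synthesize once enough findings have accumulated.
const orchestratorNode = async (state: typeof State.State) => ({
  answer: state.findings.length >= 3 ? state.findings.join("\n") : "",
});

const graph = new StateGraph(State)
  .addNode("orchestrator", orchestratorNode)
  .addNode("subAgent", subAgentNode)
  .addEdge(START, "subAgent")
  .addEdge("subAgent", "orchestrator")
  // The cycle: the orchestrator either spawns another sub-agent or finishes.
  .addConditionalEdges("orchestrator", (state) =>
    state.answer ? END : "subAgent"
  )
  .compile();
```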

LangSmith is a tool that allows tracing and monitoring of the execution, helping to visualize the flow and track the costs and actions of language models.

Considerations include managing costs, implementing timeouts, preventing infinite loops, handling rate limiting, and ensuring fallback mechanisms are in place.
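
Two of those guards are cheap to sketch in code, reusing the compiled graph from the example above (the limit and timeout values are arbitrary):

```ts
// recursionLimit is LangGraph's built-in cap on supersteps; the run throws
// once it is exceeded, stopping a buggy routing function from looping forever.
const run = graph.invoke(
  { question: "Which release introduced the breaking change?" },
  { recursionLimit: 25 }
);

// A plain wall-clock timeout around the whole run.
const result = await Promise.race([
  run,
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("RLM run timed out")), 120_000)
  ),
]);
```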

Jamal Sinclair O'Garro
21 min
26 Mar, 2026

Video Summary and Transcription
Jamal Sinclair O'Garro discusses the challenges with context windows in language models and presents recursive language models as a solution. Topics covered: enhancing language model performance through metadata and summaries; orchestrator copies and a comparison of the scaffold and RLM approaches; efficient management of language model operations with slicing, recursion, code execution, and context window handling; graph-based workflow design in LangGraph for agent connections and state management; workflow management, document analysis, and using RLM to process large documents; and the detailed workflow setup, code structure with LangSmith, and node definitions, with RLM benefits for cost considerations and production readiness.

1. Challenges with Context Windows in Language Models

Short description:

Jamal Sinclair O'Garro, a senior software engineer at Netflix, discusses the challenges with context windows in language models. Large models face performance degradation as tokens increase, leading to a lost-in-the-middle effect. Recursive language models offer a solution by decomposing text into smaller parts, enabling better context retention and decision-making.

Hello Node Congress. My name is Jamal Sinclair O'Garro. I'm a senior software engineer at Netflix, working on the experimentation platform, and welcome to my talk, Breaking the Context Ceiling: Recursive Language Models in TypeScript, or How to Process Documents 100 Times Larger than Your LLM's Context Window Using LangGraph. Let's talk more about the problem with context windows. If you use large language models in tools like Claude, Gemini, or Claude Code, you may notice that when you're picking your models, they typically have a limit to their context window, usually somewhere between 128,000 and 1 million tokens. But one thing they don't tell you is that performance is actually going to degrade as you start to work through those tokens. There's a lost-in-the-middle effect that basically says that as you fill up that context window and get closer to the limit, your model can only efficiently remember what's at the beginning or at the end of the context. Everything in the middle basically gets lost, forgotten, or becomes more difficult to find. And that causes a degradation in the actual performance of your large language model itself. Some examples of where this can happen: say you have API documentation, multiple docs across multiple applications, and you're trying to piece together information to figure out how the system works at a larger scale, or you're trying to find breaking changes across many release notes while debugging or triaging. Because of this lost-in-the-middle effect, you'll probably miss the changes you're looking for, because the context is too large. This also happens with very large codebases. Imagine you had tons of incident postmortems to go through, plus many technical specs. The idea is that the larger the text, the more this becomes an issue. So, one way to get around this is to use something called a recursive language model. This comes from some students and researchers at MIT, and the idea is pretty simple. Instead of putting the entire document into your prompt and bloating the context, you can recursively decompose that text into smaller pieces, have an orchestrator model write deterministic code, and have sub-agents (spawns or copies of itself) perform the text search operation. Those return some metadata and a condensed version of the text that the orchestrator can use to actually make a decision. That's probably a mouthful and a lot to unpack, but we're going to go through it little by little, breaking it down into smaller pieces so we can understand it. So, let's first look at the two different approaches.
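
In code, the decomposition and sub-agent search the speaker describes might be sketched like this (assuming @langchain/openai for the model call; the chunk size, model name, and prompt wording are illustrative):

```ts
import { ChatOpenAI } from "@langchain/openai";

// Slice an oversized document into pieces that fit comfortably inside a
// sub-agent's context window (size here is arbitrary).
function chunk(text: string, size = 8_000): string[] {
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

const subAgent = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

// Each spawned copy answers one narrow question about one slice and returns
// only a condensed finding, never the slice itself.
async function examineChunk(piece: string, question: string): Promise<string> {
  const res = await subAgent.invoke(
    `Is anything in the following text relevant to: "${question}"?\n` +
      `If so, summarize it in two sentences; otherwise reply "NOT RELEVANT".\n\n` +
      piece
  );
  return String(res.content);
}
```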

So, the paper talks about one called the standard scaffold. That's what you're used to today. If you're using Claude Code or Codex and you're looking at your prompt, you'll usually see, in the bottom right-hand corner or somewhere within the terminal or interface, how much context you've actually used in a session. Once you get to a certain level, it'll auto-compact for you: it compresses all that information down into a smaller version of itself to give you back some more context while still understanding what you're working on within the system. That's a bit problematic, because you start to lose information; that compression is lossy, right? You're not keeping everything. Now, the breakthrough here, and what the paper proposes, is a new type of algorithm or architecture called a recursive language model, or RLM. It has three main ingredients: symbolic handles, symbolic programming, and symbolic recursion, and we'll talk about those. But the main concept, if you look at the pseudocode, is that if your large language model is working in an environment, there's really no need to store your state in the actual model itself.
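
A loose TypeScript rendering of that idea, heavily simplified (in the paper the model writes real code against a REPL-style environment; here the "action" is just an opaque string):

```ts
// The environment owns the huge prompt; the model never sees it whole.
interface Environment {
  document: string;   // full text, stored outside the model
  history: string[];  // compact metadata about past steps, not full transcripts
}

// One orchestrator step: the model sees only the history plus metadata
// (e.g. the document's length), emits an action, the action is executed
// against the environment, and only its condensed result enters history.
async function rlmStep(
  env: Environment,
  proposeAction: (history: string[], docLength: number) => Promise<string>,
  execute: (action: string, doc: string) => Promise<string>
): Promise<void> {
  const action = await proposeAction(env.history, env.document.length);
  const observation = await execute(action, env.document);
  env.history.push(`action: ${action} -> ${observation}`);
}
```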

2. Enhancing Language Model Performance with RLM

Short description:

To enhance language model performance, use metadata and summaries instead of full prompts. The orchestrator creates copies to parse through text, saving time and costs. Comparing scaffold and RLM approaches shows differences in context handling, recursion, and output sizes.

But you want to pass it in context, because otherwise the model can't access it, right? And then what you do is keep your history, but instead of the full history of the prompt that has to be compacted over time, you pass in just some metadata about it and what the current state is. What it looks like is this: you have the LLM, you take your history, and it produces some code. From there, that code is executed against the state, and this execution is a replication of itself: another agent, or several agents, that will basically run that code. Think of slicing through the list of texts. Once a copy gets its text, it's passed a prompt that says: find the relevant information in this text. Is it here? Yes? Pass it back to the orchestrator model. The orchestrator can then take that, along with the context from all the other agents that have been spawned, decide to synthesize at the very end, and provide back some final result. So the nice thing is that your prompt lives in the environment, and you have this orchestrator agent that creates copies of itself and lets those copies parse through the actual document text and return a summary to the larger model. As a result, you never put the entire prompt or the entire document into the model itself. That saves you, maybe not as much time, but definitely some context and some costs. So that's how the algorithm works at a high level.
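
The fan-out-and-synthesize step might look like this, reusing the chunk and examineChunk helpers (and the ChatOpenAI import) from the earlier sketch; the parallel Promise.all fan-out and the filtering heuristic are assumptions, not details from the talk:

```ts
// Spawn one short-lived sub-agent per slice, gather only their summaries,
// then let the orchestrator synthesize a final answer from those findings.
async function answerOverDocument(doc: string, question: string): Promise<string> {
  const findings = await Promise.all(
    chunk(doc).map((piece) => examineChunk(piece, question))
  );
  const relevant = findings.filter((f) => !f.includes("NOT RELEVANT"));

  const orchestrator = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });
  const res = await orchestrator.invoke(
    `Using only these findings, answer: "${question}"\n\n${relevant.join("\n")}`
  );
  return String(res.content);
}
```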

So let's compare both approaches. With the scaffold, the main difference is that you're putting the entire prompt within your context window. However many tokens that prompt or document has, that's how much of the context window it fills, so you're using that space right off the bat. Then, as your history grows, because you're adding more information and more context is being saved in the same session, it continues to grow; you're going to see your history grow a lot faster. In the RLM approach, you're only passing back summaries and metadata about what's happening. That keeps the context window small: just enough information to understand where you're at in the recursion and what information you have. The orchestrator can then perform various actions, like stopping, continuing to spawn copies of itself, or recognizing that it has enough information to give you back an answer. Then there's the recursion itself. In an RLM, the recursion is programmatic, meaning you're actually writing code that's going to be executed, whereas in the standard version you're basically passing another prompt or instruction, like "call this tool" or "answer this question for me". That's one of the main differences between this approach and the sub-agent approach you may be familiar with. Then there's the output size: with the scaffold you're bounded by K, the size of the context window of the model you're working with, but with this RLM structure you're technically unbounded, in theory. You're just taking little chunks of data and spawning smaller RLM instances that live and die as they pass information back, and the system is built in such a way that you shouldn't be able to exceed the context window in the main orchestrator, because you're only ingesting enough information to make an informed decision. To go deeper into our architecture, we have three primary components. There is the orchestrator, which is just a large language model that generates code.
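
Those three orchestrator moves (stop, keep spawning, synthesize) map naturally onto a LangGraph routing function; a sketch, with the node names and evidence threshold purely illustrative:

```ts
import { END } from "@langchain/langgraph";

// Inspect shared state after each orchestrator turn and pick the next node.
function route(state: { findings: string[]; answer: string }): string {
  if (state.answer) return END;                        // done: answer is ready
  if (state.findings.length > 50) return "synthesize"; // enough evidence gathered
  return "subAgent";                                   // keep spawning copies
}
```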
