Breaking the Context Ceiling: Implementing Recursive Language Models with LangGraph and TypeScript

MIT's recent "Recursive Language Models" paper demonstrated that LLMs can process inputs 100x beyond their context windows — not by expanding the window, but by treating prompts as external environments the model explores programmatically. The results are striking: GPT-5-mini outperformed GPT-5 on long-context tasks while using comparable compute. This talk demonstrates how to build the same architecture in TypeScript using LangGraph and Node.js.

We'll implement an RLM system in which a root agent orchestrates recursive sub-agents, each operating on a focused context slice without suffering "context rot." We'll see how to leverage LangGraph's cyclic graph execution to spawn child agents, aggregate their findings into a shared state, and let the orchestrator synthesize results — all while keeping individual context windows small and fresh.

By the end, you'll have a working pattern for processing massive documents, codebases, or datasets that would choke a single LLM call, using tools you can deploy today.
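
A minimal sketch of the shared state such a system implies, using LangGraph JS's Annotation API (the field names here are illustrative, not taken from the talk):

```ts
import { Annotation } from "@langchain/langgraph";

// Shared state for the recursive agent graph. Sub-agent findings are
// appended through the reducer, so each child contributes only a small
// summary while the orchestrator's own context window stays fresh.
const RlmState = Annotation.Root({
  // The full document lives in state (the "environment"), never in a prompt.
  document: Annotation<string>,
  // The question the orchestrator is trying to answer.
  question: Annotation<string>,
  // Condensed findings returned by spawned sub-agents.
  findings: Annotation<string[]>({
    reducer: (left, right) => left.concat(right),
    default: () => [],
  }),
  // The final synthesized answer.
  answer: Annotation<string>,
});
```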

Key takeaways:

  • Why bigger context windows don't solve context rot
  • Architecting recursive agent graphs in LangGraph
  • Managing state and tool execution across agent hierarchies
  • Cost and latency tradeoffs in production

This talk was presented at Node Congress 2026. Check out the latest edition of this JavaScript conference.

FAQ

The speaker is Jamal Sinclair O'Garro, a senior software engineer at Netflix.

The main topic is 'Breaking the Context Ceiling: Recursive Language Models in TypeScript', focusing on processing documents larger than a language model's context window using recursive language models.

The problem is that performance degrades as the context window fills up, leading to a lost-in-the-middle effect where information in the middle of the context is often forgotten or lost.

A recursive language model is an approach that decomposes text into smaller pieces and uses an orchestrator model to process these pieces, improving efficiency and context handling over traditional methods.

The RLM approach keeps the context window small by passing back summaries and metadata, whereas the standard scaffold approach fills the context window with the entire prompt or document, leading to potential performance loss.
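
As a rough TypeScript sketch, the payload a sub-agent hands back under the RLM approach might look like this (the type and field names are hypothetical, not from the paper):

```ts
// What a sub-agent returns to the orchestrator instead of raw text: a
// condensed summary plus just enough metadata to locate the source slice.
interface SubAgentResult {
  chunkIndex: number;          // which slice of the document was examined
  charRange: [number, number]; // offsets into the original document
  relevant: boolean;           // whether the slice contained anything useful
  summary: string;             // condensed version of the relevant content
}
```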

Applications include processing large API documentation, debugging large code bases, and analyzing multiple technical specifications or incident reports.

It is considered more efficient because it reduces the context window size and computational cost by processing smaller text chunks and returning summaries, avoiding the need to handle large documents all at once.

LangGraph is a library that helps create execution graphs with nodes and edges, facilitating the implementation of recursive language models by managing state and execution flow.
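
A minimal sketch of that wiring with LangGraph JS; the node bodies are stubs (a real implementation would call an LLM), and the node names and threshold are illustrative:

```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  question: Annotation<string>,
  findings: Annotation<string[]>({
    reducer: (a, b) => a.concat(b),
    default: () => [],
  }),
  answer: Annotation<string>,
});

// Stub: a real sub-agent would summarize one context slice with an LLM call.
const subAgentNode = async (state: typeof State.State) => ({
  findings: [`summary of one slice relevant to: ${state.question}`],
});

// Stub: synthesize once enough findings have accumulated.
const orchestratorNode = async (state: typeof State.State) => ({
  answer: state.findings.length >= 3 ? state.findings.join("\n") : "",
});

const graph = new StateGraph(State)
  .addNode("orchestrator", orchestratorNode)
  .addNode("subAgent", subAgentNode)
  .addEdge(START, "subAgent")
  .addEdge("subAgent", "orchestrator")
  // The cycle: the orchestrator either spawns another sub-agent or finishes.
  .addConditionalEdges("orchestrator", (state) =>
    state.answer ? END : "subAgent"
  )
  .compile();
```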

LangSmith is a tool that allows tracing and monitoring of the execution, helping to visualize the flow and track the costs and actions of language models.

Considerations include managing costs, implementing timeouts, preventing infinite loops, handling rate limiting, and ensuring fallback mechanisms are in place.
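
Two of those guards are cheap to sketch in code, reusing the compiled graph from the example above (the limit and timeout values are arbitrary):

```ts
// recursionLimit is LangGraph's built-in cap on supersteps; the run throws
// once it is exceeded, stopping a buggy routing function from looping forever.
const run = graph.invoke(
  { question: "Which release introduced the breaking change?" },
  { recursionLimit: 25 }
);

// A plain wall-clock timeout around the whole run.
const result = await Promise.race([
  run,
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("RLM run timed out")), 120_000)
  ),
]);
```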

Jamal Sinclair O'Garro
21 min
26 Mar, 2026

Video Summary and Transcription
Jamal Sinclair O'Garro discusses the challenges with context windows in language models and presents recursive language models as a solution. Topics covered: enhancing language model performance through metadata and summaries; orchestrator copies and a comparison of the scaffold and RLM approaches; efficient management of language model operations with slicing, recursion, code execution, and context window handling; graph-based workflow design in LangGraph for agent connections and state management; workflow management, document analysis, and using RLM to process large documents; and the detailed workflow setup, code structure with LangSmith, and node definitions, with RLM benefits for cost considerations and production readiness.

1. Challenges with Context Windows in Language Models

Short description:

Jamal Sinclair O'Garro, a senior software engineer at Netflix, discusses the challenges with context windows in language models. Large models face performance degradation as tokens increase, leading to a lost-in-the-middle effect. Recursive language models offer a solution by decomposing text into smaller parts, enabling better context retention and decision-making.

Hello Node Congress. My name is Jamal Sinclair O'Garro. I'm a senior software engineer at Netflix, working on the experimentation platform, and welcome to my talk, Breaking the Context Ceiling: Recursive Language Models in TypeScript, or How to Process Documents 100 Times Larger than Your LLM's Context Window Using LangGraph. Let's talk more about the problem with context windows. If you use large language models in tools like Claude, Gemini, or Claude Code, you may notice that when you're picking your models, they typically have a limit to their context window, usually somewhere between 128,000 and 1 million tokens. But one thing they don't tell you is that performance is actually going to degrade as you start to work through those tokens. There's a lost-in-the-middle effect that basically says that as you fill up that context window and get closer to the limit, your model can only efficiently remember what's at the beginning or at the end of the context. Everything in the middle basically gets lost, forgotten, or becomes more difficult to find. And that causes a degradation in the actual performance of your large language model itself. Some examples of where this can happen: say you have API documentation, multiple docs across multiple applications, and you're trying to piece together information to figure out how the system works at a larger scale, or you're trying to find breaking changes across many release notes while debugging or triaging. Because of this lost-in-the-middle effect, you'll probably miss the changes you're looking for, because the context is too large. This also happens with very large codebases. Imagine you had tons of incident postmortems to go through, plus many technical specs. The idea is that the larger the text, the more this becomes an issue. So, one way to get around this is to use something called a recursive language model. This comes from some students and researchers at MIT, and the idea is pretty simple. Instead of putting the entire document into your prompt and bloating the context, you can recursively decompose that text into smaller pieces, have an orchestrator model write deterministic code, and have sub-agents (spawns or copies of itself) perform the text search operation. Those return some metadata and a condensed version of the text that the orchestrator can use to actually make a decision. That's probably a mouthful and a lot to unpack, but we're going to go through it little by little, breaking it down into smaller pieces so we can understand it. So, let's first look at the two different approaches.
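
In code, the decomposition and sub-agent search the speaker describes might be sketched like this (assuming @langchain/openai for the model call; the chunk size, model name, and prompt wording are illustrative):

```ts
import { ChatOpenAI } from "@langchain/openai";

// Slice an oversized document into pieces that fit comfortably inside a
// sub-agent's context window (size here is arbitrary).
function chunk(text: string, size = 8_000): string[] {
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

const subAgent = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

// Each spawned copy answers one narrow question about one slice and returns
// only a condensed finding, never the slice itself.
async function examineChunk(piece: string, question: string): Promise<string> {
  const res = await subAgent.invoke(
    `Is anything in the following text relevant to: "${question}"?\n` +
      `If so, summarize it in two sentences; otherwise reply "NOT RELEVANT".\n\n` +
      piece
  );
  return String(res.content);
}
```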

So, the paper talks about one called the standard scaffold. That's what you're used to today. If you're using Claude Code or Codex and you're looking at your prompt, you'll usually see, in the bottom right-hand corner or somewhere within the terminal or interface, how much context you've actually used in a session. Once you get to a certain level, it'll auto-compact for you: it compresses all that information down into a smaller version of itself to give you back some more context while still understanding what you're working on within the system. That's a bit problematic, because you start to lose information; that compression is lossy, right? You're not keeping everything. Now, the breakthrough here, and what the paper proposes, is a new type of algorithm or architecture called a recursive language model, or RLM. It has three main ingredients: symbolic handles, symbolic programming, and symbolic recursion, and we'll talk about those. But the main concept, if you look at the pseudocode, is that if your large language model is working in an environment, there's really no need to store your state in the actual model itself.
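
A loose TypeScript rendering of that idea, heavily simplified (in the paper the model writes real code against a REPL-style environment; here the "action" is just an opaque string):

```ts
// The environment owns the huge prompt; the model never sees it whole.
interface Environment {
  document: string;   // full text, stored outside the model
  history: string[];  // compact metadata about past steps, not full transcripts
}

// One orchestrator step: the model sees only the history plus metadata
// (e.g. the document's length), emits an action, the action is executed
// against the environment, and only its condensed result enters history.
async function rlmStep(
  env: Environment,
  proposeAction: (history: string[], docLength: number) => Promise<string>,
  execute: (action: string, doc: string) => Promise<string>
): Promise<void> {
  const action = await proposeAction(env.history, env.document.length);
  const observation = await execute(action, env.document);
  env.history.push(`action: ${action} -> ${observation}`);
}
```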

2. Enhancing Language Model Performance with RLM

Short description:

To enhance language model performance, use metadata and summaries instead of full prompts. The orchestrator creates copies to parse through text, saving time and costs. Comparing scaffold and RLM approaches shows differences in context handling, recursion, and output sizes.

But you want to pass it in context, because otherwise the model can't access it, right? And then what you do is keep your history, but instead of the full history of the prompt that has to be compacted over time, you pass in just some metadata about it and what the current state is. What it looks like is this: you have the LLM, you take your history, and it produces some code. From there, that code is executed against the state, and this execution is a replication of itself: another agent, or several agents, that will basically run that code. Think of slicing through the list of texts. Once a copy gets its text, it's passed a prompt that says: find the relevant information in this text. Is it here? Yes? Pass it back to the orchestrator model. The orchestrator can then take that, along with the context from all the other agents that have been spawned, decide to synthesize at the very end, and provide back some final result. So the nice thing is that your prompt lives in the environment, and you have this orchestrator agent that creates copies of itself and lets those copies parse through the actual document text and return a summary to the larger model. As a result, you never put the entire prompt or the entire document into the model itself. That saves you, maybe not as much time, but definitely some context and some costs. So that's how the algorithm works at a high level.
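
The fan-out-and-synthesize step might look like this, reusing the chunk and examineChunk helpers (and the ChatOpenAI import) from the earlier sketch; the parallel Promise.all fan-out and the filtering heuristic are assumptions, not details from the talk:

```ts
// Spawn one short-lived sub-agent per slice, gather only their summaries,
// then let the orchestrator synthesize a final answer from those findings.
async function answerOverDocument(doc: string, question: string): Promise<string> {
  const findings = await Promise.all(
    chunk(doc).map((piece) => examineChunk(piece, question))
  );
  const relevant = findings.filter((f) => !f.includes("NOT RELEVANT"));

  const orchestrator = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });
  const res = await orchestrator.invoke(
    `Using only these findings, answer: "${question}"\n\n${relevant.join("\n")}`
  );
  return String(res.content);
}
```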

So let's compare both approaches. With the scaffold, the main difference is that you're putting the entire prompt within your context window. However many tokens that prompt or document has, that's how much of the context window it fills, so you're using that space right off the bat. Then, as your history grows, because you're adding more information and more context is being saved in the same session, it continues to grow; you're going to see your history grow a lot faster. In the RLM approach, you're only passing back summaries and metadata about what's happening. That keeps the context window small: just enough information to understand where you're at in the recursion and what information you have. The orchestrator can then perform various actions, like stopping, continuing to spawn copies of itself, or recognizing that it has enough information to give you back an answer. Then there's the recursion itself. In an RLM, the recursion is programmatic, meaning you're actually writing code that's going to be executed, whereas in the standard version you're basically passing another prompt or instruction, like "call this tool" or "answer this question for me". That's one of the main differences between this approach and the sub-agent approach you may be familiar with. Then there's the output size: with the scaffold you're bounded by K, the size of the context window of the model you're working with, but with this RLM structure you're technically unbounded, in theory. You're just taking little chunks of data and spawning smaller RLM instances that live and die as they pass information back, and the system is built in such a way that you shouldn't be able to exceed the context window in the main orchestrator, because you're only ingesting enough information to make an informed decision. To go deeper into our architecture, we have three primary components. There is the orchestrator, which is just a large language model that generates code.
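
Those three orchestrator moves (stop, keep spawning, synthesize) map naturally onto a LangGraph routing function; a sketch, with the node names and evidence threshold purely illustrative:

```ts
import { END } from "@langchain/langgraph";

// Inspect shared state after each orchestrator turn and pick the next node.
function route(state: { findings: string[]; answer: string }): string {
  if (state.answer) return END;                        // done: answer is ready
  if (state.findings.length > 50) return "synthesize"; // enough evidence gathered
  return "subAgent";                                   // keep spawning copies
}
```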
