Building Durable Workflows From Scratch in JavaScript


Durable workflows - workflows that checkpoint their state to automatically recover from failure - enable developers to build reliable code faster and dramatically reduce the severity of production incidents. Most workflow systems require you to set up a bunch of infrastructure, but that’s not necessary! In this talk, we’ll show you how to build durable workflows in pure JavaScript, as a library any application can import.

This talk was presented at JSNation US 2025.

FAQ

What are durable workflows in JavaScript?
Durable workflows are processes that regularly checkpoint the state of a program, allowing it to recover from failures by restoring from the last completed step, similar to save points in video games.

Why is building reliable systems so challenging?
Because any step in a complex application can fail due to process crashes, resource limits, timeouts, external API failures, or AI brittleness, especially at large scale.

How do durable workflows help recover from failures?
By checkpointing each step of a workflow. If a failure occurs, the system can use these checkpoints to restore the workflow from its last successful step, avoiding the need to restart from the beginning.

Why implement durable workflows directly in a Node.js application?
Doing so lets you manage workflows within your app without relying on heavy external orchestration systems, reducing latency and complexity.

What does checkpointing involve in durable workflows?
Checkpointing saves the state of each workflow step in a database, allowing the system to recover from failures by loading the workflow's last successful state.
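As a rough sketch of that mechanic, the helper below checks a checkpoint store before executing a step (the Map-backed store and the `runStep` name are hypothetical stand-ins; a real library would write to a database table keyed by workflow and step ID):

```js
// Hypothetical in-memory checkpoint store; a real library would persist
// this in a database so checkpoints survive a process crash.
const checkpoints = new Map();

// Run a step at most once per workflow: if a checkpoint for this step
// already exists, return the saved output instead of re-executing.
async function runStep(workflowId, stepId, fn) {
  const key = `${workflowId}:${stepId}`;
  if (checkpoints.has(key)) {
    return checkpoints.get(key); // restore from the last successful run
  }
  const output = await fn();    // execute the step
  checkpoints.set(key, output); // checkpoint the result before moving on
  return output;
}
```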

How do durable workflows prevent issues like data duplication?
By ensuring that each step is checkpointed once it completes before the workflow proceeds, so in the event of a failure, completed steps are not repeated.

What are the two main methods in the durable workflow library?
'Register workflow', which registers a function as a workflow, and 'run step', which executes a function as a checkpointed step within that workflow.
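Here is one way that two-method surface might fit together, building on the `runStep` sketch above (a sketch only: `registerWorkflow`, the `ctx` object, and the money-transfer steps are illustrative names, not the library's actual API):

```js
import { randomUUID } from 'node:crypto';

const checkpoints = new Map(); // stand-in for a durable database table

// Wrap a function as a workflow: each invocation gets a workflow ID and
// a step runner that checkpoints every step's output.
function registerWorkflow(fn) {
  return async function run(input, workflowId = randomUUID()) {
    const ctx = {
      async runStep(stepId, step) {
        const key = `${workflowId}:${stepId}`;
        if (checkpoints.has(key)) return checkpoints.get(key);
        const output = await step();
        checkpoints.set(key, output);
        return output;
      },
    };
    return fn(ctx, input);
  };
}

// Example: a money-transfer workflow. If the process crashes after the
// debit step, a retry resumes at the credit step instead of debiting twice.
const transfer = registerWorkflow(async (ctx, { from, to, amount }) => {
  await ctx.runStep('debit', async () => console.log(`debit ${from}: ${amount}`));
  await ctx.runStep('credit', async () => console.log(`credit ${to}: ${amount}`));
  await ctx.runStep('confirm', async () => console.log('confirmation sent'));
});

await transfer({ from: 'alice', to: 'bob', amount: 100 });
```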

How does the library handle workflow recovery?
It identifies pending workflows that didn't finish, re-runs them from the last completed step using their checkpointed data, and continues execution from there.
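Continuing the same sketch, a recovery pass might look roughly like this (the `pending` map and `recoverPendingWorkflows` helper are hypothetical; a real library would keep this bookkeeping in the database alongside the checkpoints):

```js
// Record workflows when they start and clear the record when they finish,
// so unfinished ones can be found after a crash. In-memory here for the
// sketch; a real library persists this table in the database.
const pending = new Map(); // workflowId -> { run, input }

async function startWorkflow(run, input, workflowId) {
  pending.set(workflowId, { run, input });
  const result = await run(input, workflowId);
  pending.delete(workflowId); // cleared only on successful completion
  return result;
}

// On startup, re-run every workflow that never finished. Replay is safe
// because runStep returns checkpointed outputs instead of re-executing.
async function recoverPendingWorkflows() {
  for (const [workflowId, { run, input }] of pending) {
    await startWorkflow(run, input, workflowId);
  }
}
```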

Why use a lightweight durable workflow library?
It lets developers integrate reliable workflow management directly into their applications without complex external infrastructure, keeping execution fast and efficient.

Can durable workflows be used beyond business processes?
Yes. They work for a variety of applications, including CI pipelines, data pipelines, and AI agents, providing reliable execution across different domains.

Peter Kraft
18 min
20 Nov, 2025
Video Summary and Transcription
Peter discusses building durable workflows in pure JavaScript, highlighting the challenges of creating reliable systems, especially in complex applications like money transfers. Failures can occur for many reasons, including process crashes, resource exhaustion, timeouts, and API failures. Durable workflows checkpoint state for recovery, and have traditionally relied on heavyweight external orchestration systems. The JavaScript workflow library is designed for simplicity and durability, enabling checkpointing directly on app servers. Its implementation covers checkpointing for safety, recovery, and maintaining workflow integrity.

1. Building Durable Workflows in JavaScript

Short description:

Peter discusses building durable workflows in pure JavaScript, highlighting the challenges of creating reliable systems, especially in complex applications like money transfers. Failures can occur for many reasons, including process crashes, resource exhaustion, timeouts, and API failures. Restarting failed processes can lead to data corruption or wasted resources, necessitating complex recovery code.

Hey, I'm Peter, and today I want to talk to you about how you can build durable workflows from scratch in pure JavaScript. All of you are developers, and you know how hard it is to build truly reliable systems. It's hard to make your applications reliable because in a complex application, just about any step can break anywhere. If you're building something like a money transfer, for example, you have to worry about your application breaking when you start transferring, while you're in the middle of transferring, and at the end, when you're trying to send a confirmation. And things can break for just about any reason.

Applications can break and fail because their process crashes, because they run out of resources, or because they hit a bug. They can fail because of timeouts, because someone takes too long to respond. They can fail because an external API they use breaks down, because it's rate-limited, because it transiently fails, because it has an outage. And, of course, failures are an even bigger worry if you're running at large scale, where there are simply more things that can fail, or if you're using AI, which a lot of folks are these days, because AI is inherently brittle, and the model providers we rely on frequently have issues and outages that can break AI applications.

If something fails, the easy thing to do is to retry from the beginning, to restart whatever failed. But often you can't do that. If you restart a business process like a money transfer from the beginning, you risk corruption or duplication, where you transfer money twice or double-book a reservation. If you restart a really big task, you risk wasting compute resources or being incredibly slow. So often, instead of just restarting something that fails, you have to implement complicated recovery code that figures out exactly what failed and tries to remediate it directly. And that sort of thing is hard to write.

2. Implementing Lightweight Durable Workflows

Short description:

Durable workflows checkpoint program state so that failed programs can recover, akin to save points in video games. They are useful for many applications, but most existing systems are complex to integrate into JavaScript apps. The traditional durable workflow architecture relies on external orchestration: heavyweight systems, large dependencies, and latency overhead.

So one new tool that can really help with this is durable workflows. The idea behind durable workflows is that you regularly checkpoint the state of your program, so that if something fails, you can use those checkpoints to recover your program from its last completed step. You can think of workflows and their checkpoints as working a lot like save points in a video game: when you're playing a game, you save regularly, so that if you die, you can reload from the last save. And when you're running a durable workflow, you're checkpointing every step, so that if your program fails, you can reload, resume, and recover from the last completed step.
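To make the save-point analogy concrete, here is a toy replay under the same assumptions as the sketches above (all names are illustrative):

```js
// Toy replay demonstration; the `saved` map stands in for durable storage.
const saved = new Map();

async function step(runId, name, fn) {
  const key = `${runId}:${name}`;
  if (saved.has(key)) return saved.get(key); // reload from the "save point"
  const output = await fn();
  saved.set(key, output);
  return output;
}

async function workflow(runId, crash) {
  await step(runId, 'one', async () => console.log('step one executes'));
  if (crash) throw new Error('simulated crash'); // the process "dies" here
  await step(runId, 'two', async () => console.log('step two executes'));
}

// The first attempt crashes after step one is checkpointed...
await workflow('run-42', true).catch(() => {});
// ...and the retry reloads step one from its checkpoint (it does not log
// a second time) and then completes step two.
await workflow('run-42', false);
```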

These sorts of workflows are really useful for all sorts of applications: for important business processes, for CI pipelines, for data pipelines, and nowadays also for agents. There are plenty of workflow systems out there, but most of them are big and complicated, and they're not easy to integrate into a JavaScript application. The classic architecture for durable workflows is what I like to call external orchestration. The idea is that you have a central orchestrator service that orchestrates your workflows, and if you want to run a workflow, you run it not on your own JavaScript application but on that service.

The way these systems work is that your servers send a request to the workflow service to start a workflow. The workflow service dispatches the first step in the workflow to a worker. The worker executes the step. The service dispatches the second step, and so on, until every step has been dispatched and the workflow is done. These are big, heavyweight systems, and they work, but they require you to take on large dependencies, add new services, and re-architect your application. They can also have high latency overhead from step dispatch: you might be adding tens or hundreds of milliseconds to the latency of each of your steps.
