
Getting Real with NodeJS and Kafka: An Event Stream Tale

In this lightning talk we will review some basic principles of event-based systems using NodeJS and Kafka and get insights into real mission-critical use cases where the default configurations are not enough. We will cover quick tips and tricks and good practices for Kafka with NodeJS using the KafkaJS library, and see some real code at lightning speed.

This talk was presented at JSNation 2022.

FAQ

The speaker has a background in distributed computing, working with scaling companies, big data, and streaming platforms for the past seven years.

The speaker currently lives in the Netherlands and has been living there for seven years.

The speaker works for Bitvavo, the biggest crypto exchange in the Netherlands.

The speaker writes about Kafka, Kubernetes, and mostly backend topics in their technical blog posts.

A common pattern for integrating multiple databases and data sources is using Kafka to share events between systems.

A 'poison pill' in Kafka is a message that causes the consumer to break, stopping the system. It can be avoided by using a dead letter queue and defining strong types with schemas.

A standard Kafka producer and consumer provide 'at least once' guarantees, which means there might be duplicate messages.

To ensure exactly-once processing in Kafka, you need to configure idempotence to true on the producer side, disable auto-commit offsets on the consumer side, and use transactional boundaries.

Setting a transaction ID in Kafka is important to ensure that consuming, processing, and sending messages happen as part of a single atomic transaction.

For exactly-once semantics, the talk recommends at least three partitions and at least two in-sync replicas for the topics.

node.js backend

Marcos Maia

8 min

16 Jun, 2022


Video Summary and Transcription

This lightning talk introduces distributed computing and discusses the challenges, patterns, and solutions related to using Kafka for event sharing. It emphasizes the importance of separating services and using strong typing to avoid broken messages. The talk also covers Kafka's transaction configuration and guarantees, highlighting the need for proper configuration and the use of transaction IDs. Overall, it provides valuable insights into scaling companies, big data, and streaming platforms.


1. Introduction to Distributed Computing and Bitvavo

Short description:

Hello, everyone. This is a lightning talk where I'll be discussing distributed computing, scaling companies, big data, and streaming platforms. I live in the Netherlands and write technical blog posts. I work for Bitvavo, the biggest crypto exchange in the Netherlands, with a mission to bring the opportunity to trade crypto for everyone.

Hello, everyone. I hope this fits. We had some technical challenges, but let's go. This is a lightning talk. So I'll be very fast.

I spent a lot of time, much more than the talk itself, thinking about what I can say in this short time that will help you go out from here feeling like, OK, I learned something, or maybe he made me think about something. So my background is distributed computing. I usually work with scaling companies, helping systems to really scale. I worked a lot with big data. Streaming platforms have been my bread and butter for the past seven years or so. I'm from Brazil, and I have lived in the Netherlands for seven years. And I write a technical blog. Currently it's this one; I had another three or four different places where I used to write. But you can find my most recent articles there, usually talking about Kafka, about Kubernetes, mostly back end, in my case. I work for Bitvavo. It's the biggest crypto exchange in the Netherlands. So if you are into crypto or want to be into crypto, it's as quick as clicking a button like you see there. And that's the goal of the company. It's really to bring the opportunity to trade crypto to everyone. And that's what we're doing. That's our mission.

2. Challenges and Solutions with Kafka Event Sharing

Short description:

In this section, I will discuss the challenges, common mistakes, patterns, and solutions related to using Kafka to share events between systems. It is important to separate services in a global platform to avoid relying on a single database. Sending events as JSON can be convenient, but without a contract, broken events can disrupt the system. Because Kafka works as a queue of events, a message that cannot be processed, a so-called poison pill, blocks everything behind it and can halt the system.

So in this short time, I'm going to talk a bit about this world where we live. Many of us, I'm sure a lot of you, use Kafka to share events between systems. And this is a requirement, of course, because in the current world, where we go global with our platforms and applications, we cannot be reliant on the database. So we really need to separate our services, right?

And I'm going to talk about a few challenges, common mistakes and patterns that we use, and solutions for that. So this is a normal services architecture. You can call it microservices. It really depends where you are, how you do it. It doesn't matter. The important thing here is that you have multiple databases, multiple data sources. You are integrating things through Kafka. And that's a common pattern, more and more. I bet many of you have this.

And a common way to do it, and I've seen this especially in the TypeScript and JavaScript world, is that you send events using JSON. That's very easy because everything is JSON, but the problem is you don't really have a contract, right? If you send events as JSON, the sending side might send something that's actually broken, or other producers might send it, and the consumer starts processing that and breaks. And the way Kafka works is a queue of events. If you cannot process a message, the consumer doesn't move forward to the messages after it. And then suddenly you are stuck, and your whole system stops because you have what we call a poison pill.

3. Avoiding Broken Messages with Dead Letter Queue

Short description:

To avoid broken messages, apply a pattern called dead letter queue and use strong typing. Guarantee that the message type is correct on the producer side to prevent consumer breakage. Implement a dead letter queue approach to handle broken messages, and commit offsets manually.

So how do you avoid that? It's actually quite simple. You apply a pattern called dead letter queue and use Avro schemas to actually define a schema. So it's strongly typed.
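The idea of a contract checked on the producer side can be sketched without any library. This is only an illustration: the hand-rolled type check below stands in for a real schema format and registry, and `depositSchema`, `validate`, and `encodeDeposit` are hypothetical names, not something shown in the talk.

```javascript
// Hypothetical contract for a "deposit" event: field name -> expected typeof result.
// A real setup would use a schema format and registry instead of this hand-rolled check.
const depositSchema = {
  accountId: "string",
  amount: "number",
  currency: "string",
};

// Returns a list of human-readable violations; an empty list means the event complies.
function validate(schema, event) {
  if (event === null || typeof event !== "object") {
    return ["event must be an object"];
  }
  const errors = [];
  for (const [field, expectedType] of Object.entries(schema)) {
    if (!(field in event)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof event[field] !== expectedType) {
      errors.push(`field ${field} should be ${expectedType}, got ${typeof event[field]}`);
    }
  }
  return errors;
}

// Serializes only events that satisfy the contract, so a malformed event
// fails on the producer side instead of reaching the topic.
function encodeDeposit(event) {
  const errors = validate(depositSchema, event);
  if (errors.length > 0) {
    throw new Error(`invalid deposit event: ${errors.join("; ")}`);
  }
  return JSON.stringify(event);
}
```

The point of the design is where the failure happens: a broken event throws in the producing service, where it can be fixed, rather than turning into a poison pill for every consumer of the topic.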

So now your messages have to comply on the producer side with a specific typing system. So you guarantee that the type is not going to go wrong for that specific topic, and then your consumers will not break for that case. And you can use a dead letter queue approach: if you start consuming a message and it's broken, instead of being stuck in that cycle forever, you try a couple of times, and if it doesn't work, you push it to a different queue and you go to the next message. And for that you might need to commit offsets manually, and I'm going to go through that quickly because there is a timer here. It's really scary. I have to go fast.
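The retry-then-park flow described above could look roughly like the sketch below. It is written against injected `consumer` and `producer` objects so it runs without a broker; with KafkaJS these would be a real consumer and producer, whose `commitOffsets()` and `send()` methods the sketch calls. The names `createDlqHandler`, `handle`, `dlqTopic`, `maxAttempts`, and the two header names are assumptions made for illustration.

```javascript
// Builds an eachMessage handler in the shape KafkaJS passes to consumer.run().
// `handle` is the hypothetical business logic; it throws when a message is broken.
function createDlqHandler({ consumer, producer, handle, dlqTopic, maxAttempts = 3 }) {
  return async function eachMessage({ topic, partition, message }) {
    let succeeded = false;
    let lastError;

    // Try a couple of times before giving up on this message.
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        await handle(message);
        succeeded = true;
      } catch (err) {
        lastError = err;
      }
    }

    if (!succeeded) {
      // Park the raw message on the dead letter topic for later inspection.
      await producer.send({
        topic: dlqTopic,
        messages: [{
          key: message.key,
          value: message.value,
          headers: {
            "x-original-topic": topic,
            "x-error": String(lastError),
          },
        }],
      });
    }

    // Commit manually either way so the partition moves on.
    // The committed offset is the NEXT offset to read, passed as a string.
    await consumer.commitOffsets([
      { topic, partition, offset: (BigInt(message.offset) + 1n).toString() },
    ]);
  };
}
```

With KafkaJS, the handler would be passed as `consumer.run({ autoCommit: false, eachMessage: createDlqHandler({ ... }) })`, so that the offset only moves when the handler says so.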

4. Kafka Transaction Configuration and Guarantees

Short description:

Kafka offers exactly-once semantics and transaction boundaries for processing messages in a single transaction. To ensure message integrity, set idempotence to true on the producer side and disable auto-commit of offsets on the consumer side. Use a transaction ID to establish boundaries and guarantee atomicity. Keep in mind that Kafka transactions are not distributed transactions like XA transactions. Proper configuration of cluster partitions and in-sync replicas is crucial. Start, send, and commit transactions to ensure message processing or rollback. For more information, refer to the blog post and try the provided docker compose with multiple Kafka nodes for local experimentation.

So think about this standard producer and consumer, if you just use KafkaJS or any client that you decide to use. The normal producer and normal consumers don't have strong guarantees. What I mean by that is, it doesn't guarantee that you're going to send a single message on the producer's side; it might have duplicates. It's what we call at least once. That's the guarantee, which means you can have duplicates. And in some systems, you cannot do that. If you're depositing money, you should not do that. Especially when withdrawing money from a customer, you should not do that. And on the client's side, you can also have duplicate processing. So if you use the defaults, that's what you can get. There are solutions for that.
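Where the duplicates come from can be shown with a toy simulation. This is not Kafka client code: `ToyPartition` and `sendWithLostAck` are made-up names, and the sequence-number check only mimics, in a very simplified way, how a broker can recognize a retried record from an idempotent producer.

```javascript
// Toy in-memory "partition log"; a simulation, not Kafka client code.
class ToyPartition {
  constructor({ dedupe }) {
    this.dedupe = dedupe; // true mimics an idempotent producer session
    this.log = [];
    this.lastSeq = -1;    // highest sequence number appended so far
  }

  // Appends a record unless dedupe is on and the sequence was already seen.
  append(seq, value) {
    if (this.dedupe && seq <= this.lastSeq) {
      return; // duplicate retry: acknowledge without appending
    }
    this.log.push(value);
    this.lastSeq = seq;
  }
}

// Sends one record whose first acknowledgement is lost, so the producer retries.
function sendWithLostAck(partition, seq, value) {
  partition.append(seq, value); // write succeeds, but the ack never arrives
  partition.append(seq, value); // producer times out and retries the same record
}

const atLeastOnce = new ToyPartition({ dedupe: false });
sendWithLostAck(atLeastOnce, 0, "withdraw 100 EUR");

const idempotent = new ToyPartition({ dedupe: true });
sendWithLostAck(idempotent, 0, "withdraw 100 EUR");

console.log(atLeastOnce.log.length); // 2: the retry created a duplicate
console.log(idempotent.log.length);  // 1: the retry was recognized and dropped
```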

So Kafka offers exactly-once semantics. That's the EOS there. And transaction boundaries. And it's very common to have a pattern like this: you have a processor, a message initiates a process, and you want to consume the message, do some processing, and produce a message to another topic in a single transaction. So you want to make sure that if you consume, do the processing, and something goes wrong, then when you restart your system you process that message again, because you didn't finalize those three steps, the last of which is sending the message to the next step. And for this, you have to do some configuration in Kafka.

And on the producer's side, you have to set idempotence to true. That guarantees that the message is not going to be sent duplicated. And on the consumer's side, you have to disable the auto-commit of offsets, which means your client reads the message, but now you have the control. You decide when to tell Kafka, I really processed this message, and you commit back the offset. And you want to actually set one property called max in-flight requests to one, so you don't have parallel processing and you can keep the ordering guarantees. And on the producer side, on the last step, remember, it's consuming and then producing, and you want all this to be in the same transaction. What you want to set is a transaction ID, so the broker can set the transaction boundaries, and set idempotence to true as well. And what this guarantees is that when you consume, process, and send, it's part of a single atomic transaction. But note, this is not a distributed transaction, like an XA transaction. It's not guaranteed that a database call you do on the side will be rolled back. You have to take care of that. It's within the boundaries of Kafka. It's the same as your normal database transaction between two tables that you are used to, so just keep that in mind.
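Mapped to KafkaJS option names, the settings above could be collected as in the following sketch. The helper names `transactionalProducerOptions` and `manualCommitRunOptions` are hypothetical; the option keys `idempotent`, `maxInFlightRequests`, `transactionalId`, and `autoCommit` are the ones KafkaJS uses. Placing `maxInFlightRequests` on the producer is an assumption here, since the talk does not say which side it belongs to.

```javascript
// Producer options matching the talk's settings, using KafkaJS option names.
// The transactional id is chosen by the caller; it should be stable per producer instance.
function transactionalProducerOptions(transactionalId) {
  if (typeof transactionalId !== "string" || transactionalId.length === 0) {
    throw new Error("a non-empty transactionalId is required");
  }
  return {
    idempotent: true,       // retried sends are not written twice
    maxInFlightRequests: 1, // one request at a time, which preserves ordering
    transactionalId,        // lets the broker tie sends and offsets to one transaction
  };
}

// Options for consumer.run(): turn off auto-commit so the application decides
// when an offset counts as processed.
function manualCommitRunOptions(eachMessage) {
  return { autoCommit: false, eachMessage };
}
```

With KafkaJS installed, these would be passed as `kafka.producer(transactionalProducerOptions("payments-processor-1"))` and `consumer.run(manualCommitRunOptions(handler))`, where the id `payments-processor-1` is a made-up example.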

Some configurations of the cluster, and then I'm done because my clock is blinking already. You should have at least three partitions and at least two in-sync replicas for your topics for this to work. So if you try locally and you don't have that, it will not work. And you also have to start the transaction, like you can see there: start a transaction, send a message, send the offsets, and then commit the transaction, which means everything is going to happen or it will be rolled back. And that's basically what you want to do, and it's really important to take care of that. That's it. If you want to know more about this, I wrote about it in this specific blog post, and I also have a docker compose with multiple Kafka nodes where you can play with this type of more advanced work on your local machine.
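The sequence start a transaction, send a message, send the offsets, commit could be sketched as below. The code takes the producer as a parameter so it can run without a broker; with KafkaJS this would be a producer created with a transactional id, whose `transaction()`, `send()`, `sendOffsets()`, `commit()`, and `abort()` methods are the ones called here. `processInTransaction`, `transform`, `outputTopic`, and `consumerGroupId` are hypothetical names.

```javascript
// Handles one consumed message inside a Kafka transaction:
// produce the result and record the consumed offset atomically, or roll back both.
async function processInTransaction(
  { producer, consumerGroupId, outputTopic, transform },
  { topic, partition, message }
) {
  const transaction = await producer.transaction();
  try {
    // Business logic; anything thrown here aborts the transaction.
    const result = await transform(message);

    await transaction.send({
      topic: outputTopic,
      messages: [{ key: message.key, value: result }],
    });

    // Offsets travel with the transaction instead of a separate commit.
    // The offset sent is the NEXT offset to read, as a string.
    await transaction.sendOffsets({
      consumerGroupId,
      topics: [{
        topic,
        partitions: [{ partition, offset: (BigInt(message.offset) + 1n).toString() }],
      }],
    });

    await transaction.commit();
  } catch (err) {
    // Nothing becomes visible downstream and the offset does not advance,
    // so the message is processed again after a restart.
    await transaction.abort();
    throw err;
  }
}
```

As the talk points out, this only covers what happens inside Kafka. A database write made inside `transform` is not rolled back by `abort()`.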
