Video Summary and Transcription
Today's talk discusses different approaches for implementing real-time updates in server-side applications, including application-level updates and polling. The drawbacks of polling include inefficiency and complexity at scale. Adding extra infrastructure, like messaging systems, can ensure scalability but introduces operational overhead. Prisma Pulse is a system that simplifies change data capture, providing an easy setup for subscribing to database changes and solving scalability issues.
1. Real-time Application Architectures
Hello everyone! Today, I'll discuss different approaches for implementing real-time updates on the server side of a real-time application. The first approach is application-level updates, where the application server handles everything. However, this approach doesn't scale horizontally and suffers from the dual write problem. Another approach is polling, where the API server periodically asks for updates from the database. This approach is resource-intensive due to the high number of queries. Let's explore these approaches in detail.
Hello, and welcome everybody to my talk today about how not to build real-time applications. My name is Nikolas Burk. I work as a developer advocate at Prisma, where we're all about developer experience for developers that are working with databases. I'll take you on a journey today where, first, we're going to set the stage about architecting a real-time application. And then I want to talk about three different approaches for how you can implement real-time updates on the server side and their trade-offs. And then we'll walk away with a couple of conclusions.
So let's jump right in and assume that you're in a job interview and this is your interviewer. And this is the question that he asks you. How would you architect a real-time chat application? Well, if you're like me, you'll probably start talking about this three-tier architecture diagram and that on the front end, you'll use WebSockets to create permanent connections between your API clients, that is, between the browser and the API servers. But my talk today really is about the second part, about the connection between API server and database and how to implement real-time updates there. So how does the API server even learn about anything that's changing in the database? That's the big question for today. And I want to talk about three different approaches. The first one I call application-level updates, then I want to talk about polling and then adding extra infrastructure. So with application-level updates, you really let the application server handle everything. And let's quickly understand how it works with a simple scenario here.
So let's assume we have this chat application, we have three users that are connected to the API server, Alice, Bob and Jane, and they have these WebSocket connections to the API server. So now, first, let's assume that Alice is sending a message to the API server, the API server stores that message in the database, and next, the API server is responsible for broadcasting that message to Bob and Jane. So what could go wrong in that scenario? Once we start seeing a little bit more traffic and we'll want to scale our application and our API servers horizontally, we'll have the problem that Alice and Bob could be connected to the first API server and Jane would be connected to the second API server instance. Because these WebSocket connections are permanent, Jane now will not receive the update from the API server when Alice sends a message. So this approach doesn't scale horizontally.
We also have the problem of the so-called dual write problem because the API server needs to do two things. It needs to store the data in the database and it needs to broadcast the message to all the connected clients. What if one of these operations fails? So this is a pretty tricky situation that I'll come back to in a little bit.
Now let's review quickly the pros and cons of this application-level updates approach. So the pros are that it's fairly easy to understand, you don't need any extra infrastructure, but the problem is that it's not possible to scale this horizontally and you also suffer from the dual write problem here.
So let's take a look at another approach and that's polling. With this approach, we just let the API server periodically ask for updates from the database by sending the same database query to the database over and over again. What could go wrong with this approach? So the problem here is that it's pretty resource intensive. Assume we have N users and per user, we have M polling queries. So this becomes very resource intensive with N times M queries for every polling interval.
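To make the N times M figure concrete, here is a minimal sketch of the arithmetic. The function name and the example numbers (1,000 users, 3 queries per user, a 500 ms interval) are assumptions chosen for illustration; they do not come from the talk.

```typescript
// Rough estimate of the database load that polling creates.
// users = N, queriesPerUser = M, intervalMs = how often each query is repeated.
function pollingQueriesPerSecond(
  users: number,
  queriesPerUser: number,
  intervalMs: number
): number {
  const queriesPerInterval = users * queriesPerUser; // N times M
  const intervalsPerSecond = 1000 / intervalMs;
  return queriesPerInterval * intervalsPerSecond;
}

// Hypothetical numbers: 1,000 users, 3 polling queries each, every 500 ms.
const estimatedLoad = pollingQueriesPerSecond(1000, 3, 500);
console.log(estimatedLoad); // 6000
```

Under these assumptions the database answers 6,000 queries per second even when nobody has sent a message.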
2. Real-time Update Approaches: Polling
Polling is an inefficient approach to real-time updates as it wastes resources and becomes complex to manage at scale. Engineers should strive for elegant solutions that address the challenges posed by the business domain.
If you're polling every couple of milliseconds, that's very bad because then you are wasting a lot of resources, a lot of database connections on the database side, but also on the API server side. So it's very expensive and it's not really a good approach to this problem.
Let's review the pros and cons. It's still fairly easy to understand. So if you don't have that much traffic, you don't need any extra infrastructure, it's fairly easy to implement and you don't have the dual write problem, which actually is a pretty good benefit. However, the cons are that it's pretty resource intensive once you're scaling up to multiple users and the application logic for diffing also gets complex really fast because every time when the results of a database query arrive, you need to compare that with the current state of what has been stored in the database before. And that also gets really complicated. And quite honestly, I think fundamentally polling isn't the right tool for the job when we're talking about real time updates. I think as engineers, we should have the ambition to find elegant solutions to the problems that the business domain that we're operating in poses to us. And I don't really think that polling qualifies here.
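The diffing logic mentioned above could look roughly like the following sketch. The `Message` shape and the `diffNewMessages` helper are hypothetical; the talk does not show this code.

```typescript
// Hypothetical message shape used only for this illustration.
interface Message {
  id: number;
  text: string;
}

// Returns the messages in `latest` whose ids were not present in `previous`.
function diffNewMessages(previous: Message[], latest: Message[]): Message[] {
  const seen: Record<number, boolean> = {};
  for (const message of previous) {
    seen[message.id] = true;
  }
  return latest.filter((message) => !seen[message.id]);
}

const lastPoll: Message[] = [{ id: 1, text: "hello" }];
const thisPoll: Message[] = [
  { id: 1, text: "hello" },
  { id: 2, text: "hi Alice" },
];
console.log(diffNewMessages(lastPoll, thisPoll)); // only the message with id 2
```

This version detects only newly inserted rows. Edited or deleted messages would need additional comparisons, which is where the complexity described above starts to grow.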
3. Real-time Update Approaches: Extra Infrastructure
Adding extra infrastructure, such as messaging systems like Kafka or RabbitMQ, can ensure scalability. However, the operational overhead and dual write problem make it challenging. Change data capture provides a solution by allowing the database to propagate updates to the messaging system, ensuring unidirectional data flow. However, CDC implementation is complex and not easy to maintain.
So another approach is to add extra infrastructure.
You can ensure scalability using messaging systems like Kafka or RabbitMQ. Again, let's understand how this works with a simple scenario of Alice, Bob and Jane again. This time we have a Kafka queue connected to our API server and, as the first step, we subscribe to a Kafka topic. We then receive the hello message from Alice. The API server writes the message into the database and then publishes the event to Kafka. From there, it receives the event again and can then broadcast it to Bob and Jane.
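The sequence above can be sketched with an in-memory stand-in for the topic. `InMemoryTopic` and `ApiServer` are hypothetical classes written for this illustration; they are not a Kafka client, and a real setup would replace the topic with an actual broker.

```typescript
type Handler = (message: string) => void;

// In-memory stand-in for a topic: every subscriber receives every published event.
class InMemoryTopic {
  private handlers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  publish(message: string): void {
    for (const handler of this.handlers) {
      handler(message);
    }
  }
}

class ApiServer {
  // user name -> messages delivered over that user's (simulated) WebSocket
  readonly inboxes: Record<string, string[]> = {};
  private topic: InMemoryTopic;
  private database: string[];

  constructor(topic: InMemoryTopic, database: string[]) {
    this.topic = topic;
    this.database = database;
    // Step 1: every instance subscribes to the same topic.
    topic.subscribe((message) => this.broadcast(message));
  }

  connect(user: string): void {
    this.inboxes[user] = [];
  }

  // Steps 2 to 4: receive the message, store it, publish the event.
  receive(message: string): void {
    this.database.push(message);
    this.topic.publish(message);
  }

  // Step 5: each instance forwards the event to its own connected clients.
  // For simplicity this sketch also echoes the message back to the sender.
  private broadcast(message: string): void {
    for (const user of Object.keys(this.inboxes)) {
      this.inboxes[user].push(message);
    }
  }
}

const sharedTopic = new InMemoryTopic();
const sharedDatabase: string[] = [];
const serverOne = new ApiServer(sharedTopic, sharedDatabase);
const serverTwo = new ApiServer(sharedTopic, sharedDatabase);
serverOne.connect("Alice");
serverOne.connect("Bob");
serverTwo.connect("Jane");

serverOne.receive("hello");
console.log(serverTwo.inboxes["Jane"]); // Jane gets the message via the second instance
```

Because both instances subscribe to the same topic, Jane receives the message even though Alice is connected to a different instance.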
So with this approach, we solve the problem of horizontal scalability that we had with the application-level updates. But the operational overhead of setting up and maintaining additional infrastructure is actually quite notable. And if you've worked with Kafka before, then I think you know that it's not the easiest system to work with. So a second problem here still is the dual write problem. Let's actually take a bit of a deeper look at what this actually is. Here we see the code for how it could be implemented with Kafka when we receive this message from one of our users.
So it's effectively relating to the first step that we saw in the sequence before. The API server first receives the message and writes it into the database, and then it sends it to the Kafka topic that the chat messages are going into. But what if one of these operations fails? You're now in an inconsistent state because either the database has stored data that hasn't arrived at your users, or the other way around, your users are seeing data that's not stored in the database. Either way, it's a very undesirable situation and very tricky to resolve. Of course, you could start implementing your own retry logic or try to wrap everything in a transaction, but all of this gets very complicated and also resource intensive again.
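The code shown in the talk is not part of this transcript, so the following is only a minimal sketch of the dual write problem. The database and the topic are replaced by hypothetical in-memory arrays, and `brokerAvailable` is an assumed switch used to simulate a failure of the second write.

```typescript
// Hypothetical in-memory stand-ins for the database and the message topic.
const storedMessages: string[] = [];
const publishedMessages: string[] = [];
let brokerAvailable = true;

function saveToDatabase(text: string): void {
  storedMessages.push(text);
}

function publishToTopic(text: string): void {
  if (!brokerAvailable) {
    throw new Error("broker unavailable");
  }
  publishedMessages.push(text);
}

// Two independent writes with nothing that makes them atomic.
function handleIncomingMessage(text: string): boolean {
  saveToDatabase(text); // write 1: database
  try {
    publishToTopic(text); // write 2: messaging system
    return true;
  } catch (error) {
    // The message is stored but was never published: inconsistent state.
    return false;
  }
}

handleIncomingMessage("hello");
brokerAvailable = false; // simulate an outage between the two writes
handleIncomingMessage("anyone there?");

console.log(storedMessages.length, publishedMessages.length); // 2 1
```

After the simulated outage the database holds two messages while only one reached the topic, which is the first of the two inconsistent states described above.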
So how can we actually solve this problem on a deeper level? And the answer to that is change data capture. Change data capture is a design pattern that's based on the idea of unidirectional data flow. And with that, it's actually not the API server that broadcasts the updates to our messaging system, but rather, these updates are propagated directly by the database. Let's walk again through the same scenario, but this time using unidirectional data flow and change data capture. So Alice again starts by sending the message to the API server and the API server then stores that in the database. Now the database is responsible for publishing this update to the messaging system. And from there, the API server receives the update from the messaging system that we're using and can broadcast the message to all the subscribed clients. So now we very elegantly solve this problem of the dual writes and the potential data inconsistencies by changing the way data is flowing through our system. So what's the catch with CDC? If I should summarize it, it's really this. It gets complicated. If you want to maintain and build your own CDC system, this is not an easy task.
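The unidirectional flow can be sketched as follows. `LoggingDatabase` and `ChangeRelay` are hypothetical in-memory stand-ins: the first records every write in an ordered change log, the second forwards unread log entries to a publish function. A production CDC system is considerably more involved than this.

```typescript
interface MessageRow {
  id: number;
  text: string;
}

interface ChangeRecord {
  position: number;
  table: string;
  row: MessageRow;
}

// Stand-in for a database that records every write in an ordered change log.
class LoggingDatabase {
  readonly messages: MessageRow[] = [];
  readonly changeLog: ChangeRecord[] = [];

  insertMessage(text: string): void {
    const row: MessageRow = { id: this.messages.length + 1, text };
    this.messages.push(row);
    this.changeLog.push({ position: this.changeLog.length, table: "message", row });
  }
}

// Stand-in for the CDC relay: forwards log entries it has not forwarded yet.
class ChangeRelay {
  private nextPosition = 0;
  private db: LoggingDatabase;
  private publish: (change: ChangeRecord) => void;

  constructor(db: LoggingDatabase, publish: (change: ChangeRecord) => void) {
    this.db = db;
    this.publish = publish;
  }

  // Returns how many changes were forwarded. If publish throws, the position
  // is not advanced, so the same entry is retried on the next call.
  relay(): number {
    let forwarded = 0;
    while (this.nextPosition < this.db.changeLog.length) {
      this.publish(this.db.changeLog[this.nextPosition]);
      this.nextPosition++;
      forwarded++;
    }
    return forwarded;
  }
}

const cdcDatabase = new LoggingDatabase();
const delivered: ChangeRecord[] = [];
const relay = new ChangeRelay(cdcDatabase, (change) => delivered.push(change));

cdcDatabase.insertMessage("hello"); // the API server performs a single write
const forwardedCount = relay.relay(); // the change reaches the messaging side
console.log(forwardedCount, delivered[0].row.text); // 1 hello
```

The API server writes only to the database here. Publishing is derived from the log, so a failed publish can be retried from the last forwarded position instead of leaving the two systems out of sync.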
4. Real-time Update Approaches: Prisma Pulse
Prisma Pulse is a system that implements CDC, providing type-safe database changes into your application. It offers an easy setup for subscribing to database changes and solves scalability issues. The solution to the dual write problem is change data capture, but implementing and maintaining it can be challenging. Prisma Pulse simplifies the usage of change data capture in your applications.
And if you're a startup or in general, a development team that's really focused on delivering value to your users, I don't think this is where you want to spend your time. So what I want to recommend to you today is give Prisma Pulse a shot. Prisma Pulse is a system that implements CDC, the idea of unidirectional data flow based on Cloudflare Workers, by reading the write-ahead log from your Postgres database. It delivers type-safe database changes into your application, and it's very easy to understand and set up.
Here is what it looks like to subscribe to database changes using Prisma Pulse. You only have this one line where on the message table, you call the subscribe method, and then you start this async iterator and start waiting for events that are happening on the database table on the message table.
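The slide with the actual Prisma Pulse code is not included in this transcript. The sketch below only illustrates the consumption pattern described above, an async iterator over change events, using a hypothetical stand-in generator. `subscribeToMessages`, `MessageChange` and `collectTexts` are assumptions made for this example and are not the Prisma Pulse API.

```typescript
// Hypothetical event shape; the real event types are not shown in the talk.
interface MessageChange {
  action: string;
  text: string;
}

// Stand-in for a change stream. It replays a fixed list of events,
// whereas a real subscription would stay open and wait for new changes.
async function* subscribeToMessages(
  events: MessageChange[]
): AsyncGenerator<MessageChange> {
  for (const event of events) {
    yield event;
  }
}

// Consumes the stream with `for await`, the pattern described in the talk.
async function collectTexts(
  stream: AsyncIterable<MessageChange>
): Promise<string[]> {
  const texts: string[] = [];
  for await (const event of stream) {
    // A real handler would broadcast the event to the WebSocket clients here.
    texts.push(event.text);
  }
  return texts;
}

const demoEvents: MessageChange[] = [
  { action: "create", text: "hello" },
  { action: "create", text: "hi Alice" },
];

collectTexts(subscribeToMessages(demoEvents)).then((texts) => {
  console.log(texts);
});
```

The loop body is the only place where application logic runs, which is what keeps this style of subscription small compared with the polling and dual write variants.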
So what to take away from this talk? Deploying real-time applications comes with a lot of complexity, and the naive approaches like application level updates and polling just don't cut it. They don't scale. You must introduce additional infrastructure like Apache Kafka to ensure scalability, but the dual write problem, even then, is nasty and difficult to overcome. The elegant solution to this problem is called change data capture, but this is also hard to implement, operate, and maintain. So Prisma Pulse gives you the easiest way to use change data capture in your applications. And that's all I have for you today. Thank you so much.