Video Summary and Transcription
This Talk explores the paradigm of message queues for reliable backend execution. It highlights the benefits of message queues, such as guaranteed delivery and offloading of long-running processes. The drawbacks of using queues are discussed, including the complexity of managing infrastructure and applications. The solution of using a reliability layer called Inngest is presented, which allows for non-blocking background tasks and provides a dashboard for monitoring and managing jobs. The Talk also emphasizes the importance of reliability in building software systems and introduces the expanding scope and functionality of Inngest.
1. Introduction to Message Queues
Hello, everyone. Welcome to my talk about reliability, backend, and execution. I will discuss the paradigm that makes life easier. We are now living in a constant 90s nostalgia. The 90s brought us many great things, but there is one thing we could say goodbye to: queues. Message queues are a form of asynchronous service to service communication. They allow for guaranteed delivery and offloading of long-running processes.
Hello, everyone. Welcome to my talk where, for the next 20 minutes, I will talk about reliability, backend, and execution. Just a quick introduction. My name is Sylwia Vargas. I'm from Poland. I really love pierogi and previously I worked at StackBlitz. Now I'm a developer relations lead at Inngest.
This talk is about the paradigm that makes life easier. But before we talk about the good, let's talk about the bad. We are now living in a constant 90s nostalgia. And, of course, this is no surprise. The 90s brought to us a lot of different things, great stuff that really is still with us. However, there is one thing that possibly we could say goodbye to. And these are the queues.
So let's look at what message queues are. A message queue is a form of asynchronous service-to-service communication used in serverless and microservices architectures. Messages are stored on the queue until they are processed and deleted. Each message is processed only once by a single consumer. But here I need to interject because in actuality, multiple workers can consume messages from a queue. In order to preserve ordering of tasks, they will need to execute serially. But back to the definition now. Message queues can be used to decouple heavyweight processing, to buffer or batch work, and to smooth spiky workloads. So you can think about it that once you add something to the queue, it will reach its destination one by one. The delivery is guaranteed. And what's happening in the queue does not impact other parts of the infrastructure. And queues can be really massive.
So let's recap. With queues, you get guaranteed delivery because you know that once something is added to the queue, it will leave it only once it's processed. And queues allow developers to offload long-running processes to the background so that your application does not choke. You would use queues for data-intensive processes or when integrating with external systems.
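To make the recap concrete, here is a minimal in-memory sketch of the queue behaviour described above: a message stays on the queue until a consumer has processed it, and messages leave in the order they arrived. The class and method names are made up for illustration, and a real broker adds persistence, visibility timeouts, and networking that this sketch leaves out.

```typescript
// Minimal in-memory FIFO queue: an illustration only, not a real broker.
type Message<T> = { id: number; body: T };

class InMemoryQueue<T> {
  private messages: Message<T>[] = [];
  private nextId = 1;

  // Producer side: store the message until a consumer processes it.
  enqueue(body: T): number {
    const id = this.nextId++;
    this.messages.push({ id, body });
    return id;
  }

  // Consumer side: process the oldest message and delete it only
  // if the handler returns without throwing.
  processNext(handler: (body: T) => void): boolean {
    const message = this.messages[0];
    if (message === undefined) return false; // nothing to do
    handler(message.body); // a throw leaves the message on the queue
    this.messages.shift();
    return true;
  }

  get size(): number {
    return this.messages.length;
  }
}

const queue = new InMemoryQueue<string>();
queue.enqueue("resize-image-1");
queue.enqueue("resize-image-2");

const processed: string[] = [];
while (queue.processNext((job) => processed.push(job))) {
  // drain the queue one message at a time, in order
}
```

Deleting the message only after the handler returns is what gives the "it leaves the queue only once it's processed" guarantee in this toy version.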
2. Drawbacks of Using Queues
And another benefit of queues is horizontal scalability. However, there are drawbacks to using queues. Building additional infrastructure and managing complex applications can be a lot of work. In times of limited budgets and resources, it's worth considering if managing queues is the right choice. Instead, durable execution allows us to define workflow logic in our application code and ensures reliable execution.
And another benefit is horizontal scalability because multiple messages can be processed in parallel. As workload increases, multiple application instances can handle high throughput while remaining reliable.
However, there is a but. So let's look at this Reddit comment. So queues are great in data intensive processes, as I said, that don't need to run on main thread because they execute asynchronously. The tasks are processed in the background and the application is still responsive. However, there are some drawbacks to the queues, which this Reddit user delicately mentions in this quote. Once you take something from the queue, the rest is on you. And queuing service does not care anymore. So what does it even mean? Let's look at that. So queues are great when your application is simple. When it grows in complexity or if it's distributed, you all of a sudden need to worry about a whole wealth of additional infrastructure that you need to build.
And it's going to be you who needs to build it. So, for example, you will need to build concurrency control because you want to be able to control how many steps are executed at one time. Or, for example, debouncing because we all know how costly it is when functions execute multiple times. Or state persistence and management because now that you have a distributed or complex application, you have to share state across different functions and queues. Then there's also error handling because what if, just hypothetically, one service provider has an outage? You will need to include retries for failures and also timeouts. And in that case, you also need recovery tooling to understand and process the errors and failed events.
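As a rough illustration of just one item on that list, the sketch below shows the retry and failure bookkeeping a queue consumer ends up owning. Everything here is hypothetical: the `Job` shape, the `deadLetters` list, and the flaky handler are invented for the example, and the backoff delay between attempts is left out to keep it short.

```typescript
// Sketch of the retry and failure handling a queue consumer has to own.
type Job = { id: string; payload: string };
type FailedJob = { job: Job; attempts: number; lastError: string };

// Jobs that never succeeded, kept for later inspection and replay.
const deadLetters: FailedJob[] = [];

// Run the handler up to maxAttempts times; park the job if it never succeeds.
function processWithRetries(
  job: Job,
  handler: (job: Job) => void,
  maxAttempts: number,
): boolean {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(job);
      return true; // success: stop retrying
    } catch (error) {
      lastError = error instanceof Error ? error.message : String(error);
      // A real consumer would wait here with backoff before the next attempt.
    }
  }
  deadLetters.push({ job, attempts: maxAttempts, lastError });
  return false;
}

// Hypothetical flaky provider: fails twice, then recovers.
let calls = 0;
const flakyHandler = (_job: Job): void => {
  calls++;
  if (calls < 3) throw new Error("provider outage");
};

const recovered = processWithRetries({ id: "job-1", payload: "a" }, flakyHandler, 4);
const parked = processWithRetries(
  { id: "job-2", payload: "b" },
  () => { throw new Error("permanent failure"); },
  4,
);
```

Even this small version needs a policy for attempts, a place to keep failed jobs, and some way to replay them, which is the "recovery tooling" mentioned above.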
So this already sounds like a lot of work and it's not even an exhaustive list. So you don't have to listen to me on that. In times like this, when engineering budgets and headcounts are slashed down, we as individual developers, engineers need to do more with less. So it is really worth asking at this point, do you really want to be in the business of managing and operating your own queues? Well, Matthew Druker, the CEO of SoundCloud, doesn't think we should. So if this is now a common knowledge, why are people still using queues? Well, we are used to something. It feels familiar and cozy even if it's not the coziest solution. You can make everything work with just enough effort.
Fortunately, there is a better solution that builds on the concept of message queues. So instead of separating our infrastructure, such as queues, from our code, what if we could define our workflow logic purely in our application code and ensure it executes reliably? So this is what durable execution gives us. Durable execution is, as the name says, durable. It guarantees that our code will run and will be completed, even if there are failures along the way.
3. Additional Functionalities and Real-World Example
This part explains the additional functionalities provided by the system. It also presents a real-world example of building a signup flow using third-party services, emphasizing the simplicity and ease it offers to developers.
So this part is the same as message queues. However, unlike message queues, you also get retry logic, error handling, and state persistence, which come out of the box. You don't have to build it. You are given this.
The other part is flow control, which is everything else that is needed for you to be able to run your functions reliably, such as concurrency, debouncing, task prioritization, handling failures, or recovery tooling.
So enough theory. Let's look at a real world example. So you work at a new hot restaurant booking startup. Your boss asks you how long it will take you to build a signup flow. So you get a list of requirements from the product manager. You look at it with an ice latte in hand. You are completely cool.
So, here's the code. And you would create a user in the database. You would send a welcome email. And you would add the user as a member to a mailing list. Well, this looks very simple. And we like simple. Right? We are lazy as developers. Building applications using third party services is smart and makes your life easier.
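The slide with the code is not part of the transcript, so the following is only a sketch of what such a handler might look like. The three services are hypothetical in-memory stand-ins; in a real application each call would be an awaited network request to a database, an email provider, and a mailing-list provider.

```typescript
// All three services are hypothetical in-memory stand-ins for the
// database, the email provider, and the mailing-list provider.
type User = { id: number; email: string };

const users: User[] = [];
const sentEmails: string[] = [];
const mailingList: string[] = [];

function createUser(email: string): User {
  const user = { id: users.length + 1, email };
  users.push(user);
  return user;
}

function sendWelcomeEmail(user: User): void {
  sentEmails.push(user.email);
}

function addToMailingList(user: User): void {
  mailingList.push(user.email);
}

// The naive signup handler: every call sits in the request's critical path,
// so the response waits for all three, and a failure in any one aborts the rest.
function handleSignup(email: string): User {
  const user = createUser(email);
  sendWelcomeEmail(user);
  addToMailingList(user);
  return user;
}

const newUser = handleSignup("ada@example.com");
```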
4. Challenges of Distributed Applications
Building applications using third-party services is smart and makes your life easier. However, there's a downside. Sometimes services can be slow, causing blocking code and slowing down the API. This affects user experience. Managing retries, logging errors, and implementing a recovery system adds complexity and increases the development timeline.
Job done. You can take a big sip of your ice latte and go back to playing Wordle or reading one of those thousands of mega threads on Twitter.
So, we like easy. But where is the problem? Is my presentation finished here? Well, no. There's a downside. Because now we have created a distributed application where we have no control over large parts of the infrastructure that we rely on. For example, sometimes services can be slow. Sending an email, even on a good day, can take half a second. So, we have a problem now. We have blocking code in the critical path of our request. As a result, we are making our API slower. In other words, your user is wasting time. User experience shouldn't suffer because of the business requirements.
So, how are we doing so far? Is it fast? Well, no. But is it reliable? Also, no. Imagine that as you're adding a user to a mailing list, a service goes down. You need to manage retries. One, two, three, four retries. And what if something fails permanently? You need to log these errors to a logging service. And you also need to figure out a recovery system. So, the estimation to build this very simple feature is weeks now instead of days. You need to set up the infrastructure and processes.
5. Solving Partial Failures with Inngest
Partial failures are bad. Ignoring errors, showing errors to users, or having users retry signing up can lead to lost customers and duplicate entries. This results in a slow and frustrating app. To solve this, we can move non-blocking tasks to background jobs using Inngest. Inngest is a reliability layer that allows you to define and execute functions asynchronously. It provides a dashboard for monitoring and managing jobs.
Also, partial failures are really bad. So, imagine this. You've added the user to the database, but haven't sent them an email. So, now we have three options. First, you can ignore the error, which means that the user is not on the mailing list. Second, even worse, we show the user the error, which will lead to a lost customer. And third, worst case of all, the user will retry signing up. Let's see that.
So, let's assume that the user gets the error and tries to sign up again. But now, the user creation will error out because there's a duplicate. Well, good luck recovering from that. So, this is the mess we are in. And the app takes forever to work. It takes me forever to build. I will be sore at my work because all of a sudden I have to be dealing with support backlog. We are still not dealing with the persistent failures. And everyone is unhappy. My boss is unhappy. I am stressed. I'm losing my sleep. But there's a solution. We can make our code faster and more reliable.
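The partial-failure scenario can be reproduced in a few lines. This is a sketch under stated assumptions: the users table is modelled as a set with a unique email constraint, and the email provider is a hypothetical stub that is always down.

```typescript
// Hypothetical stand-ins: a users table with a unique email constraint
// and an email provider that is currently down.
const userEmails = new Set<string>();

function createUser(email: string): void {
  if (userEmails.has(email)) throw new Error("duplicate user");
  userEmails.add(email);
}

function sendWelcomeEmail(_email: string): void {
  throw new Error("email provider unavailable");
}

function handleSignup(email: string): void {
  createUser(email);       // succeeds and is committed
  sendWelcomeEmail(email); // fails, but the user row already exists
}

// Collect the error message each signup attempt surfaces to the user.
function attemptSignup(email: string): string {
  try {
    handleSignup(email);
    return "ok";
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

const firstAttempt = attemptSignup("ada@example.com");
const secondAttempt = attemptSignup("ada@example.com");
```

The first attempt fails on the email step after the user was already created, so the second attempt fails earlier, on the duplicate check, and never reaches the email step at all.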
So, let's move the non-blocking tasks to background jobs. So, first, we'll add Inngest to the project. Inngest is a reliability layer for your app. So, with Inngest, you define functions or workflows using its SDK right in your code base, and then you serve them through an HTTP endpoint in your application. So, Inngest then takes care of reliably executing functions asynchronously. So, there's also a dashboard where you can monitor, debug, and manage your jobs. It's all visual. So, this is what your app looks like now.
6. Using Inngest for Reliable Function Execution
We add a reliability layer called Inngest. Functions are wrapped in inngest.createFunction to be executed when triggered by an event. The same event name is used for multiple functions to fire simultaneously, known as fan out. Instead of invoking functions directly, we trigger events in Inngest via an HTTP endpoint. Inngest executes the functions and provides notifications on the dashboard. Inngest automatically retries failed functions until they succeed.
We are going to add a reliability layer, which is Inngest. First, we are going to wrap this function in inngest.createFunction. As you see, we are providing an event name. Later when the user signs up, we will trigger an event with this name. You will see that in a second.
This will then tell Inngest to execute the function. So, this is how it would look in code. We are creating the function. We provide the event name, and we are invoking our existing Mailchimp code from before. And now we'll do the same for the other function. As you see, we are using the same event name. This is because we want these two functions to fire at the same time. So, when the user signs up, we want these two functions to fire. This pattern is usually called fan out.
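The fan-out pattern itself does not depend on any particular SDK, so the sketch below models it with a small in-memory event bus. It is deliberately not the Inngest SDK: the local `createFunction` and `send` helpers, the function ids, and the event payload shape are all made up for this example.

```typescript
// In-memory event bus illustrating fan-out; names are hypothetical and
// this is not the Inngest SDK.
type AppEvent = { name: string; data: { email: string } };
type Handler = (event: AppEvent) => void;

const registry = new Map<string, { id: string; handler: Handler }[]>();

// Register a function under an event name.
function createFunction(id: string, eventName: string, handler: Handler): void {
  const existing = registry.get(eventName) ?? [];
  existing.push({ id, handler });
  registry.set(eventName, existing);
}

// Sending one event runs every function registered for its name.
function send(event: AppEvent): string[] {
  const triggered: string[] = [];
  for (const fn of registry.get(event.name) ?? []) {
    fn.handler(event);
    triggered.push(fn.id);
  }
  return triggered;
}

const log: string[] = [];
createFunction("send-welcome-email", "user.signup", (e) => log.push(`email:${e.data.email}`));
createFunction("add-to-mailing-list", "user.signup", (e) => log.push(`list:${e.data.email}`));

// The signup handler now only emits an event instead of calling both services.
const triggered = send({ name: "user.signup", data: { email: "ada@example.com" } });
```

Because the signup code only knows the event name, a third function can be attached to `user.signup` later without touching the code that sends the event.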
So, now when the user clicks the button, we will send an event to Inngest. This is how it looks in the code. Instead of invoking these functions directly, we'll trigger an event in Inngest. Like I mentioned before, we expose functions to Inngest via an HTTP endpoint. Inngest uses this endpoint to execute the specific functions when an event is triggered. This is the endpoint. Inngest will use it to download the function definitions and then to execute them.
And here is the complete flow. So, Inngest calls the correct functions at the precise time you want. And then on the dashboard, you will get a notification that there was an event triggered, which in turn called two functions. So, we see that they completed, and also when. So, all this is looking good. But what happens when there is a failure? Well, let's look at that. Inngest invokes the function and let's say it fails with an error code. So, it will retry it and retry it until it finally succeeds. You don't need to worry about it.
7. Exploring Additional Inngest Functionalities
You can easily debug errors with the Inngest console tool and recover by retriggering failed events. Inngest allows scheduling tasks in the future and orchestrating multi-step processes. Increase user retention by sending activation email drip campaigns.
Moreover, you'll get a detailed log of what happened. I know it's difficult to believe this, but sometimes the errors are persistent not because of service outages, but because there are bugs in our code. I know this is very difficult to believe. But in those cases, you can actually easily debug it with the Inngest console tool, and once you've fixed your function, you can recover by retriggering failed events.
So, how are we doing now? We wanted to make our app faster, so we moved the non-blocking tasks from the user's critical path to background jobs. But also, we got reliability as a nice addition. So, now we have access to this great infrastructure. So, let's see what else we can do with it.
This is our app right now. I didn't tell you yet, but Inngest actually allows you to schedule tasks in the future and also orchestrate multi-step processes. So, let's look at the send email function. Here, we are just sending a welcome email, but it is always nice to increase user retention. We could send them an activation email drip campaign in the first week. How would we go about that?
8. Building a Drip Campaign with Inngest
We can use Inngest steps to build a drip campaign with sequential progression. Inngest handles scheduling for pausing function execution. The last step is the final email with tips. The campaign can be dynamic based on user actions. Use the booking event to determine the course of action.
So far, we have been talking about the fan out pattern, where you have multiple functions firing up on the same event. However, many tasks require sequential progression. Here we are building a drip campaign, so it's convenient that we can express the whole timeline as a procedural code.
So, we will use Inngest steps within this function. Here, as you can see, we are using step.run. In this way, the code will get automatically retried if it fails. But the code that runs correctly will never be retried again. We will see it in action in a bit. And first, we are sending a welcome email. This is the same part that we did before. Then Inngest will pause execution of this function for four days. For this, we are using step.sleep. From a programmer's perspective, it looks similar to calling setTimeout, but actually, in the background, Inngest handles the scheduling for you. So, this means that your serverless function does not run for four days. So, you don't have to sell your kidney to pay your AWS bill.
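One way to picture "code that ran correctly is never retried" is memoization by step id: the function is re-invoked from the top on every retry, but steps with a saved result return it immediately. The sketch below is a simplified model of that idea and not the Inngest SDK; `makeStepRunner`, `executeWithRetries`, and the step ids are hypothetical, and sleeping is not modelled.

```typescript
// Simplified model of memoized steps; not the Inngest SDK.
type StepState = Map<string, unknown>;
type Run = <T>(id: string, fn: () => T) => T;

// Returns a `run` helper bound to the saved state of one function run.
function makeStepRunner(state: StepState): Run {
  return function run<T>(id: string, fn: () => T): T {
    if (state.has(id)) return state.get(id) as T; // already done: skip
    const result = fn(); // a throw here leaves no saved result
    state.set(id, result);
    return result;
  };
}

// Re-invoke the whole workflow until it finishes or attempts run out.
function executeWithRetries(workflow: (run: Run) => void, maxAttempts: number): number {
  const state: StepState = new Map();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      workflow(makeStepRunner(state));
      return attempt; // number of invocations it took to complete
    } catch {
      // fall through and re-invoke; saved steps will be skipped
    }
  }
  throw new Error("workflow did not complete");
}

// Hypothetical drip campaign: the second email fails twice before succeeding.
const calls = { welcome: 0, tips: 0 };
const attempts = executeWithRetries((run) => {
  run("send-welcome-email", () => { calls.welcome++; return "sent"; });
  run("send-tips-email", () => {
    calls.tips++;
    if (calls.tips < 3) throw new Error("email provider unavailable");
    return "sent";
  });
}, 5);
```

In this run the workflow is invoked three times, but the welcome email is sent once, because its result was saved after the first invocation.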
And the last step is the final email with tips. If there is a failure on one of the steps, Inngest will know that the other steps worked and only retry that one step until it works. So, now we have built a successful drip campaign and we deserve a round of applause. But we can actually do better. Imagine that someone has already signed up and immediately made a booking. It wouldn't make sense for them to receive the same email as someone who didn't finalize a booking. Maybe they need different types of tips or a different CTA. The campaign could actually be dynamic based on the user actions. So, let's do it. Let's delete the last two steps. So, elsewhere in your app, when the user completes a booking, there's an event sent that's called booking.created, just like user.signup. So, now we use this event to determine the course of action. Here, we are waiting for four days to see if this event will even happen. If the booking was made, we'll reward this person with power user tips.
9. Expanding the Scope of Inngest
If the booking was made, we'll reward this person with power user tips. There are numerous use cases that go beyond just marketing campaigns. You can build complex payment flows, LLM prompt chaining, or multi-step data transformation. Inngest is also framework and language agnostic.
And, well, if they need four days to make a booking, they need some basic tips. And this is honestly so much fun, that's why I stopped there. You could go wild and create a lot of emails with a lot of tips.
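The branching described here, waiting up to four days for a booking and choosing the follow-up email accordingly, can be sketched as a pure function over timestamped events. This is an illustration with logical timestamps rather than the Inngest SDK; the local `waitForEvent` helper, the `userId` field, and the email names are assumptions made for the example.

```typescript
// Pure-function model of "wait up to four days for a booking";
// hypothetical names and logical timestamps, not the Inngest SDK.
type AppEvent = { name: string; userId: string; at: number }; // at: ms since epoch

const DAY_MS = 24 * 60 * 60 * 1000;

// Return the first matching event inside the window, or null on timeout.
function waitForEvent(
  events: AppEvent[],
  name: string,
  userId: string,
  from: number,
  timeoutMs: number,
): AppEvent | null {
  const match = events
    .filter((e) => e.name === name && e.userId === userId)
    .filter((e) => e.at >= from && e.at <= from + timeoutMs)
    .sort((a, b) => a.at - b.at)[0];
  return match ?? null;
}

// Pick the follow-up email based on whether a booking arrived in time.
function chooseFollowUpEmail(events: AppEvent[], signup: AppEvent): string {
  const booking = waitForEvent(events, "booking.created", signup.userId, signup.at, 4 * DAY_MS);
  return booking ? "power-user-tips" : "basic-tips";
}

const signupA: AppEvent = { name: "user.signup", userId: "a", at: 0 };
const signupB: AppEvent = { name: "user.signup", userId: "b", at: 0 };
const events: AppEvent[] = [
  signupA,
  signupB,
  { name: "booking.created", userId: "a", at: 1 * DAY_MS },
  { name: "booking.created", userId: "b", at: 6 * DAY_MS }, // after the window
];

const emailForA = chooseFollowUpEmail(events, signupA);
const emailForB = chooseFollowUpEmail(events, signupB);
```

Matching on `userId` matters: without it, one user's booking would switch every waiting signup over to the power-user email.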
And speaking of tips, all of this is just the tip of the iceberg. We are talking here about sending emails. But Inngest is not a tool for sending emails. There are numerous use cases that go beyond just marketing campaigns. You can build complex payment flows, LLM prompt chaining, or multi-step data transformation. Whenever you need to have a bunch of stuff happening in response to a given event, you could consider Inngest.
Moreover, you can also migrate from one cloud to the other with zero downtime. Inngest is also framework agnostic. Here are just a few of the supported frameworks. But we also recently added support for Bun and Astro, for example. And finally, it is also language agnostic. In this example, we saw a lot of TypeScript and type safety. But in addition to TypeScript, we also have SDKs for Python and Go. We are also looking to add more.
10. Expanding SDKs and Importance of Reliability
We recently added support for Bun and Astro. Our SDKs are available for TypeScript, Python, and Go, with plans to add more. Mix and match SDKs in your workflows and invoke functions written in one language from another. We also have a local dev server for easy testing. Building reliable systems is crucial for user satisfaction and team productivity. Inngest serves as a reliability layer, but sooner or later, you will need a solution. Reliability should be considered early in architectural choices. Thank you.
And by the way, our SDK spec is open source and we are inviting contributions. And fun fact, you can also mix and match all those SDKs in your workflows and invoke functions written in one language from another. There is also, finally, a local dev server, which doesn't require you to log in, so you can go and check it out right now.
But, you know, here I spoke a lot about Inngest. But this talk is not only about Inngest. When building real-world production applications, reliability is really important. Not only does it keep your users happy, but it makes your team more productive. And you as a developer are less bogged down by maintenance and operations. Achieving reliability is hard. Every engineer who has ever had to build a reliable system at scale knows the amount of iteration and infrastructure that goes into that. In this example, we use Inngest as the reliability layer. But whether or not you use third-party solutions, sooner or later, you will end up needing one. Reliability is like security. It's hard to add afterwards. So, baking it into your architectural choices from the get-go is usually quite a good idea.
And, yeah, thank you all. If you would like to reach out to me or be friends, here's where you can find me. Thank you.