Video Summary and Transcription
This is a Background Jobs 101 talk focusing on the challenges and benefits of using background jobs in software development. It explores the complexity of software development and the impact of distributed applications. The talk highlights the use of Ingest as a reliable solution for executing functions in the background and building drip campaigns. It emphasizes the importance of reliability and architectural choices in software development and discusses the features and capabilities of Ingest, including local development, handling failures, and data retrieval.
1. Introduction to Background Jobs
This is a Background Jobs 101 talk. I'm Sylvia Vargas, and I love pierogi and pigeons. I've contributed to React docs and previously worked at StackBlitz. Now I'm with Ingest as the developer relations lead. I enjoy working on side projects because it's all joy and fun.
So basically, this is a Background Jobs 101 talk, a very gentle introduction. If you don't know anything about background jobs, that's fine. But if you do know a lot, then you can support me by laughing at my jokes.
This was a test. So I'm Sylvia Vargas and I was wondering what information could be relevant to this very fine audience. And so I came up with a little game. So the game goes like this. I'll tell a fact about me, and if we have that in common, make a noise, whatever noise you want. So a warm-up. I'm very happy to be here. Okay. I love pierogi. It's going to get more and more obscure, so brace yourself. I love pigeons, and I have a pigeon family living on my balcony. Okay. Okay.
And, well, some of you may know me because I did some contributions to this page called React docs. This is the page. And previously... Anyone? No? Okay. And previously, I was running developer relations at Ingest, sorry, at StackBlitz, and now I recently joined Ingest as the developer relations lead. As I told you, this got more and more obscure. But here is one that maybe we still have in common. So the fun fact about me is that I really love working on side projects. Because it's... No? No? Because it's all joy and fun. And basically you don't have to worry about all the business needs. You can just focus on the user experience. And because oftentimes you are your own and only user, you will always be happy. But, you know, this is not the case with real production apps.
2. The Complexity of Software Development
Most software is more complicated and scarier than we think because of business requirements. Let's look at a real-life scenario: you join a new restaurant booking app and your boss asks for a signup flow. Modern tools make this easier, and the code is three calls: create the user, send a welcome email, and add the user to the mailing list.
In real product development, what the user sees is, you know, oftentimes just the tip of the iceberg. Most software is more complicated and way scarier than we think because of the business requirements.
So let's look at a real-life scenario. So imagine that you join a new hot restaurant booking app. The app is in Next.js because everyone who works there is a hipster. And your boss asks you how long it will take you to build a signup flow. So you get a list of requirements from the product manager.
Okay. Let's have a look. You know, you have like a nice latte. You're kind of relaxed. It's a typical Tuesday. So you need to do, you know, security and authenticate the user. I should hold the latte. You should add the user to the database. You should use the... Add the user to the mailing list and also send a welcome email. You know, you look at that. Easy. Because today it's so much easier to build this than it was five years ago, right? Because right now you can use, for example, Clerk for authentication, Supabase for the database, Mailchimp for the mailing list, and Resend for email.
So you're like... Job done. That's it. Maybe this is also the end of my presentation here. But no, there's more. So you know, you take a big sip of your iced latte and go back to playing Wordle or reading one of the thousand mega threads of Dan Abramov on React server components. So this is how the code would look. You would create a user in the database. You would send the welcome email and then add the user to the mailing list. And this looks simple.
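The three calls described above can be sketched as follows. This is a minimal illustration, not the code from the slides: `createUser`, `sendWelcomeEmail`, and `addToMailingList` are hypothetical stand-ins for the vendor SDKs, and the latency numbers are assumptions (only the half-second figure for email comes from the talk).

```typescript
// Hypothetical stand-ins for the database, email, and mailing-list vendors.
// Each stub records that it was called and returns a simulated latency in ms.
const calls: string[] = [];

function createUser(email: string): number {
  calls.push("db.createUser:" + email);
  return 40; // assumed latency
}

function sendWelcomeEmail(email: string): number {
  calls.push("email.sendWelcome:" + email);
  return 500; // "half a second on a good day"
}

function addToMailingList(email: string): number {
  calls.push("list.add:" + email);
  return 300; // assumed latency
}

// The naive handler: every call runs inline, so the user waits for all three.
function signUp(email: string): { ok: boolean; waitedMs: number } {
  let waitedMs = 0;
  waitedMs += createUser(email);
  waitedMs += sendWelcomeEmail(email);
  waitedMs += addToMailingList(email);
  return { ok: true, waitedMs: waitedMs };
}

const signUpResult = signUp("ada@example.com");
```

Because nothing runs in the background, the time the user waits is the sum of every vendor call, not just the database write.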
3. Challenges of Distributed Applications
Building applications using third party services is smart but creates a distributed application with limited control over infrastructure. Slow services and blocking code impact user experience. Error management, retries, logging, and recovery add complexity. Partial failures can lead to lost customers and duplicate errors are difficult to recover from.
And that's good because we like simple. I wouldn't say that we are, you know, as developers particularly lazy, but we don't like to complicate our life. And so building applications using third party services is smart and generally makes your life easier.
But where's the problem? Well, the downside now is that we have created a distributed application where we have no control whatsoever over a large part of the infrastructure our service depends on. So for example, services can be really slow. Sending an email, even on a good day, can take like half a second. So now we have a problem. We have blocking code in the critical path of the request.
So in other words, our user is wasting time because we are making our API slower. But, well, we should have that in mind that user experience shouldn't suffer because of the business requirements. So how are we doing so far? Is it fast? No. But is it reliable? Also no.
So imagine that you are adding the user to a mailing list and the service goes down. So then we have to manage the retries. You go one, two, three, four times. And what if something fails permanently? So now you have to add these errors to a logging service. And also you need to figure out, you know, a recovery system. So now the estimation to build this very simple feature goes from, you know, days to weeks. And you, you know, you also have to build and manage all the infrastructure and processes.
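To make that do-it-yourself burden concrete, here is a minimal sketch of a hand-rolled retry helper. The name `withRetries`, the in-memory `errorLog`, and the simulated outages are all assumptions for illustration; a real app would also need delays between attempts and a real logging service.

```typescript
// Errors that exhausted their retries; a real app would ship these to a logging service.
const errorLog: string[] = [];

// Retry `task` up to `maxAttempts` times. Returns true on success, or false
// once the final failure has been logged.
function withRetries(label: string, maxAttempts: number, task: () => void): boolean {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      task();
      return true;
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      if (attempt === maxAttempts) {
        errorLog.push(label + " failed after " + attempt + " attempts: " + reason);
      }
    }
  }
  return false;
}

// Simulated vendor outages: one recovers on the third call, one never does.
let flakyCalls = 0;
function flakyMailingList(): void {
  flakyCalls++;
  if (flakyCalls < 3) throw new Error("503 from mailing list");
}
function deadMailingList(): void {
  throw new Error("503 from mailing list");
}

const recovered = withRetries("list.add", 4, flakyMailingList);
const gaveUp = withRetries("list.add", 4, deadMailingList);
```

Even this small helper leaves open what to do with the logged failures, which is the recovery system the estimate has to grow to cover.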
Also, by the way, partial failures can be really gnarly. So you've added the user to the database but haven't sent them an email. So now we have three options. First, we can ignore the error which means, you know, the user will not be on the mailing list. Second, well, we can show the error to the user which will most likely lead to a lost customer. Or third, worst of all, you know, the user will try signing up again, dun-dun-dun. So let's look at that. So let's assume that, you know, the user gets the error and tries to sign up again. But now, user, I mean, db.user.create will error out because there's a duplicate. Good luck recovering from that. So this is the mess we are in.
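The duplicate-on-retry scenario can be reproduced in a few lines. This is a sketch under stated assumptions: the users table is an in-memory object with a unique constraint on email, and `mailingListUp` simulates a vendor that is down for the first request only.

```typescript
// In-memory stand-in for a users table with a unique constraint on email.
const usersByEmail: { [email: string]: { email: string } } = {};

function createUser(email: string): void {
  if (Object.prototype.hasOwnProperty.call(usersByEmail, email)) {
    throw new Error("duplicate user: " + email);
  }
  usersByEmail[email] = { email: email };
}

// Simulated mailing-list vendor that is down for the first request only.
let mailingListUp = false;
const mailingList: string[] = [];

function addToMailingList(email: string): void {
  if (!mailingListUp) throw new Error("mailing list unavailable");
  mailingList.push(email);
}

// The signup handler surfaces whatever error it hits to the user.
function signUp(email: string): string {
  try {
    createUser(email);
    addToMailingList(email);
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const firstTry = signUp("ada@example.com"); // user row is written, then the list call fails
mailingListUp = true; // the vendor recovers
const secondTry = signUp("ada@example.com"); // now the database rejects the duplicate
```

After both attempts the user exists in the database but was never added to the mailing list, and every further signup attempt fails on the duplicate check.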
4. Moving Non-Blocking Tasks to Background Jobs
The app we are working on takes forever to work. We can make our code faster and more reliable by moving non-blocking tasks to background jobs using queues. Queues work great for simple applications, but as the complexity grows, additional infrastructure is needed. Durable Workflows provide a solution by combining Durable Execution and Flow Control.
The app that we are working on takes forever to work. It takes me forever to build. You know, I will be slower at my work because I have to manage all the support backlog. And you know, I'm not even dealing yet with the persistent failures. So everyone is unhappy.
My boss is unhappy. I am stressed. I'm losing sleep. Like, disaster. But there's a solution. We can make our code faster and more reliable by moving non-blocking tasks to background jobs.
Now, usually when we talk about moving non-blocking tasks to background jobs, we talk about queues. And queues are great for data intensive processes that don't run on the main thread because they execute asynchronously. So, well, in layman's terms, you can think about queues like this. You know that once you add something to the queue, it will reach its destination one by one. Whatever is happening in the queue does not impact other parts of the infrastructure. So the delivery is guaranteed.
But what happens after the message reaches the consumer is still up to you. It's on you. So, the drawback here is that once you take something from the queue, the rest is on you and the queuing service just doesn't care. So, this means that queues work great when your application is simple. But when it grows in complexity, when you get very successful and have thousands of users or if it is distributed, you all of a sudden need to worry about a plethora of additional infrastructure you need to build.
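A toy queue shows where the guarantee ends. This sketch assumes a bare in-memory FIFO (`SimpleQueue` and the job names are made up for illustration); real queueing services add acknowledgements and dead-letter queues, but wiring those up is still the consumer's job.

```typescript
interface Job {
  id: number;
  kind: string;
}

// A bare-bones FIFO queue: it hands jobs over in order and then forgets them.
class SimpleQueue {
  private jobs: Job[] = [];

  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  dequeue(): Job | undefined {
    return this.jobs.shift();
  }

  size(): number {
    return this.jobs.length;
  }
}

const jobQueue = new SimpleQueue();
jobQueue.enqueue({ id: 1, kind: "send-welcome-email" });
jobQueue.enqueue({ id: 2, kind: "add-to-mailing-list" });

const completed: number[] = [];
const lost: number[] = [];

// The consumer: job 2 hits a simulated outage. Nothing re-enqueues it,
// so unless we write that logic ourselves the work is simply gone.
let job = jobQueue.dequeue();
while (job !== undefined) {
  try {
    if (job.kind === "add-to-mailing-list") throw new Error("vendor outage");
    completed.push(job.id);
  } catch (err) {
    lost.push(job.id);
  }
  job = jobQueue.dequeue();
}
```

The queue delivered both jobs, so from its point of view everything worked; the failed job is only visible because the consumer chose to record it.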
So, for example, you need to maybe think about concurrency because you want to be able to control how many steps are executed at one time. Or maybe state persistence and management because now that you have a distributed application, you want to share state between different functions and queues. Or error handling, as I said, you know, because what if another service provider has an outage? So, you would need to now manage the retries and failures and timeouts. So, that's a lot of work. And as I said, we are not particularly lazy but we also like our life simple. But there's the solution. There's a solution. And I would say those are Durable Workflows which combine Durable Execution and Flow Control.
5. Durable Execution and Ingest
Durable Execution is similar to Message Queues, but you can define it as part of your code. This talk focuses on Ingest, but other providers are available. Ingest is a reliability layer that executes functions asynchronously. By adding Ingest to your project, you can define and serve functions through an HTTP endpoint. When the user signs up, events are triggered in Ingest to execute the functions.
So, Durable Execution is similar to Message Queues. But instead of having to decouple your reliability infrastructure from your application, you can define it all as part of your code. And you will see that in a second.
In this talk, I will be mostly talking about Ingest. I will take that as an example but, you know, there are other providers. In fact, Cloudflare just announced that they will be releasing Durable Workflows. Fortunately, Cloudflare's API is the same as Ingest, so you can just go ahead and play with it right away.
So, let's move non-blocking tasks to background jobs. First, you would add Ingest to the project. Ingest is a reliability layer for your app. With Ingest, you define functions or workflows inside your code base using its SDK and then you serve it through an HTTP endpoint in your application. It takes care of reliably executing your functions asynchronously. There's also a dashboard where you can monitor and debug all the functions.
Now, let's get back to our application. We are going to add the reliability layer, which is Ingest. First, we are going to wrap this function in Ingest's createFunction. We provide an event name that will tell Ingest to execute the function later when the user signs up. The code would look like this: create the function, provide the event name, and invoke the existing code. We do the same for the other function, using the same event name, so they both fire at the same time when the user signs up. Instead of invoking these functions directly, we trigger an event in Ingest. Our functions are exposed to Ingest as an HTTP endpoint, which Ingest uses to execute the specific function when the specific event is triggered.
6. Executing Functions in Ingest
This is how it would look in code. We create functions, provide event names, and invoke existing code. The same event name is used to fire both functions at the same time. When the user clicks a button, an event is sent to Ingest. Ingest uses an endpoint to execute the specific function when the event is triggered. Ingest retries failed functions until they succeed. Detailed logs are available on the dashboard. Errors can be debugged with Ingest without data loss. Non-blocking tasks are moved to background jobs for faster app performance.
So, this is how it would look in code. So, first we are creating the function, then we are providing the event name, and finally we are invoking our existing code from before. And now we will do the same for the other function.
And so, here you can also notice that we are using exactly the same event name and this is because we want these two things to fire at the same time when the user signs up. So, this pattern, by the way, is usually called fan out.
So, now when the user clicks the button, we will send an event to Ingest. And this is how it looks in code. Instead of invoking these functions directly, we are triggering an event in Ingest. So, like I mentioned before, we expose our functions to Ingest as an HTTP endpoint and then it uses this endpoint to execute the very specific function when the very specific event is triggered.
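The fan-out flow just described can be sketched with a toy in-process event bus. To be clear about the assumptions: `TinyEventBus` is a hypothetical stand-in written for this example, not the vendor SDK. It runs handlers synchronously in the same process, whereas the service described in the talk calls them asynchronously over an HTTP endpoint and retries them.

```typescript
interface AppEvent {
  name: string;
  data: { email: string };
}

type Handler = (evt: AppEvent) => void;

// Toy stand-in for the reliability layer: it only maps event names to functions.
class TinyEventBus {
  private registered: { id: string; trigger: string; handler: Handler }[] = [];

  createFunction(id: string, trigger: string, handler: Handler): void {
    this.registered.push({ id: id, trigger: trigger, handler: handler });
  }

  // Run every function whose trigger matches and report which ones ran.
  send(evt: AppEvent): string[] {
    const ran: string[] = [];
    this.registered.forEach(function (fn) {
      if (fn.trigger === evt.name) {
        fn.handler(evt);
        ran.push(fn.id);
      }
    });
    return ran;
  }
}

const bus = new TinyEventBus();
const outbox: string[] = [];

// Two functions share one event name, so a single event fans out to both.
bus.createFunction("send-welcome-email", "user.signup", (evt) => {
  outbox.push("welcome email to " + evt.data.email);
});
bus.createFunction("add-to-mailing-list", "user.signup", (evt) => {
  outbox.push("mailing list add for " + evt.data.email);
});

// The signup handler now only emits the event instead of calling each vendor.
const ranFunctions = bus.send({ name: "user.signup", data: { email: "ada@example.com" } });
```

The design point is the indirection: the signup code knows one event name and nothing about which functions, or how many, react to it.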
And this is the endpoint. So, Ingest uses this endpoint to download the function definitions and to execute them. So, now here we have the complete flow. And basically Ingest will call the correct functions at the very precise time you want. And then on the dashboard, you will get a notification that there was an event triggered which in turn called two functions. So, you will see that they completed and also when.
Okay. But what happens if there's actually a failure? Well, let's look at that. So, let's imagine that Ingest invokes a function and it fails with an error code. So, Ingest will retry it and retry it and retry it until finally it succeeds. And you don't have to worry about that. Moreover, you also get like a detailed log on the dashboard of what happened. And I know that, okay, I'm going to be real with you. I know that it's difficult to believe but sometimes, you know, the errors are persistent in our application not because of the service outages but because there are bugs in our own code. Hard truth. In these cases, you know, you can just debug it with Ingest and then console and then you don't lose any data. You can recover very easily. I may show you that later. And, yeah, you can just retrigger all the events and recover the data.
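The "fix the bug, then retrigger the events" recovery can be illustrated with a small replay sketch. Everything here is assumed for the example: the stored event shape, the uppercase-email bug, and the `runAll` helper. It only shows why keeping the events makes recovery possible, not how any particular service implements replay.

```typescript
interface StoredEvent {
  id: number;
  name: string;
  email: string;
}

// Every incoming event is recorded before any function runs against it.
const eventLog: StoredEvent[] = [
  { id: 1, name: "user.signup", email: "ada@example.com" },
  { id: 2, name: "user.signup", email: "GRACE@example.com" },
];

// First version of the function has a bug: it rejects uppercase addresses.
function buggyHandler(e: StoredEvent): string {
  if (e.email !== e.email.toLowerCase()) throw new Error("invalid email");
  return "welcomed " + e.email;
}

// Fixed version normalizes the address instead of throwing.
function fixedHandler(e: StoredEvent): string {
  return "welcomed " + e.email.toLowerCase();
}

// Run a handler over some events; collect results and return the ids that failed.
function runAll(events: StoredEvent[], handler: (e: StoredEvent) => string, done: string[]): number[] {
  const failed: number[] = [];
  events.forEach(function (e) {
    try {
      done.push(handler(e));
    } catch (err) {
      failed.push(e.id);
    }
  });
  return failed;
}

const welcomed: string[] = [];
const failedIds = runAll(eventLog, buggyHandler, welcomed);

// After deploying the fix, replay only the events whose runs failed.
const toReplay = eventLog.filter(function (e) {
  return failedIds.indexOf(e.id) !== -1;
});
const stillFailing = runAll(toReplay, fixedHandler, welcomed);
```

Because the event for the failed signup was never thrown away, the second user still gets welcomed once the fix ships.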
So, we wanted to make our app faster. So, we moved the non-blocking tasks from the user's critical path to the background jobs.
7. Building Drip Campaigns with Ingest
Ingest allows scheduling tasks in the future and orchestrating multistep processes. A marketing drip campaign can be built using Ingest steps within a function. A step is retried if it fails, but not once it has run successfully. Ingest handles scheduling, so a serverless function does not have to keep running while it waits. If the last step fails, only that step is retried until it works, which gives a successful drip campaign.
But also, we got reliability as a nice addition. So now that we have access to this infrastructure, let's see what else we can do. So, this is our app right now. I didn't tell you yet that but actually Ingest allows you to schedule tasks in the future and orchestrate multistep processes.
So, let's look at the send email function here. So, here we are just sending a welcome email. But if you've ever talked to any product manager, you will know that it's always nice to increase user retention. So, how could we do that? For example, we could send an activation email drip campaign in the first week. So, how do we go about doing that? So, first, let's create an Ingest function which will react to the same event.
So far, we have been talking about the pattern that's called fan out where you have multiple functions firing on the same event. However, many tasks require sequential progression. And here we are building a marketing drip campaign. So, it kind of makes sense that we express the code, the whole timeline, as procedural code. So, we'll use Ingest steps within the function. So, we are using here, you can see, Ingest's step.run. And in this way, this specific code will get automatically retried if it fails. But if it runs successfully, it will never be retried again. So, Ingest will skip it. And then we will also see it in action in a bit.
So, first, we are sending a welcome email and this is exactly the same part as we did before. So, then Ingest will pause execution of this function for four days. And here we are using Ingest's step.sleep. So, from a programmer's perspective, it looks similar to just putting in, you know, setTimeout. But actually, in the background, Ingest handles the scheduling for you. So, what that means is that your serverless function does not run for four days. And you don't have to sell your kidney to pay for your AWS bill, which is always nice. So, and so, now this is the last step. And that's the final email with tips. So, if there is a failure in one of the steps, Ingest will know that other steps worked and will only retry this last one until it works. And so, now we have a successful drip campaign.
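The "retried if it fails, skipped if it already succeeded" behaviour can be sketched with a small memo of step results. This is an illustration of the general idea under stated assumptions, not the SDK's implementation: `StepMemo`, the step ids, and the retry loop are hypothetical, and the memo lives in memory rather than in durable storage.

```typescript
// Results of steps that already succeeded, keyed by step id.
class StepMemo {
  private results: { [id: string]: unknown } = {};

  run<T>(id: string, work: () => T): T {
    if (Object.prototype.hasOwnProperty.call(this.results, id)) {
      return this.results[id] as T; // already succeeded earlier: skip the work
    }
    const value = work(); // if this throws, nothing is stored, so it runs again next time
    this.results[id] = value;
    return value;
  }
}

const sentEmails: string[] = [];
let tipsAttempts = 0;

function dripCampaign(step: StepMemo): void {
  step.run("send-welcome", () => {
    sentEmails.push("welcome");
    return true;
  });
  step.run("send-tips", () => {
    tipsAttempts++;
    if (tipsAttempts === 1) throw new Error("email vendor timeout"); // simulated failure
    sentEmails.push("tips");
    return true;
  });
}

// Toy executor: after a failure it re-invokes the whole function, reusing the memo.
const memo = new StepMemo();
let invocations = 0;
let finished = false;
while (!finished && invocations < 5) {
  invocations++;
  try {
    dripCampaign(memo);
    finished = true;
  } catch (err) {
    // a real system would wait before the next attempt
  }
}
```

The function body runs twice, but the welcome email is sent once, because the second invocation finds the first step's result in the memo.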
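One way to picture a sleep during which no function is running is a step that ends the current invocation and asks to be woken later. The sketch below assumes that model and uses a virtual clock so it runs instantly; `DurableContext`, `SleepUntil`, and the toy scheduler loop are hypothetical names for illustration and are not taken from any SDK.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Thrown to end the current invocation; carries the time to resume at.
class SleepUntil {
  wakeAt: number;
  constructor(wakeAt: number) {
    this.wakeAt = wakeAt;
  }
}

class DurableContext {
  private done: { [id: string]: boolean } = {};
  private wakeTimes: { [id: string]: number } = {};
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  run(id: string, work: () => void): void {
    if (this.done[id]) return; // finished in an earlier invocation
    work();
    this.done[id] = true;
  }

  sleep(id: string, ms: number): void {
    if (this.wakeTimes[id] === undefined) this.wakeTimes[id] = this.now() + ms;
    if (this.now() < this.wakeTimes[id]) throw new SleepUntil(this.wakeTimes[id]);
  }
}

let virtualNow = 0; // simulated clock in ms
const timeline: string[] = [];

function campaign(ctx: DurableContext): void {
  ctx.run("welcome", () => { timeline.push("welcome@" + virtualNow / DAY_MS + "d"); });
  ctx.sleep("wait-4-days", 4 * DAY_MS);
  ctx.run("tips", () => { timeline.push("tips@" + virtualNow / DAY_MS + "d"); });
}

// Toy scheduler: nothing runs during the sleep. It jumps the clock to the
// wake time and invokes the function again from the top.
const ctx = new DurableContext(() => virtualNow);
let invocationCount = 0;
let complete = false;
while (!complete) {
  invocationCount++;
  try {
    campaign(ctx);
    complete = true;
  } catch (signal) {
    if (!(signal instanceof SleepUntil)) throw signal;
    virtualNow = signal.wakeAt;
  }
}
```

In this model the four days cost two short invocations rather than one four-day process, which is the billing point made above.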
8. Dynamic Campaigns and Beyond
Imagine a dynamic campaign based on user actions. Delete unnecessary code and use the booking event to determine the course of action. Fun can be had by creating different emails with various tips. Ingest is not just for sending emails; it is a versatile tool for various use cases. SoundCloud and Faye use Ingest for generating dynamic videos and summarizing news and datasets. Email sending platforms also use Ingest for domain verification with serverless workflows.
But of course, if you've ever talked to any product manager, you know that you can do better. So, imagine that someone already signed up and immediately made the booking. So, it doesn't make sense to send them exactly the same email as someone who, you know, is taking their time. So, maybe they need different tips, maybe they need different CTAs. So, the campaign actually could be dynamic based on user actions. So, let's build it. Let's delete these two last steps because that's what we like to do, deleting code. We can also brag about that on Twitter.
So, elsewhere in our app, when user completes the booking, there's an event sent just like booking.created. I mean, called booking.created just like user.signup. So, now we can use this event to determine the course of action. Here we are waiting for four days to see if this event happens. Next, we will now use the booking event to determine what to do next. If the booking was made, we'll reward this user with power user tips, for example. And, well, if they need four days to make the booking, maybe they need, like, you know, basic tips. And, you know, it's honestly so much fun that why stop there? You know, you can go wild and create a lot of emails with a lot of different tips. And speaking of tips, this is just the tip of an iceberg. I expected laughs here. No, I'm joking. I knew there's going to be silence.
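The wait-then-branch logic can be sketched as a pure function over an event log. This is an illustration with assumed names and shapes: `waitForEvent` here is a hypothetical helper that scans an in-memory, time-ordered log, and the event fields and email names are made up for the example.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface AppEvent {
  name: string;
  userId: string;
  at: number; // timestamp in ms
}

// Return the first matching event for this user inside the window, or null on timeout.
// Assumes the log is ordered by time.
function waitForEvent(
  log: AppEvent[],
  eventName: string,
  userId: string,
  from: number,
  timeoutMs: number
): AppEvent | null {
  const matches = log.filter(function (e) {
    return e.name === eventName && e.userId === userId && e.at >= from && e.at <= from + timeoutMs;
  });
  return matches.length > 0 ? matches[0] : null;
}

// Pick the follow-up email based on whether a booking arrived in time.
function chooseFollowUp(booking: AppEvent | null): string {
  return booking !== null ? "power-user-tips" : "basic-tips";
}

const eventLog: AppEvent[] = [
  { name: "user.signup", userId: "u1", at: 0 },
  { name: "user.signup", userId: "u2", at: 0 },
  { name: "booking.created", userId: "u1", at: 1 * DAY_MS },
  { name: "booking.created", userId: "u2", at: 6 * DAY_MS }, // arrives after the window
];

const emailForU1 = chooseFollowUp(waitForEvent(eventLog, "booking.created", "u1", 0, 4 * DAY_MS));
const emailForU2 = chooseFollowUp(waitForEvent(eventLog, "booking.created", "u2", 0, 4 * DAY_MS));
```

Matching on the user id matters: without it, one user's booking would unlock the power-user email for everyone who signed up in the same window.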
So, we're talking here, you know, about sending emails, but Ingest is not really a tool for sending emails. There are numerous tools, there are numerous use cases that go beyond just a marketing campaign. You can build complex payment flows, LLM prompt chaining, multi-step data orchestration, and so on. You can basically think about it like this: if you need something, a bunch of things, to happen in response to an event, maybe you need a reliability tool. And for example, SoundCloud uses Ingest to streamline generating dynamic videos. Or Faye uses Ingest with AI to summarize massive news and datasets. And then Resend, the email sending platform that we also used in this example today, uses Ingest to verify email domains with serverless workflows. And it works in any cloud. It's cloud agnostic, language agnostic. And here we saw a lot of TypeScript and type safety.
9. Building Reliable Applications
In addition to TypeScript, we have SDKs for Python, Go, and the ability to invoke functions written in one language in another. The local dev server provides a convenient way to test the app. The function execution and retries are showcased, highlighting the importance of reliability in building production applications.
But in addition to TypeScript, we also have SDKs for Python and Go, and you can also actually invoke functions written in one language from another. And then there's also the local dev server. And actually, I recorded a little video so that you would see that. I, you know, I'm wary of live demos. So, I thought I would just record it just to spare you the frustration with me.
Okay. So, I built a little app that, you know, is a restaurant booking app. And first, of course, we are starting the Ingest CLI, I mean, the Ingest UI. So, this is the local dev server. And first we are going to book a table and it's going to work, right? So, then you can see that there's the function that was run, you see all the information about it, the payload. And then you also see all the details about the function, right? When it ran, how long it slept, and so on.
But, you know, that's all fun. But what happens when the request actually fails? So, I have a button that doesn't work. And so, then you see that the function failed and you go there. And you, again, get all the information. You also get the bug report here. You see that, hmm, perhaps there's something in your code or my code. I get information about where that happened. But the most important thing is that you see that the retries are already queuing. So, it will keep retrying, you know, after longer and longer increments. So, then you can go back to your code and, well, I can discover that I'm just not very good at coding. And then I can go back to Ingest and retry, recover that function.
And let me go back to the slides. Well, so, here I spoke a lot about Ingest. But actually, this talk is not about Ingest. It is about reliability. So, when building real-world production applications, reliability is really important. Not only does it keep your users happy, but it also makes you and your team more productive and more relaxed. You as the developer, you're not as bogged down by maintenance and operations. But achieving reliability is really hard.
10. The Importance of Reliability
Reliability is hard to add afterwards, making it a good idea to include it in architectural choices from the start. In times of limited resources, outsourcing tasks like managing queues to cloud vendors is worth considering. Queues have been around for a long time, but it's important to prioritize reliability. Let's focus on user needs and enjoy our work.
And anyone who ever had to build a reliable system at scale knows the amount of iteration and prior work that goes into that. Reliability is like security. It's really hard to add afterwards, and baking it into your architectural choices from the get-go is a good idea.
So, in times like this, when engineering budgets and head counts are cut down, we as individual engineers need to do more constantly, do more with less. So, just as you're probably outsourcing your mailing list or email sending to cloud vendors, it's worth asking yourself, do you really want to be in the business of managing your own queues? And so, here, Matthew Drucker, the CTO of SoundCloud, thinks you shouldn't.
And if that's the case, you know, why are so many people still using queues? Well, queues have been around since the 90s and if, you know, we are used to something, it feels familiar and cozy. Even if it's not the coziest solution. But you deserve coziness. So, to go back to the beginning of the talk, I really like to work on side projects because I can focus on user needs instead of all the daunting tasks like reliability. In 2024, we all deserve fun and joy in our day jobs, too. So, thank you so much for being here and let's hang out, let's talk, and let's be friends. Thank you.
11. Handling Ingest and Separation of Concerns
Ingest offers reliability and safety through various tools in its architecture. Its executors and data stores provide failover and replication. Even if execution fails, events are still captured. Separation of concerns tends to happen naturally as teams integrate Ingest into different parts of the code base. Each task gets its own queue, allowing for scalability. Discussions are ongoing for an on-premises option for Ingest.
So, if there is a failure to hit Ingest, is there some sort of fallback or SDK to capture or retry? Oh, yes. So, what happens when Ingest has an outage? So, basically, so long story short is that Ingest offers reliability and safety. And there are numerous different tools in our architecture that actually help us make sure that we deliver on that promise. So, for example, our executors are shared nothing and our data stores work with active replication and master failover. So, that means that if even one execution engine fails, something else, you know, comes in. And then our execution engine is separate from our event ingestion. So, even let's say if execution fails, we are still capturing all the events. And when that's back up, you will still not lose any data.
That's awesome. I mean, you know, talking at the end there about, like, managing your own queues, I know I've done that in the past. And I know that, yeah, I'm much less reliable than a third-party service that is built specifically for that reliability. So, yeah, absolutely. All right. One question. Okay. So, at what point would you suggest a separation of concerns? Should a user-facing server like the one you were describing kind of handle so much? Yeah. I mean, so, what I've seen is that oftentimes one team starts with Ingest for a very specific task and then all of a sudden they are... it's very easy to implement Ingest into your code base. And so, because of that, all of a sudden they are integrating Ingest into other parts of the code base for different tasks. So, to answer the question, is there one... can one service handle everything? Yes. I mean, every task gets its own queue. So, we do have, like, some users who are running thousands of queues and last month we ran, like, millions. Wow. Yeah, no, it does... it actually seems like a separation of concerns already, right? Yeah. Like, it's all multiple different tasks now. Yeah.
Yeah. Oh, that's a good question. Okay. So, how do you work with Ingest locally? Is there an on-premises option? Yeah, so, this is something that we are still discussing and it's always in the plans because people oftentimes ask about a self-hosting option for, for example, security concerns or for other reasons. So, depending on the specific need or scenario, it might... the answer may be like, yeah, but just reach out to us on Discord.
12. Local Development and Ingest Light
We offer local development for free, it's open source, and you don't need an account. Kent C. Dodds plans to build an Ingest lite for his course.
We do have, like, local development, though, because sometimes people also ask about that. So, yeah. Okay. I think there was a question about, like, local development as well. So, that's sort of... And, basically, and also the local development is free so you don't have to have an account and it's also open source so you can just go and have fun. In fact, actually, yesterday, Kent C. Dodds tweeted that he is going to build, like, an Ingest lite so that he can use it within his course. So, yeah. So, it depends on your use case. Very cool. Very cool. Let's move on.
13. Infra Efforts and Ingest Handling
The UI dashboard is already provided and a new super shiny dashboard with metrics and observability is about to be released. Ingest handles failures and allows configuration of retry increments and number of retries. It ensures that the data to be captured will be captured. You can recover backed-up events or stop them as needed.
What are the expected infrastructure efforts to deploy the UI dashboard or is there anything already implemented for that? No. So, the UI dashboard is already provided for you. No need to. Right? So, actually, we are about to release our new super shiny dashboard with all the metrics and observability. So, I'm sad that I couldn't showcase it here. But, in addition to the local dev server, you also have a proper production dashboard. So, you'll get all the information there as well. Easy.
And this is a question I had as well. Uh-huh. Which is, like, how does... oops. Oh, no, I ticked it and it went away. No. I'm not very good at this, apparently. How will... it went away again. How does Ingest handle failures? So, basically, like, I guess it's effectively, like, what happens if a failure continues for 10 minutes? Like, how long is it going to retry for? Oh, it will retry. Can you configure all of that? Yeah. So, you can configure both the increments and the number of retries. And basically, Ingest promises that the data that is supposed to be captured will be captured. So, it will continue. I guess there's a slight worry that, like, if a service has gone down for something and now you've backed up a million events to then hit it with again. Yeah, yeah. You can recover all of that or you can also log into Ingest, see, you know, those millions of events and be like, I'm done with that. Just quit, you know. You can stop it. So, no. Whatever you want to do.
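To show what "configuring the increments and the number of retries" could amount to, here is a small sketch that turns a retry policy into a list of delays. The `RetryPolicy` fields and the numbers are assumptions chosen for the example; they are not the product's configuration options or defaults.

```typescript
interface RetryPolicy {
  maxRetries: number; // how many retries after the first failure
  baseDelayMs: number; // delay before the first retry
  factor: number; // multiplier applied to each later delay
  maxDelayMs: number; // upper bound for any single delay
}

// Exponential backoff with a cap: each delay grows by `factor` until it hits `maxDelayMs`.
function backoffSchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  let delay = policy.baseDelayMs;
  for (let i = 0; i < policy.maxRetries; i++) {
    delays.push(Math.min(delay, policy.maxDelayMs));
    delay = delay * policy.factor;
  }
  return delays;
}

const retryDelays = backoffSchedule({ maxRetries: 5, baseDelayMs: 1000, factor: 2, maxDelayMs: 10000 });
const totalWaitMs = retryDelays.reduce(function (sum, d) {
  return sum + d;
}, 0);
```

With these example numbers the delays are 1s, 2s, 4s, 8s, and 10s (the last one capped), so a failure that lasts ten minutes would exhaust this particular policy after 25 seconds; covering longer outages means raising the retry count or the cap.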
14. Stopping Ingest Process and Data Retrieval
You can stop backed-up events from being retried. There is a way to retrieve data from Ingest for storage or retry later. The data is yours. That concludes the session.
And is there a way to get that data back out of Ingest if you just want to kind of store it somewhere else? Yeah. Yes. And retry again later. Yes. Definitely. Yes. It depends on what you're trying to do and how you're trying to do that. But, yes. Yeah. Data is... we are not... it's not that... I don't want to go into the gnarly details of how Ingest works, but basically, the data is yours. So... Excellent.
Well, I think that's all we've got time for right now. Oh, thank you. Please give another round of applause to Sylvia. Thank you. And as always... Thank you. Thank you.