Video Summary and Transcription
This talk covers various optimization techniques for Lambda functions, including parameter fetching, code minification and bundling, observability with Powertools and X-Ray, baseline testing with load testing tools, caching with ElastiCache and Redis, and optimizing code size and memory usage. The importance of library choices, power tuning for cost and performance, leveraging subprocesses and sandboxes, and adjusting concurrency limits are also discussed. Overall, these techniques can significantly improve Lambda function performance.
1. Introduction to Lambda Optimization
This talk is aimed at Node.js developers who want to optimize their Lambda functions. It covers several ideas for optimizing Lambda workloads, starting with the two areas where a function can be heavily optimized, the initialization phase and the execution phase, and using a classic serverless API built on AWS as the running example.
Hi, if you are here, probably you are a Node.js developer that is looking to optimize their Lambda functions for the current workload that is running in AWS. Or maybe you're just curious and you want to understand how Lambda works and how you can squeeze all the best practices into the performance of your Lambda. Either case, I got you covered.
So today, I'm going to cover quite a few ideas on how you can optimize your workloads using Lambda functions. But moreover, I didn't want to provide just abstract ideas. I just want to show you how much you can really squeeze in. My name is Luca. I'm a serverless specialist. I work for AWS and I'm based in London. So without further ado, let's dive in, because we have a lot of ground to cover.
So first of all, we are going to cover the areas where you can really optimize your Lambda functions. Later on, we will look at, let's say, a common API that you can build using Lambda. Then we are going to look into what we can improve. And finally, we are going to arrive at the optimized solution, where we have the information needed to deeply optimize what you are building.
OK, so let's start with where you can optimize your Lambda function. There are two areas that you can heavily optimize. This is a sequence diagram that you can find in the AWS documentation, where you can see the lifecycle of a Lambda function when we create a sandbox. The two areas that are really key to optimization are the initialization phase and the execution phase. In the initialization phase, we create a sandbox and download your code, and then your initialization code runs, where you can load, for instance, the keys for a database or some parameters from the Parameter Store service or something like that. Otherwise, you can optimize the execution part. We will see some optimizations in both areas during this talk.
OK, so let's start with a solution. This is a classic thing that you can build when you are building an API with AWS and serverless. As you can see here, we have an API Gateway exposing our API. We have a Lambda function that relies on a bunch of things like CloudWatch and Systems Manager Parameter Store for retrieving the parameters it needs to load. We are using Aurora Serverless in this case, and therefore we are using an RDS Proxy that takes care of handling the connection pool for us and also the secrets. So you don't have to handle this in your Lambda code; it's just a utility, or a service, that you can use in conjunction with your Postgres database in this case. Moreover, one thing I want to specify is that very often I've seen customers using Lambda functions with just one of the two architectures that are available, x86 in this case. We will see later how we can optimize that.
2. Optimizing Lambda Initialization and Code Bundling
One optimization suggestion is to fetch your parameters at the initialization phase to reduce the chattiness of your Lambda function. We recommend minifying your code using esbuild and bundling it with CDK. Additionally, bundle the AWS SDK with your code for faster execution. Powertools is an open source library that handles observability in Lambda functions. We recommend using Middy for initialization and leveraging Powertools with X-Ray for performance analysis.
So one suggestion that we usually recommend is fetching your parameters at the initialization phase. Then you have them stored inside the sandbox, and therefore you don't have to fetch them for every single request. That is a good optimization because basically you reduce the chattiness of your Lambda function towards other services. It could be parameters, it could be secrets, it could be some configuration you need to fetch at the beginning that will live alongside the sandbox. Remember, the sandbox is not there living forever. We reclaim the sandbox after a while, and therefore you are going to have a new sandbox when it's needed. So the amount of time that you are storing these things in memory is not going to be hours, and therefore you are safe to assume that your secrets or your parameters will be refetched after a while.
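As a minimal sketch of this idea (not the code shown in the talk), the fetch can be wrapped so that it runs once per sandbox and every invocation reuses the result. The `AppConfig` shape, the `fetchConfigFromStore` stand-in and the example values are assumptions made for illustration; a real function would call Parameter Store or Secrets Manager there.

```typescript
// Hypothetical shape of the configuration the function needs.
type AppConfig = { dbHost: string; cacheTtlSeconds: number };

// Wraps any async loader so that it runs at most once per sandbox.
// Module-scope state lives as long as the execution environment does.
function oncePerSandbox<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = loader();
    }
    return pending;
  };
}

// Stand-in for a real call to Parameter Store or Secrets Manager (assumption).
let fetchCount = 0;
async function fetchConfigFromStore(): Promise<AppConfig> {
  fetchCount += 1;
  return { dbHost: "db.example.internal", cacheTtlSeconds: 10 };
}

const getConfig = oncePerSandbox(fetchConfigFromStore);

// Start the fetch while the module loads, which is the init phase.
// With ESM and top-level await this could be `await getConfig()` instead.
const warmup = getConfig();

// Every invocation awaits the same promise instead of calling the store again.
async function handler(): Promise<{ statusCode: number; body: string }> {
  const config = await getConfig();
  return { statusCode: 200, body: JSON.stringify({ dbHost: config.dbHost }) };
}
```

In this sketch a failed fetch would stay cached for the lifetime of the sandbox; production code would probably reset the stored promise on rejection.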
The other thing is we usually recommend minifying your code. By default, when you're using CDK or AWS SAM, we provide the bundler esbuild, and you can bundle your code without any problem. We even expose a bunch of parameters that you can tweak in order to use esbuild with CDK. Obviously you can also bundle by yourself if you are using, I don't know, Terraform or whatever: first you bundle, and then you use Terraform for deploying. So that's another option. But in this case, you have esbuild available for you.
The other thing is that in this case I'm using ESM, because I want to use top-level await, which is the capability you have seen before for loading a parameter before the handler of your function, where your business logic lies, is called. We also have a bunch of modules that we have to, let's say, externalize and not bundle, like all the Powertools ones, and we are going to see in a second what they are. Last but not least, a quick trick: if you're using the AWS SDK like we are doing in this example, always bundle it together with your code, because reading from memory is way faster than reading from disk. For the Node.js runtime we also offer the possibility to use the SDK that is available in the sandbox. But we usually recommend bundling the SDK libraries alongside your business logic so it's faster to execute.
So the other thing that is pretty cool is Powertools. Powertools is an open source library we use for, let's say, taking care of the observability of your Lambda functions. In this case, it handles metrics, logs and traces. And the one thing that we usually recommend is using Middy, another open source library, for initializing all this. As you can see, the snippet of code is there. Middy is a really great library. If you're not familiar with it, I highly recommend it when you use Lambda functions. The other thing is that, as I said, you can use Powertools in conjunction with X-Ray to create segments. If you want to understand how fast a query is, or how fast an algorithm that you have written inside your Lambda function is, you can do that easily with this snippet of code. Or otherwise, you can use decorators.
3. Lambda Function Optimization and Baseline Testing
You can use decorators to streamline the creation of segments for distributed tracing with X-Ray. Establishing a baseline for optimization involves load testing with Artillery, using different test types and virtual user counts. To measure cold starts, you can use a CloudWatch query. The first test showed 18 cold starts, which is normal during ramp-up. The execution time for cold starts ranged from 1.21 to 1.43 seconds, while warm starts ranged from 167 to 228 milliseconds.
You can also use decorators inside classes to, let's say, streamline these things, so you don't have to create segments by hand. You just add a decorator saying this is a subsegment, and it will take the entire function as a segment, so you can see in X-Ray, our distributed tracing service, how it works and what it does.
Now, as I said, we have created this Lambda function. Obviously, in order to optimize, we need to have a baseline. And for doing that, I'm using a testing methodology based on Artillery for load testing. I'm using different types of tests, lasting from 45 or 50 seconds up to two minutes, and going from zero to 20 virtual users and then up to 100 users.
OK, so here is another trick. When you want to understand how many cold starts you had for a specific API, or when you're using Lambda functions in general, there is a query snippet that you can use inside CloudWatch, which is this one. You can find it at this link here, and it will retrieve how many cold starts you had for a specific Lambda function. That is very handy, to be honest.
So let's start with the first test. The first test is very simple. We have a 10-second ramp-up from zero to 20 requests per second. Then we move to 50 requests per second in one go, so like a big bump. In this first example, we had 18 cold starts. That is, let's say, quite normal, because if you start to ramp up and the execution time of your Lambda function is long, you're going to have multiple sandboxes that have to spin up. Bear in mind that you cannot map the number of requests per second one to one onto the number of execution environments, or sandboxes. Why? Because maybe your response time is lower than a second, and therefore a specific sandbox can handle multiple requests per second. For instance, in this example, as you can see, we reached 50 virtual users and therefore 50 requests per second, but we had just 18 sandboxes. Here, the P99 is over two seconds. That's the other thing that we need to take into account.
And then we try to see in more detail how things were working. We are focusing on the execution time of a specific Lambda function, because that's the part that you can really tweak. In this case, the worst cold start, that is when we spin up a sandbox, load your code and execute it, was 1.43 seconds. The best one was 1.21. And when we look into the warm start, where the initialization is already done and we are reusing the sandbox to respond to a request, the worst was 228 milliseconds and the best 167.
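The query from the talk is not reproduced in this transcript. As a rough illustration of what such a query counts, the sketch below scans Lambda log lines: a REPORT line carries an "Init Duration" field only when the invocation had to initialize a new sandbox. The function name and the sample lines are made up for the example.

```typescript
type ColdStartSummary = { invocations: number; coldStarts: number; maxInitMs: number };

// Counts cold starts in a batch of Lambda log lines.
// Only REPORT lines that include "Init Duration" are treated as cold starts.
function summarizeColdStarts(logLines: string[]): ColdStartSummary {
  const initPattern = /Init Duration: ([\d.]+) ms/;
  let invocations = 0;
  let coldStarts = 0;
  let maxInitMs = 0;
  for (const line of logLines) {
    if (line.indexOf("REPORT") !== 0) continue; // skip START, END and application logs
    invocations += 1;
    const match = initPattern.exec(line);
    if (match !== null) {
      coldStarts += 1;
      maxInitMs = Math.max(maxInitMs, Number(match[1]));
    }
  }
  return { invocations, coldStarts, maxInitMs };
}

// Sample lines in the usual REPORT layout; the values are invented.
const sampleLogLines = [
  "START RequestId: a1 Version: $LATEST",
  "REPORT RequestId: a1 Duration: 912.40 ms Billed Duration: 913 ms Memory Size: 256 MB Max Memory Used: 98 MB Init Duration: 431.20 ms",
  "REPORT RequestId: a2 Duration: 180.10 ms Billed Duration: 181 ms Memory Size: 256 MB Max Memory Used: 99 MB",
];
const coldStartSummary = summarizeColdStarts(sampleLogLines);
```

A CloudWatch Logs Insights query would apply the same filter on the service side, so the logs never need to be downloaded.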
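The point about requests per second not mapping one to one onto sandboxes can be put into a rough back-of-the-envelope formula: sandboxes in use are approximately the arrival rate multiplied by the time each request occupies a sandbox. This is a steady-state simplification and an assumption of this sketch, not a model given in the talk; the durations below are rounded from the numbers quoted above.

```typescript
// Rough steady-state estimate of how many sandboxes are busy at once.
// Integer arithmetic first, then the division, to avoid rounding surprises.
function estimateConcurrentSandboxes(requestsPerSecond: number, avgDurationMs: number): number {
  return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// 50 requests per second at about 200 ms each (warm invocations).
const warmEstimate = estimateConcurrentSandboxes(50, 200);

// The same rate while requests still pay a cold start of about 1.4 s.
const coldEstimate = estimateConcurrentSandboxes(50, 1400);
```

The measured 18 sandboxes sits between the two estimates, which is plausible for a ramp-up where some requests hit cold sandboxes and others hit warm ones.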
4. Optimizing Lambda Function with Caching
In this workload, we query Postgres for data on NFL stadiums in the US and send back the response. The second test involves starting from a cold start and immediately handling 50 requests per second. To optimize the Lambda function, we can utilize caching with ElastiCache and Redis, as well as switch to the ARM architecture for cost and performance benefits. Implementing a cache-aside pattern allows us to rely on the cache and perform fewer queries to Postgres.
What we do in this workload is very simple. We are just literally querying Postgres, retrieving, let's say, all the data on the NFL stadiums in the US, and we are sending back the response. So we are not doing anything too complicated. Those are the data that we are going to have in this case.
OK. Now, the other interesting point is the second test. In this case, we start from a cold start and we just hit the ground running with 50 requests per second straight away. As you can see, we have way more cold starts. And we are also going to have, let's say, a higher P99, because the system has to catch up with all the simultaneous requests that are happening. So 50 simultaneous requests every second for 50 seconds is starting to be, let's say, way more load. Obviously, those numbers might be more than enough for you, and maybe you don't have strong SLAs. But for argument's sake, I want to show you, let's say, what you could do if you start to heavily optimize your Lambda function.
So now, when I think about this architecture, I start to think: OK, we have NFL stadiums. NFL stadiums are not going to change every second. The data is stable enough, and maybe I don't have a CDN and I need to, let's say, use some level of caching inside my Lambda functions. In this case, I was thinking: why do I need to query Postgres every time to receive exactly the same thing, or leverage the Postgres index for returning the response? Why can't I just use ElastiCache with Redis, having an in-memory database, to apply a cache-aside pattern?
And moreover, I started to implement other things. First of all, I changed the architecture to ARM. Our ARM offering is a proprietary chipset that we call Graviton. In this case, we are using Graviton2, and it allows you to really gain in cost and performance, especially if you don't have a specific library that relies on x86. And very often in Node, we have the kind of workloads that allow you to do that. Secondly, as I said, I used a cache-aside pattern, and the cache-aside pattern in this case is very simple. First, I rely on the cache. I check whether the cache is enabled, and if I find the response that I am looking for in the cache, I use it. I obviously set up everything so that the cache is evicted after a certain period of time. If I remember well, in this example, I used 10 seconds. And then, if I don't find anything, I perform a query towards Postgres. But in this case, I'm calling it way less.
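The cache-aside flow described here can be sketched as follows. This is not the code from the talk: the `StadiumCache` interface, the in-memory stand-in for Redis and the `getWithCacheAside` helper are hypothetical names chosen so that the example runs without a Redis server or a database.

```typescript
// Minimal cache contract; a thin wrapper around a Redis client could satisfy it (assumption).
interface StadiumCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for Redis so the sketch runs without a server.
class InMemoryStadiumCache implements StadiumCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= Date.now()) {
      this.entries.delete(key); // expired or missing: treat as a miss
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside: read the cache first, fall back to the database on a miss,
// then store the result with a TTL so that it is evicted later.
async function getWithCacheAside<T>(
  cache: StadiumCache,
  key: string,
  ttlSeconds: number,
  loadFromDatabase: () => Promise<T>
): Promise<T> {
  const cached = await cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as T;
  }
  const fresh = await loadFromDatabase();
  await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}
```

With a TTL of 10 seconds, as mentioned above, a call would look like `getWithCacheAside(cache, "stadiums", 10, queryPostgres)`, where `queryPostgres` is whatever function runs the real query.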
5. Optimizing Lambda with Cache and Library Choices
Using ElastiCache across all sandboxes provides better performance by relying on a warm cache. Utilizing tree shaking in esbuild results in a smaller bundle size and a faster cold start. Choosing a tree-shakeable library like Postgres.js instead of Sequelize significantly reduces the code size (measured without the cache-aside pattern). Take note of library sizes, and measure rather than guess the memory size for Lambda functions.
The other cool thing about using ElastiCache is that it is shared across all the sandboxes. So every single sandbox will rely on a warm cache. Imagine you have 10 sandboxes. The first one that is called performs its query to Postgres and stores everything inside Redis. And then all the other sandboxes will rely on the fact that Redis is already warm. So you have better performance out of the box for all the sandboxes that are relying on that specific data. That is great.
The other thing is that in esbuild, yes, you have minification set to true. Great, and we had that in the previous example. But you can also do tree shaking, which we didn't show before, because very often people don't know that you can rely on any argument that is available in esbuild, even though we don't expose all of them. Tree shaking is one of them. So for tree shaking, you need to go to the arguments and set tree shaking equal to true, and then it's going to work without any problem. That is great because it means that we can ship even less code inside our bundle. And that has a direct impact on our cold start, because the smaller the code that needs to be fetched from cache and through the network, the faster my cold start will be.
The other thing is that when we were building our first implementation, we were thinking: OK, let's take the most popular library for querying Postgres. And what came out is Sequelize. Sequelize, if you look on npm, is one of the most popular ones, the most up to date and so on and so forth. This is probably where a developer would start if they are thinking: OK, I need to query Postgres, so where should I look? I start from the most popular library. However, when I bundled that and used tree shaking, I still had 1.6 megabytes. That was quite a lot, honestly, for what it does. And I started to ask myself: OK, what's the problem? Unfortunately, Sequelize is not tree-shakeable. So I moved to the second most famous one, looking for something that was, let's say, tree-shakeable. I looked into Postgres.js, and the size of the same code went down to 40 kilobytes, just by replacing the library I was using. Now, here it's showing 246 kilobytes because I added a bunch of other things, libraries that I'm using for Redis and so on. But in the end it was literally 40 kilobytes for doing exactly the same thing without using a cache-aside pattern. That is great, to be honest. So bear in mind to check your libraries.
The other thing is that before, in the unoptimized solution, as I call it, we were just guesstimating the memory size for our Lambda function. As you can see here, we specified 256 megabytes. But in reality, for a similar cost, I should have chosen 512 megabytes, because in that case the response time was faster.
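To make the bundling choices concrete, here is a small sketch that turns them into esbuild command-line flags. The `BundleChoices` type, the file paths and the helper function are illustrative assumptions; the flags themselves (`--bundle`, `--minify`, `--tree-shaking=true`, `--format`, `--platform`, `--external:` and `--outfile`) are standard esbuild options. When bundling through CDK, the same settings would go through its bundling options instead of a command line.

```typescript
// Subset of the bundling choices discussed in the talk; field names are illustrative.
type BundleChoices = {
  entryPoint: string;
  outFile: string;
  minify: boolean;
  treeShaking: boolean;
  format: "esm" | "cjs";
  externalModules: string[];
};

// Translates the choices into esbuild command-line flags.
function toEsbuildArgs(choices: BundleChoices): string[] {
  const args = [choices.entryPoint, "--bundle", "--platform=node", `--format=${choices.format}`];
  if (choices.minify) args.push("--minify");
  if (choices.treeShaking) args.push("--tree-shaking=true");
  for (const moduleName of choices.externalModules) {
    args.push(`--external:${moduleName}`); // left out of the bundle, resolved at runtime
  }
  args.push(`--outfile=${choices.outFile}`);
  return args;
}

// Hypothetical project layout; ESM output allows top-level await in the handler module.
const esbuildArgs = toEsbuildArgs({
  entryPoint: "src/handler.ts",
  outFile: "dist/index.mjs",
  minify: true,
  treeShaking: true,
  format: "esm",
  externalModules: [
    "@aws-lambda-powertools/logger",
    "@aws-lambda-powertools/tracer",
    "@aws-lambda-powertools/metrics",
  ],
});
```

Tree shaking only removes code that the bundler can prove is unused, which is why the choice of library matters as much as the flag.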
6. Optimizing Lambda with Power Tuning and Extensions
Power tuning is an open source tool for optimizing Lambda functions for cost and performance. More memory doesn't necessarily mean higher costs, as you pay for execution time. By leveraging extensions, you can access the temp folder and utilize features like observability and caching. The extension allows communication with Parameter Store and Secrets Manager, enabling local caching and reducing the number of HTTP requests. Remember to pass the AWS session token for authentication.
And this one is a tool that we offer as open source, called Power Tuning. What it does is basically try your Lambda function with different memory configurations. It then tells you the results, and you can choose from this diagram whether you want to optimize for cost or for performance. That is great.
When I did exactly the same thing in my CI using the optimized version, I found the sweet spot around 512 megabytes to one gig. And bear in mind, more memory doesn't necessarily mean that it costs more, because you pay for execution time. In this case, if I have more memory, the execution time is lower, and therefore I spend less. So bear in mind that lower memory doesn't translate into lower costs. It might be that you spend more. This is a great tool to test this.
So now we changed the memory size. We found the right sweet spot using Power Tuning. So great. And you can obviously also integrate it in your CI/CD if you want. The other thing that we changed is this: before, we were fetching the parameters that the code needs, for instance, I don't know, feature flags or whatever, from Parameter Store in the initialization phase.
But there is also another approach, using an extension. An extension is basically a subprocess that lives alongside your Lambda function code and has access to the temp folder and all the other things that are available. It is a process that lives inside this execution environment, so it doesn't share anything with any other execution environment. I can use it for leveraging things like observability, for instance. Or, in this case, there is a fantastic extension that takes care of not only sending the request towards Parameter Store or Secrets Manager, but also caching the result locally. So I don't have to perform an HTTP request to the service every time.
So in this case, I can set up the TTL for Parameter Store through a bunch of environment variables, and therefore I know that every 100 milliseconds the cache will be evicted and I'm going to perform another fetch from Parameter Store. But I can tweak it the way I want, and it works very easily with Parameter Store and Secrets Manager. So I can really squeeze out quite a lot of milliseconds there. As you can see, there are quite a lot of parameters, and this is how you call Parameter Store in this case, but through localhost. The communication between my code and the extension, with the cached parameters that are over there, literally goes this way. This is great, because I'm just communicating locally. You can call over HTTP or HTTPS, up to you. The other thing that you need to bear in mind is that you need to pass the AWS session token, which is available in every Lambda function, inside the header in order to perform this call. That is, as you can see, at the bottom of this snippet of code.
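The reasoning behind "more memory can cost the same or less" is simple arithmetic: the compute charge is memory multiplied by billed duration multiplied by a price per GB-second. The sketch below uses invented durations and an illustrative price, both of which are assumptions; it also ignores the per-request charge, which does not depend on memory.

```typescript
// Compute cost of one invocation = memory in GB x billed seconds x price per GB-second.
function invocationCost(memoryMb: number, billedDurationMs: number, pricePerGbSecond: number): number {
  return (memoryMb / 1024) * (billedDurationMs / 1000) * pricePerGbSecond;
}

// Illustrative price only; check the current pricing for your region and architecture.
const assumedPricePerGbSecond = 0.0000166667;

// Hypothetical durations: doubling the memory halves the run time, so the cost is unchanged.
const costAt256 = invocationCost(256, 200, assumedPricePerGbSecond);
const costAt512 = invocationCost(512, 100, assumedPricePerGbSecond);

// If 512 MB finishes in less than half the time, it becomes the cheaper option.
const costAt512Faster = invocationCost(512, 80, assumedPricePerGbSecond);
```

Power Tuning measures the real durations at each memory setting, which is the part this arithmetic cannot guess.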
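The snippet shown in the talk is not part of this transcript. As a hedged sketch of the call it describes, the helper below builds the localhost request for the AWS Parameters and Secrets Lambda Extension: port 2773 is the extension's default, and the session token goes in the `X-Aws-Parameters-Secrets-Token` header. The function name, the parameter name and the token value are made up for the example.

```typescript
type ExtensionRequest = { url: string; headers: Record<string, string> };

// Builds the request for the extension's Parameter Store endpoint on localhost.
// The port can be changed through an environment variable; 2773 is the default.
function buildParameterRequest(parameterName: string, sessionToken: string, port: number = 2773): ExtensionRequest {
  const query = `name=${encodeURIComponent(parameterName)}`;
  return {
    url: `http://localhost:${port}/systemsmanager/parameters/get?${query}`,
    headers: { "X-Aws-Parameters-Secrets-Token": sessionToken },
  };
}

// Inside a function the token would come from the AWS_SESSION_TOKEN environment variable.
const parameterRequest = buildParameterRequest("/my-app/feature-flags", "example-session-token");
```

The request object can then be passed to whichever HTTP client the function already uses; the extension answers from its local cache until the configured TTL expires.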
7. Optimizing Lambda with Subprocesses and Sandboxes
Using a subprocess to fetch data offloads traffic and improves performance. The optimized code reduced cold starts and improved P99 and execution times. By leveraging the sandbox and adjusting the concurrency limit, you can achieve high throughput. Keep in mind the guaranteed number of requests per second per sandbox.
But apart from that, the idea of using a subprocess that fetches this data and takes care of the cache, the TTL, the eviction and everything is great, because it offloads a lot of traffic that would otherwise go towards another service. So great thing.
So when we ran similar tests, with a ramp-up from zero to 10 requests per second and then a sustained 50 requests per second, we went down to nine cold starts with this optimized code. So absolutely great: we halved the number of cold starts. But moreover, look at the P99. The P99 in the beginning was over two seconds. Now it is not anymore.
And when I move even further and look at the metrics for the cold start, we see around 794, so about 800 milliseconds, way less than it was before in the worst cold start, and even better, half a second, more or less, in the best cold start. But look at the execution time when those APIs are warm. We moved to 73 milliseconds in the worst one, but the best one was six milliseconds.
Now, I had several of them when I was doing this testing that were around six or seven milliseconds. It was oscillating between these numbers. But the beauty of this approach is that you can really squeeze in more. That's why you have way fewer cold starts: you can squeeze way more responses into a second, because it takes just six milliseconds to get the response. So obviously, the faster, the better.
The other cool thing about this is that, if you think about it, when you have multiple requests, we spin up more sandboxes. By default, we have a concurrency limit of a thousand, which is a soft limit that you can increase. Just ask your solutions architect in AWS or raise a ticket with support, and you can raise the limit for your Lambda functions. So that's a good hint: it's just a soft limit.
OK, so if you have a 100 millisecond response time for a sandbox, it means that you have 10 TPS per sandbox. But if I asked you how many you have when the response time is halved, you would say double it. Not really. There is another thing that you need to think about, because, you know, we have this approach of using sandboxes, and these are living alongside those of other customers, obviously. We guarantee that you have at least 10 requests per second per sandbox. I can tell you that when I tried it, I was outperforming 10 very easily with this approach, because I had, let's say, a very fast execution environment. But that obviously means that if you're not capable of squeezing in under 100 milliseconds, then you probably need to plan for handling roughly 10 TPS. Obviously, this is the guaranteed number that is available in the documentation.
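The 10 TPS figure comes from simple arithmetic: a sandbox handles one request at a time, so its theoretical ceiling is 1000 milliseconds divided by the response time. The sketch below only computes that ceiling; as noted above, the figure to plan around is the guaranteed one from the documentation, and the function name is an assumption of this example.

```typescript
// Theoretical ceiling for a single sandbox that serves one request at a time.
function maxRequestsPerSecondPerSandbox(responseTimeMs: number): number {
  return 1000 / responseTimeMs;
}

const ceilingAt100Ms = maxRequestsPerSecondPerSandbox(100); // the 10 TPS case from the talk
const ceilingAt73Ms = maxRequestsPerSecondPerSandbox(73); // worst warm response measured
const ceilingAt6Ms = maxRequestsPerSecondPerSandbox(6); // best warm response measured
```

The gap between the ceiling at 100 milliseconds and the ceiling at six milliseconds is why faster responses translate into fewer sandboxes and fewer cold starts.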
8. Optimizing Lambda Performance and Key Takeaways
Using cache, choosing tree-shakeable libraries, and optimizing code can significantly improve Lambda function performance. Bundle size also plays a role in cold start performance. For more details and example code, refer to the provided link.
But again, it's absolutely great, because we can do way more with this. So now, in test two, we just hit the ground running and go to 50 TPS in one go. As you can see here, the cold starts are still nine, because we were very fast in the response time. And I also want to show you a data point with and without cache. I created a feature flag that allowed me to activate or deactivate the cache remotely. And as you can see, the cache can play a very important role here, because we move from 124 milliseconds in the optimized version without cache to 39, so about 40 milliseconds, with cache. That is another great data point, which shows you that sometimes using a bit of cache, because not all data has to be real time, can bring a huge benefit for your SLA and the performance of your APIs.
OK, so what we have seen so far in this talk is that we improved the cold start by 56 percent without any problem, and the warm start by 95 percent. Those are big numbers. So really, applying small optimizations here and there, with best practices and tools like Power Tuning and other things, will make an impact on your Lambda functions.
Now, the main takeaways. First of all, choose your libraries wisely, because if they're not tree-shakeable, you are missing out on a possibility to optimize your workloads, especially for cold starts. Use cache where possible. It could be in memory inside the Lambda function; you don't need to use Redis every time. It could be that you store some information inside the temp folder that is accessible inside your Lambda function. It could be that you have, let's say, an extension that is caching your data, like we have seen with the Parameter Store one. So remember, there are many ways that you can optimize your code. Think about how it behaves, and which behaviors are acceptable for your business, and try to optimize for that case. Extensions can be your best friend. Obviously, you might pay a bit on the cold start, but then in the long run, for the warm start, you will have way less overhead and latency. And bear in mind, if you heavily optimize your code, you are not going to have many cold starts, so your P99 will look great. Last but not least, bundle size matters. If you choose your libraries wisely and you use tree shaking and minification, you definitely have a smaller bundle size, and that means you have better performance in cold starts. So don't forget that.
If you want to deep dive into the example and try the code yourself, not just see the results of what I was doing, you can find an article at this link, where there is a link to the GitHub repository. It is available in the samples, and you can find how things work and more details on the optimizations that I have discussed today. So thank you very much for watching. I hope that you enjoyed it. If you have any questions, feel free to reach out to me at this email. Thanks again, and enjoy the rest of the conference.