Video Summary and Transcription
The talk discusses the challenges of IoT development in production, including fleets going offline, missing data, alerts that never fire, inconsistent data, and slow-loading dashboards. It explores how to build observability into IoT applications using metrics, logging, and tracing. The integration between the rules engine and Lambda is explained, highlighting the use of tools like Lambda Power Tools and X-Ray for logging, monitoring, and tracing. The asynchronous Lambda invocation process and the tracing capabilities of X-Ray are also covered.
1. Introduction to IoT Challenges
Everybody starts in the IoT space thinking it's all sunshine and butterflies, but when you go to production, it becomes a maze. Prototyping and testing in the lab may go well, but then the issues arrive: fleets go offline, data goes missing, alerts never fire, data becomes inconsistent, and dashboards load slowly.
All right, so everybody who starts in the IoT space kind of thinks that this is what the IoT journey looks like. You know, everything is sunshine and butterflies, and you work with devices and they're very cool. You prototype with devices. You learn new protocols and so on, so you think it looks like that.
But actually, it doesn't look like that. When you're going to production with an IoT solution, it looks more like this: you're constantly putting out fires of one kind or another. So what's really going on? When you're working in IoT, you are actually part of an ecosystem. You are working on the device side, or you're working with teams who work on the device side. You have backends on the cloud side. You're working with cloud teams. You're working with data teams. So it's all relatively crazy, and it can get even crazier really fast. It is a maze.
You're prototyping or testing with your devices in the lab, and it all works, and everything's fine, everybody's happy. And then you go to production, and suddenly 50% of your fleet is offline from one day to the next. You try to investigate why, and you don't know why. Then you've got data missing. You're sending data. You're using MQTT. You've done all the right things. Ideally, you've used quality of service one. Ideally, you've actually used local storage at the edge as well, but still data's gone missing. So your data team is complaining, and you don't know where the problem is. What about the alerts you built in? Well, you're not seeing any of them. Have you actually built them? Well, I don't know. It would be a good idea if you did. Data is inconsistent. Your users are complaining that loading a dashboard with aggregate metrics for, I don't know, 50 devices just takes too long.
2. Building Observability in IoT
To build observability into an IoT application, you need metrics, logging, and tracing in a standardized way. Let's explore how to achieve this in a serverless backend scenario, where an IoT device sends data over MQTT, which is picked up by an AWS IoT rule and pushed into a Lambda function.
So it's all crazy. So what do you do about all of this? Clearly, you actually need to build observability into your application. So you need metrics. You need logging. You need tracing. And, ideally, you need all of this in a standardized way, so an operations team who is actually looking at this data can understand what's going on. So let's see how we can build observability in an IoT application.
I'm going to make the assumption that the back end here is mostly serverless. So I'm imagining a situation where you've got an IoT device sending some data over MQTT and you've got an IoT rule, you know, an AWS IoT rule, picking up this data and pushing it into a Lambda function. You're using this amazing cool integration that AWS IoT has with the rules engine. And you think everything is perfect, right? So if you scan that QR code, you can actually look at the code for what I'm going to show you. You can do that. I have it linked at the end as well. So I'll give, like, two seconds for people to look at that.
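For context, the device side of that setup, publishing telemetry over MQTT with QoS 1, might look roughly like the sketch below. It uses MQTT.js (the library the demo's device simulator uses later in the talk); the endpoint, topic, client ID, and certificate paths are placeholders, not values from the talk.

```ts
// Hypothetical device simulator: publish one telemetry message to AWS IoT Core
// over MQTT with QoS 1, authenticating with X.509 device certificates.
import { readFileSync } from 'node:fs';
import mqtt from 'mqtt';

const client = mqtt.connect('mqtts://YOUR_IOT_ENDPOINT.amazonaws.com:8883', {
  clientId: 'demo-device-01',                  // placeholder client/device ID
  key: readFileSync('./private.pem.key'),      // placeholder certificate paths
  cert: readFileSync('./certificate.pem.crt'),
  ca: readFileSync('./AmazonRootCA1.pem'),
});

client.on('connect', () => {
  const payload = JSON.stringify({ deviceId: 'demo-device-01', temperature: 21.5 });
  // QoS 1: the broker must acknowledge the publish before we treat it as delivered.
  client.publish('devices/demo-device-01/telemetry', payload, { qos: 1 }, (err) => {
    if (err) console.error('publish failed', err);
    client.end();
  });
});
```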
3. Integration between Rules Engine and Lambda
The integration between the rules engine and Lambda is asynchronous: the event is put into a queue and the Lambda function executes asynchronously. To enable logging and monitoring in your applications, you can use tools like Lambda Power Tools, an open source library that provides utilities for structured logging, metrics, and tracing. Lambda Power Tools can be installed using Lambda layers or npm, and can be instrumented using middleware libraries like Middy. By injecting the tracer and logger into your Lambda function, you can send traces to X-Ray for observability. In this case, an IoT device simulator sends a message to the rules engine, which invokes the Lambda function through the Lambda service.
All right, so what I'm going to show you right now is what you might not be expecting about the integration between the rules engine and Lambda. So the Lambda function that I'm using looks a little bit like this. You have a tracing library, which I will show you and talk about later. But your Lambda function just does something and throws an exception, right? And that's basically it. And so if you look at this architecture here, you'd expect to see the exception right away.
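The demo handler is roughly this shape (a minimal sketch; the actual code is in the linked repository, and the tracing setup is shown further below):

```ts
// A deliberately failing handler: it "does something" and then throws, so we can
// observe where (and whether) the exception ever becomes visible.
export const handler = async (event: unknown): Promise<void> => {
  console.log('received event', JSON.stringify(event));
  throw new Error('Simulated failure while processing the device message');
};
```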
Well, the funny thing is that the way the rules engine actually integrates with Lambda is asynchronous, right? So the rules engine sends the message to the Lambda service. The Lambda service says, great, I've got it, 202. And then your event is put into a queue and your Lambda function executes asynchronously. And only then, when that execution is done, somewhere, ideally in some log file, you will see the result of your Lambda execution, right? So that's why, of course, when this is happening with one device, you think, yeah, I've got it under control. I can go to the log of the Lambda function, and I can look there, and I can see that the Lambda function actually failed. But ideally, you would use some tracing tools and, you know, some tools that enable you to do logging and monitoring in your applications, so you can see this stuff relatively easily.
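That fire-and-forget behavior is easy to reproduce outside the rules engine as well: an asynchronous (Event) invocation returns 202 as soon as Lambda has queued the event, no matter what the function does afterwards. A minimal sketch with the AWS SDK for JavaScript v3 (the function name and payload are placeholders):

```ts
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

async function main(): Promise<void> {
  const response = await lambda.send(
    new InvokeCommand({
      FunctionName: 'iot-telemetry-handler', // placeholder function name
      InvocationType: 'Event',               // asynchronous: Lambda queues the event and returns immediately
      Payload: Buffer.from(JSON.stringify({ deviceId: 'demo-device-01', temperature: 21.5 })),
    }),
  );
  // 202 Accepted, even if the handler later throws; the error only shows up in the
  // function's own logs (and in X-Ray, once tracing is wired up).
  console.log(response.StatusCode);
}

main().catch(console.error);
```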
And so one of these tools is Lambda Power Tools, and Lambda Power Tools is actually an open source library. It's available for TypeScript, and it's also available for other languages like Python, for example. What it does is provide you with a set of utilities that you can integrate into your JavaScript application, so that you can easily create structured logging, you can create metrics, you can even build your own custom metrics, and you can also see the traces in a service that is called X-Ray, right? So, I mean, of course, this works if you're integrating with AWS services. If you're integrating with other types of services, you might want to identify different observability tools that you can use, right?
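As an illustration, the structured logging and custom metrics utilities might be used like this (a minimal sketch assuming v2 of the library, where the unit enum is exported as MetricUnit; the service name, namespace, and metric name are placeholders):

```ts
import { Logger } from '@aws-lambda-powertools/logger';
import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics';

// Placeholder service name and namespace.
const logger = new Logger({ serviceName: 'iotTelemetryHandler' });
const metrics = new Metrics({ namespace: 'IoTDemo', serviceName: 'iotTelemetryHandler' });

export const handler = async (event: { deviceId?: string }): Promise<void> => {
  // Structured logging: extra keys become JSON fields rather than free text.
  logger.info('Message received from the rules engine', { deviceId: event.deviceId });

  // Custom metric, emitted through the CloudWatch Embedded Metric Format.
  metrics.addMetric('messagesProcessed', MetricUnit.Count, 1);
  metrics.publishStoredMetrics();
};
```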
So the way you would install Lambda Power Tools is basically using either Lambda layers or npm. And you can instrument using Middy, which is quite a famous middleware library for Lambda, or you can do it with decorators, or you can do it manually. So it's actually looking quite neat if you look at the TypeScript code, right? So here I'm just using Node modules. I'm not going to go into details on that. But then you can create your tracer and logger, and then you can just use Middy to literally inject them into your Lambda function. Right? So with what I have here, basically all the traces from your Lambda function invocation are going to go into X-Ray, right? So let's see. This is actually not looking very good. So maybe I'm just gonna switch and show it to you really quickly. I still have 56 minutes. That would be nice. Right. So basically what I've done here, I've sent a message from an IoT device simulator, which is using MQTT.js as a library. So this is my client application and this is the... In this case, it's the rules engine, and this is the Lambda context from the Lambda service.
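A rough sketch of that Middy wiring (assuming v2-style subpath imports for the middleware; older versions export the middleware from the package root, and the service name is a placeholder):

```ts
import middy from '@middy/core';
import { Logger } from '@aws-lambda-powertools/logger';
import { injectLambdaContext } from '@aws-lambda-powertools/logger/middleware';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { captureLambdaHandler } from '@aws-lambda-powertools/tracer/middleware';

const logger = new Logger({ serviceName: 'iotTelemetryHandler' }); // placeholder service name
const tracer = new Tracer({ serviceName: 'iotTelemetryHandler' });

const lambdaHandler = async (event: { deviceId?: string }): Promise<void> => {
  logger.info('Processing message from the rules engine', { deviceId: event.deviceId });
  // ... business logic for the device payload goes here ...
};

// Middy wraps the handler; the middlewares add the Lambda context to every log line
// and open/close an X-Ray subsegment around each invocation.
export const handler = middy(lambdaHandler)
  .use(injectLambdaContext(logger))
  .use(captureLambdaHandler(tracer));
```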
4. Lambda Invocation and X-Ray
This is the Lambda invocation. The Lambda service accepted the invocation but attempted to execute the function twice due to the default retry configuration for asynchronous invocations in AWS Lambda. X-Ray provides tracing capabilities. Check the linked GitHub repositories for more information on Lambda Power Tools and X-Ray.
And this is actually the Lambda invocation. So when you look at this on a high level, this is actually looking really good, right? You don't see an error at all. But when you look down here, you actually see that the Lambda service accepted the invocation, but then you see that it actually executed, or attempted to execute, the function twice. So that's interesting. And that's the default configuration of AWS Lambda: it's going to retry by default if it's an asynchronous invocation. So when you use SAM and you create your Lambda functions, the default is always twice. So think about that. You know, asynchronous invocation, twice. Right.
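If that default is not what you want, the retry behavior for asynchronous invocations can be dialed down via the function's event invoke config (in SAM, the EventInvokeConfig section of the function). A sketch using the AWS SDK for JavaScript v3, with a placeholder function name:

```ts
import { LambdaClient, PutFunctionEventInvokeConfigCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

async function main(): Promise<void> {
  await lambda.send(
    new PutFunctionEventInvokeConfigCommand({
      FunctionName: 'iot-telemetry-handler', // placeholder function name
      MaximumRetryAttempts: 0,               // default for asynchronous invocations is 2
      MaximumEventAgeInSeconds: 60,          // drop events that have waited longer than this
    }),
  );
}

main().catch(console.error);
```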
So this is kind of what X-Ray looks like, and here you can see the tracing and so on. We don't have more time today, but you can always have a look at the GitHub repositories that I have linked and learn a little bit more about Lambda Power Tools and about X-Ray and so on. Thank you very much.