Serverless in Production, Lessons from the Trenches

Serverless technologies help us build better and more scalable applications in the cloud. It's a powerful paradigm that lets us focus on creating business value while the cloud provider handles the undifferentiated heavy lifting, such as managing the underlying infrastructure. In this session, Yan Cui takes us through many of the lessons he has learned from running serverless workloads in production over the last five years, including development tips, testing and observability strategies, and much more.


You can check the slides for Yan's talk here.

This talk was presented at Node Congress 2022.

FAQ

Yan Cui is one of the first AWS Serverless Heroes and currently works as a developer advocate at Lumigo. He also works as an independent consultant, helping companies succeed with serverless technologies.

The primary lesson is the importance of thinking about how to troubleshoot your system from the start. Observability is crucial because it allows you to infer the internal state of a system from its external output, helping you fix problems quickly when things go wrong.

Logs are considered overrated because they are not the most effective means of troubleshooting under time pressure: browsing through large volumes of log messages can feel like finding a needle in a haystack. A more efficient approach is to rely on a tool like Lumigo to capture invocations and errors, write fewer logs, make the ones you do write structured so they cover Lumigo's blind spots, and use CloudWatch for system metrics.
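To make "structured logs" concrete, here is a minimal TypeScript sketch (not Lumigo's or any logging library's API): each entry is written as a single JSON object, so a tool such as CloudWatch Logs Insights can filter on fields like `orderId` instead of matching free text. The `log` and `buildEntry` helpers and all field names are hypothetical.

```typescript
type Level = "DEBUG" | "INFO" | "WARN" | "ERROR";

interface LogEntry {
  level: Level;
  message: string;
  timestamp: string;
  [field: string]: unknown;
}

// Hypothetical helper: context fields are spread first so they can never
// overwrite the standard level/message/timestamp fields.
function buildEntry(level: Level, message: string, context: Record<string, unknown> = {}): LogEntry {
  return { ...context, level, message, timestamp: new Date().toISOString() };
}

// Writes one JSON object per line and also returns it (handy for tests).
function log(level: Level, message: string, context: Record<string, unknown> = {}): string {
  const line = JSON.stringify(buildEntry(level, message, context));
  console.log(line);
  return line;
}

// Log only what tracing does not already show, e.g. a business decision,
// with identifiers you can search on later (values are illustrative).
log("INFO", "order rejected", { orderId: "o-123", reason: "card_declined" });
```

Keeping entries flat and machine-readable is the design point: the fewer logs you write, the more each one should be queryable by field.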

Yan Cui uses a combination of Lumigo, structured logs, and CloudWatch for system metrics and alerts. Lumigo captures detailed information about each Lambda invocation, which helps him quickly identify and resolve issues.

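Using multiple AWS accounts helps you avoid hitting service limits, compartmentalize security breaches, and insulate environments and teams from one another. Because many service limits, including throughput limits that can cap your application's scalability, apply per account and per region, spreading workloads across accounts keeps one environment or team from consuming capacity that another one needs.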

AWS Control Tower helps you provision and manage multiple AWS accounts from a centralized dashboard. However, Yan Cui prefers OrgFormation, a tool for managing AWS Organizations as infrastructure as code, using a syntax similar to CloudFormation.

Secrets should be stored in SSM Parameter Store or Secrets Manager and encrypted at rest with KMS. Fetch and decrypt them at runtime during the cold start, and never store them in plain text in environment variables. Middleware libraries such as Middy can load and cache these secrets securely for you.
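The load-once-and-cache pattern can be sketched without any AWS dependency. In this hypothetical TypeScript version, `fetcher` stands in for a call such as SSM `GetParameter` with `WithDecryption: true` (or Secrets Manager `GetSecretValue`); it is injected so the sketch runs anywhere. This is a hand-rolled illustration, not Middy's API, and `createSecretLoader` and the parameter name are made up.

```typescript
// Resolves a secret's decrypted value by name. In a real function this
// would wrap SSM GetParameter (WithDecryption: true) or Secrets Manager
// GetSecretValue; it is injected here so the sketch runs without AWS.
type SecretFetcher = (name: string) => Promise<string>;

interface CachedSecret {
  value: string;
  fetchedAt: number;
}

// Hypothetical loader. Create it at module scope: the cache then lives as
// long as the execution environment, so only the cold start (or a TTL
// expiry) pays for a fetch, and the plaintext never sits in an env var.
function createSecretLoader(
  fetcher: SecretFetcher,
  ttlMs: number,
  now: () => number = Date.now,
): (name: string) => Promise<string> {
  const cache = new Map<string, CachedSecret>();
  return async (name: string): Promise<string> => {
    const hit = cache.get(name);
    if (hit !== undefined && now() - hit.fetchedAt < ttlMs) {
      return hit.value; // warm path: served from memory
    }
    const value = await fetcher(name);
    cache.set(name, { value, fetchedAt: now() });
    return value;
  };
}

// Usage: the fetcher below is a stand-in; the parameter name is hypothetical.
const getSecret = createSecretLoader(async (name) => `decrypted:${name}`, 5 * 60 * 1000);

const handler = async (): Promise<{ ok: boolean }> => {
  const apiKey = await getSecret("/my-service/prod/api-key");
  return { ok: apiKey.length > 0 };
};
```

The TTL is a design choice: caching for the lifetime of the execution environment is cheapest, while a short TTL lets rotated secrets take effect without a redeploy.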

The principle of least privilege means granting each Lambda function only the permissions it needs to do its job. This minimizes the blast radius of a security breach: an attacker who compromises the function can reach only the resources the function genuinely requires.
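In practice the policy lives in your infrastructure-as-code templates, but its shape is the same everywhere. This TypeScript sketch builds an IAM policy document that grants a function only the DynamoDB actions it needs on a single table and rejects wildcards. The helper name, region, account ID, and table name are hypothetical.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: one table, explicitly named actions, no wildcards.
function scopedDynamoPolicy(
  region: string,
  accountId: string,
  table: string,
  actions: string[],
): PolicyDocument {
  // "dynamodb:*" or a "*" table name would widen the blast radius.
  if ([...actions, table].some((s) => s.includes("*"))) {
    throw new Error("wildcards defeat least privilege");
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: actions.map((a) => `dynamodb:${a}`),
        Resource: [`arn:aws:dynamodb:${region}:${accountId}:table/${table}`],
      },
    ],
  };
}

// A "get order" function that only ever reads single items:
const policy = scopedDynamoPolicy("eu-west-1", "123456789012", "orders", ["GetItem"]);
console.log(JSON.stringify(policy, null, 2));
```

If this function were compromised, the attacker could read single items from one table and nothing else, which is exactly the reduced blast radius described above.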

Zero-trust networking means not trusting any entity just because it sits inside the network boundary. Every request to an internal API must be authenticated and authorized, often with AWS IAM authorization, so the system stays secure even if the network perimeter is breached.

Quick wins include enabling HTTP keep-alive for Node.js functions (with the AWS SDK for JavaScript v2, by setting the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable to 1), putting RDS Proxy in front of RDS databases, trimming dependencies to shrink deployment artifacts, and using Lambda layers to bundle dependencies. In addition, Lambda destinations are preferable to DLQs because an on-failure destination captures both the invocation payload and the error context, which makes debugging easier.
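To show why a destination record is easier to debug from, here is a TypeScript sketch of a consumer for an on-failure destination. The interface types only a subset of the invocation record that Lambda sends for failed asynchronous invocations, based on the AWS documentation; `summarizeFailure` and the sample values are hypothetical.

```typescript
// Subset of the record Lambda sends to an on-failure destination.
interface FailureRecord {
  version: string;
  timestamp: string;
  requestContext: {
    requestId: string;
    functionArn: string;
    condition: string; // why Lambda gave up, e.g. "RetriesExhausted"
    approximateInvokeCount: number;
  };
  requestPayload: unknown; // the original event, exactly as received
  responseContext: { statusCode: number; executedVersion: string; functionError?: string };
  responsePayload?: { errorType?: string; errorMessage?: string; stackTrace?: string[] };
}

interface FailureSummary {
  requestId: string;
  attempts: number;
  errorType: string;
  errorMessage: string;
  originalEvent: unknown;
}

// Hypothetical consumer: because the record carries both the payload and
// the error, one place has everything needed to debug or replay the event.
function summarizeFailure(record: FailureRecord): FailureSummary {
  return {
    requestId: record.requestContext.requestId,
    attempts: record.requestContext.approximateInvokeCount,
    errorType: record.responsePayload?.errorType ?? record.responseContext.functionError ?? "Unknown",
    errorMessage: record.responsePayload?.errorMessage ?? "",
    originalEvent: record.requestPayload,
  };
}
```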

Yan Cui
34 min
18 Feb, 2022

Video Summary and Transcription

This talk offers valuable insights for anyone considering serverless in 2022, with a focus on troubleshooting and observability using Lumigo. It emphasizes using multiple AWS accounts and OrgFormation for better control and scalability. Security considerations include securely loading secrets at runtime and implementing zero-trust networking. The talk also covers optimizing Lambda performance, updates on serverless frameworks, and the role of Terraform, and it compares Honeycomb and Lumigo for observability in serverless applications.

1. Introduction to Serverless Lessons

Short description:

In this talk, I will share the lessons I've learned from running serverless workloads in production over the past five years. I'll provide valuable insights for those considering serverless in 2022.

Hi everyone, thank you for joining this talk, where I'm going to tell you about some of the lessons I've learned running production workloads with serverless over the last five years. My name is Yan Cui. I'm one of the first AWS Serverless Heroes, and nowadays I work as a developer advocate with Lumigo, which I think is hands down the best observability tool for serverless applications. The other half of my time, I work as an independent consultant, helping companies around the world succeed with serverless. I've been running production workloads using serverless technologies since 2016, and quite a few things have come up along the way. I've categorized them into a number of lessons that I think will be really useful for everyone who's thinking about using serverless in 2022.

2. Importance of Troubleshooting and Observability

Short description:

You need to think about how you're going to troubleshoot your system from the start. Observability is crucial because bad things will happen. Logs are overrated for troubleshooting under time pressure. I use Lumigo, structured logs, CloudWatch, and Lumigo's alerts for troubleshooting.

The first and arguably the most important lesson is that you really need to think about how you're going to troubleshoot your system right from the start, because it's a much harder problem to fix after the fact. Observability is a measure of how well the internal state of a system can be inferred from its external output, and it's absolutely crucial because bad things are going to happen to your system. It might not have happened yet, but it will eventually, because everything fails all the time, as Werner Vogels famously said.

When things go wrong and users are impacted, you need to fix the problems as quickly as possible. That requires us both to identify issues and to resolve them in a timely fashion. I've spent many years swimming around in log messages, and I've come to the conclusion that logs are overrated. They are useful, but they're not the most effective means of troubleshooting problems when you're under time pressure.

Even at the best of times, browsing through huge amounts of log messages can feel like finding a needle in a haystack. So nowadays I've adopted a different approach: I use Lumigo and don't write many logs, but when I do, I make sure they are structured and cover the blind spots that Lumigo doesn't show me. I also use CloudWatch for system metrics and for alerts that complement the ones I get from Lumigo. Most of my troubleshooting is done inside Lumigo: either I'm notified via a Slack alert, or I go to the Issues page, where I can see all the recent errors, captured and categorized by function and error type.
