Getting Started with Open Source Observability


Learn how to instrument Node and client applications using OpenTelemetry and popular open-source observability tools like Prometheus and Grafana.

This talk was presented at React Advanced 2022.

FAQ

StrongBad Inc. is a fictional software company that operates an email advice platform where users can ask various questions, which are then answered by Strong Bad.

Homestar Runner, brought on by Strong Bad, contributed to development tasks at StrongBad Inc., including adding a tags input to the email submission form.

After Homestar Runner's first PR was merged, the company's website was taken down, leading to several hours of stressful debugging before a fix was implemented and the website was restored.

Observability is a concept derived from control theory, which refers to the ability to infer the internal state of a system based on its outputs, helping developers understand and diagnose the performance of applications in production.

The fundamental data types used in observability include metrics (numeric or aggregated data), logs (linear stream of events), and traces (detailed path of a request through a system).

OpenTelemetry is an open source standard for observability data, providing specifications, APIs, and SDKs to collect metrics, logs, and traces, helping developers monitor and troubleshoot their applications more effectively.

Developers can use observability data such as metrics, logs, and traces to identify performance bottlenecks, understand system dependencies, and diagnose issues, ultimately aiding in enhancing the stability and efficiency of their applications.

Combining metrics, logs, and traces provides a comprehensive view of an application’s performance, allowing developers to correlate high-level aggregated data with detailed request paths and event streams for deeper insights and more accurate debugging.

Connor Lindsey
21 min
24 Oct, 2022

Video Summary and Transcription
Observability is a crucial tool for engineers to ship reliable websites and have a good developer experience. Metrics, logs, and traces are fundamental data types for achieving observability. OpenTelemetry is an open source standard for observability data, and instrumentation can be done manually or automatically. Data collection and transportation can be handled by packages built by the OpenTelemetry community, and running the collector can add security benefits. Grafana is used for data visualization, and it allows for analyzing web performance, exceptions, and traces. Traces capture user interactions and provide insights into user behavior and error occurrences. Leveraging these tools allows for gaining insights into various parts of the system and improving engineering goals.

1. The Story of StrongBad Inc.

Short description:

Let's talk about the fictional but not entirely unrealistic story of StrongBad Inc., a young software company. StrongBad Inc. is an email advice platform where people can go online and ask questions, which Strong Bad will answer. On his first day, after getting his dev environment set up, Homestar Runner was super excited to pick up his first ticket and add a tags input to the email submission form. After his change was merged, the website went down. They found a fix and were able to ship it to production, and the website was back up. As engineers, we want to ship reliable websites, debug issues quickly and confidently, and have a good developer experience. Observability is one tool in our toolbox to help achieve those goals.

Let's talk about the fictional but not entirely unrealistic story of StrongBad Inc., a young software company. My name is Connor Lindsey and I'm a software engineer at Grafana Labs.

StrongBad Inc. is an email advice platform where people can go online and ask questions, which Strong Bad will answer. Like, can you draw a dragon, or do you like techno music? Strong Bad is a good developer, but he wanted a little bit of extra help around the office, so he brought on his friend Homestar Runner to help out with some development tasks.

On his first day, after getting his dev environment set up, Homestar Runner was super excited to pick up his first ticket and add a tags input to the email submission form. Everything was great: he opened his first PR, it got approved, and everyone was ecstatic when it got merged into production. He was so excited to be contributing code on his first day, only to find out after they came back from lunch that the website had been taken down. They had no idea what was going on, so they walked over to customer support to see what was happening, and indeed they confirmed that the system was down. After a couple of hours of stressful debugging, they found a fix and were able to ship it to production and, hooray, the website was back up. But they were left wondering what caused the bug. Even after debugging, they weren't 100% sure what the root cause was. Even with the fix, they weren't left with a lot of confidence as to what had happened and how they could prevent it in the future.

So Homestar was wondering to himself, there's got to be a better way. As engineers, we have the same goals. We want to ship reliable websites. When we have issues, we want to debug them quickly and confidently, understanding the root cause of what is going on. We want performant websites, and we want to understand the performance issues that are happening so we can improve them. Ultimately, we want reliable and performant sites so we can have a positive user experience, so that people can go to our websites and successfully accomplish what they came there to do.

We also want to have a good developer experience. It should be as painless as possible to write, ship, and maintain code. I think one of the most frustrating things as a developer is trying to reproduce a really, really difficult-to-reproduce bug where you just have no leads, no idea where to get started, and are left wandering aimlessly through your logs, through whatever breadcrumbs you have.

So, observability is one tool in our toolbox that we can reach for to help achieve some of those goals. This concept comes from control theory, and a formal definition would be having the ability to infer the internal state of a system based on its outputs. So, when we think about our front end applications running on end user devices, they're kind of like a black box.

2. Types of Observability Data

Short description:

We can achieve insight into how our applications are running in production by using observability. Metrics, logs, and traces are fundamental data types that help achieve observability. Metrics provide aggregated data types, but traces and logs are important for more detailed insights. Traces give a detailed overview of request flow across service boundaries, while logs provide a linear stream of events. Having all these data types and correlating them is crucial for achieving observability.

We don't have full control or insight into what is happening when they're running. You know, it works on my machine, but I have no idea why it's broken on a customer's. And so, observability as a concept can help us gain insight into how our applications are running in production in the same way that we do when we're running them locally. You know, where we have full insight into exactly what's happening. We have all of our logs, we have our dev tools, we can see the network tab, et cetera.

We want to have some of those same tools and some of those same insights in production that we have when running locally. So, when talking about observability, let's talk about some of the tools, some of the things that we can reach for to achieve that goal. One of them is the different types of data that we can collect. Metrics, logs, and traces are kind of the fundamental data types that we have to work with. These in and of themselves do not mean that our systems are observable. But they're starting points. They're tools that we can work with to achieve observability.

So, metrics are numeric or aggregated data types. And because of aggregations that occur to get these metrics, you lose some level of detail. Which is why traces and logs are really important accompanying tools that we can reach for. So, some metrics that we can see on this dashboard are things like memory versus CPU usage. The number of requests that our servers are getting. Or for the frontend, distribution of page load times.
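To make the loss of detail concrete, here is a minimal sketch in TypeScript. It is not taken from the talk: the sample values and the names `pageLoadMs`, `MetricSummary`, `percentile`, and `summarize` are all hypothetical. It reduces a handful of page-load timings to a few aggregate numbers.

```typescript
// Hypothetical page-load samples in milliseconds. In a real system these
// would come from instrumentation rather than a hard-coded array.
const pageLoadMs: number[] = [120, 135, 110, 140, 2400, 125, 130, 115];

interface MetricSummary {
  count: number;
  mean: number;
  max: number;
  p50: number;
}

// Nearest-rank percentile over a sorted copy of the samples.
// Assumes `samples` is non-empty.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Collapse the raw samples into a few aggregate numbers.
// Assumes `samples` is non-empty.
function summarize(samples: number[]): MetricSummary {
  const total = samples.reduce((sum, value) => sum + value, 0);
  return {
    count: samples.length,
    mean: total / samples.length,
    max: Math.max(...samples),
    p50: percentile(samples, 50),
  };
}

const summary: MetricSummary = summarize(pageLoadMs);
// summary is { count: 8, mean: 409.375, max: 2400, p50: 125 }
```

The summary shows that at least one page load was slow (the mean sits far above the median), but it no longer says which request that was or why. That is the gap traces and logs are meant to fill.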

The next data type is traces. Traces, similar to a stack trace, will give you an overview of a request as it passes through your system. Unlike a stack trace when you are running a single application, a distributed trace, when talking about observability, can go across service boundaries. So, for example, you could see a trace of what happens when a user logs into your website. Well, some operations occur on the frontend, which then makes an HTTP request to your backend, which then goes and makes a request to a caching layer or a database, for example. Traces give you a lot of detail of how the request flows through all of those systems and can be really, really useful.
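The login example can be sketched as data. The following TypeScript is a hand-rolled illustration, not the OpenTelemetry API: the `TraceSpan` shape, the IDs, the timings, and the `renderTrace` helper are all assumptions made for this example. The idea it shows is that every span carries the same trace ID plus a pointer to its parent, which is what lets a trace be stitched together across service boundaries.

```typescript
// A simplified, hypothetical span record. Real tracing systems carry more
// fields (attributes, status, events), but the parent/child link is the core.
interface TraceSpan {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  service: string;
  operation: string;
  startMs: number;
  endMs: number;
}

// One user login, recorded by four different services (made-up timings).
const loginTrace: TraceSpan[] = [
  { traceId: "t1", spanId: "a", service: "frontend", operation: "submit login form", startMs: 0, endMs: 180 },
  { traceId: "t1", spanId: "b", parentSpanId: "a", service: "backend", operation: "POST /login", startMs: 20, endMs: 170 },
  { traceId: "t1", spanId: "c", parentSpanId: "b", service: "cache", operation: "GET session", startMs: 30, endMs: 35 },
  { traceId: "t1", spanId: "d", parentSpanId: "b", service: "database", operation: "SELECT user", startMs: 40, endMs: 150 },
];

// Build an indentation prefix of two spaces per nesting level.
function indent(depth: number): string {
  let prefix = "";
  for (let i = 0; i < depth; i++) prefix += "  ";
  return prefix;
}

// Walk the parent/child links and render the trace as an indented tree,
// ordering siblings by start time.
function renderTrace(spans: TraceSpan[], parentId?: string, depth = 0): string[] {
  const children = spans
    .filter((span) => span.parentSpanId === parentId)
    .sort((a, b) => a.startMs - b.startMs);
  const output: string[] = [];
  for (const span of children) {
    const durationMs = span.endMs - span.startMs;
    output.push(`${indent(depth)}${span.service}: ${span.operation} (${durationMs} ms)`);
    for (const line of renderTrace(spans, span.spanId, depth + 1)) {
      output.push(line);
    }
  }
  return output;
}

const traceLines: string[] = renderTrace(loginTrace);
// traceLines:
//   frontend: submit login form (180 ms)
//     backend: POST /login (150 ms)
//       cache: GET session (5 ms)
//       database: SELECT user (110 ms)
```

Read top to bottom, the tree shows that most of the 180 ms in this made-up login was spent in the database query, which is the kind of per-request detail an aggregated metric cannot show.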

Next, we'll look at logs. These are a linear stream of events, which are really handy to have but can often feel like a firehose. So, having good formatting and a good log aggregation tool makes logs a lot more useful and a lot easier to work with. Individually, each of these data types is really powerful, but they have different pros and cons as to the type of information that they're showing, their cost, their performance implications, you know, what it takes to operate them. And so, having all of them and being able to correlate them is super, super useful. For example, when we think about a metric, it's this aggregated high-level overview of a single data point, whereas a trace is a single request that's passing through a system.
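One common way to get both "good formatting" and correlation is to write logs as structured records that carry a trace ID. The TypeScript below is a sketch under that assumption. The `LogRecord` shape, the field names, and the sample entries are hypothetical and not something the talk prescribes.

```typescript
// A hypothetical structured log record. The optional traceId is what lets a
// log line be tied back to the trace of the request that produced it.
interface LogRecord {
  timestamp: string;
  level: "info" | "warn" | "error";
  message: string;
  traceId?: string;
}

// Serialize one record as a single JSON line, a format that log aggregation
// tools can typically parse and index by field.
function formatLog(record: LogRecord): string {
  return JSON.stringify(record);
}

// A made-up log stream mixing events from two requests and one background job.
const logStream: string[] = [
  formatLog({ timestamp: "2022-10-24T12:00:00.000Z", level: "info", message: "login requested", traceId: "t1" }),
  formatLog({ timestamp: "2022-10-24T12:00:00.050Z", level: "info", message: "email submitted", traceId: "t2" }),
  formatLog({ timestamp: "2022-10-24T12:00:00.120Z", level: "info", message: "cache warmed" }),
  formatLog({ timestamp: "2022-10-24T12:00:00.150Z", level: "error", message: "user lookup timed out", traceId: "t1" }),
];

// Narrow the firehose to the events that belong to a single trace.
function logsForTrace(lines: string[], traceId: string): LogRecord[] {
  return lines
    .map((line) => JSON.parse(line) as LogRecord)
    .filter((record) => record.traceId === traceId);
}

const loginLogs: LogRecord[] = logsForTrace(logStream, "t1");
// loginLogs holds the "login requested" and "user lookup timed out" records.
```

With a shared ID in place, a spike on a metrics dashboard can lead to a slow trace, and that trace's ID can pull out exactly the log lines written while the request was in flight.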
