Video Summary and Transcription
Nathan Mars, tech lead at Grafana Labs, introduces observability for JavaScript applications and shows how it makes debugging and troubleshooting more effective. OpenTelemetry is presented as a standardized way to obtain system data, and Grafana as a platform for monitoring metrics, logs, traces, and profiles. The talk also shows that observability applies to the front end, using Grafana Faro to collect metrics like page load, errors, and user sessions.
1. Introduction to Observability
Hi, I'm Nathan Mars, the tech lead at Grafana Labs. Let me show you how to bring observability to your JavaScript applications. Learn how to debug and troubleshoot modern apps effectively using observability.
Hi, my name is Nathan Mars and I'm the tech lead of data visualization at Grafana Labs. Today I'm excited to share with you how to bring observability to your JavaScript applications. If you want to figure out why your app is running too slow, is broken, or you just want to improve its code quality, this is the talk for you.
Let's start with a story. Picture this. It's a Friday afternoon and you get a message from your boss that there's a mysterious bug in production. Is it time to panic? Will you have a weekend? Maybe or maybe not. Luckily for you, this fictional company only has a single monolith server, and both your server and frontend are written in JavaScript. So you have a secret weapon. The old faithful console.log.
You dig into the code and add dozens of console.log statements in an attempt to pinpoint the root cause of the error. After a lot of trial and error, you locate the issue in your Node server's delete cart logic. You fix the bug and push the code directly to production. Crisis averted. But what if, more realistically, you were working at a company with much more complexity, where your application is deployed across the world and every request interacts with a swarm of microservices? This is where console.log fails us. So what can we do to make sure that debugging and troubleshooting our modern applications is not a nightmare? The answer is observability.
What is observability? It means how well you can understand what's going on internally in a system based on its outputs. As systems become more distributed and complex, it's hard to see what's going on inside your application and why things may be going wrong. When talking about observability, we need to define the data types necessary to understand the performance and health of our applications. Broadly, these are metrics, logs, and traces.
Metrics are measurements collected at regular intervals. Most have a timestamp, a name, and one or more numeric values. Examples include error rate, response time, or output.
Logs come directly from your app, exporting detailed data and context around an event. Engineers can recreate what has happened millisecond by millisecond. Logs should be very familiar, as they are essentially more scalable and useful console.logs.
Traces follow a request from the initial request to the returned output. They record the causal chain of events to determine relationships between different entities. Traces are very valuable for highlighting inefficiencies, bottlenecks, and roadblocks in user experience, as they can be used to show the end-to-end latency of individual requests throughout an entire distributed architecture. Okay, great.
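To make the three data types concrete, here is a minimal sketch of what each one might look like as plain JavaScript objects. The field names and values are illustrative assumptions, not the OpenTelemetry data model. The two helper functions show what a trace adds over the other signals: the end-to-end latency of a request and the causal chain that leads to any step in it.

```javascript
// A metric data point: a name, a timestamp, and a numeric value.
// (Names and values below are made up for illustration.)
const metric = { name: "error_rate", timestamp: 1700000000000, value: 0.02 };

// A log record: detailed context around a single event.
const log = {
  timestamp: 1700000000050,
  level: "error",
  message: "delete cart failed",
  attributes: { cartId: "c-42", traceId: "t-1" },
};

// Spans belonging to one trace. Each span points at its parent,
// which is what lets us rebuild the causal chain of a request.
const spans = [
  { traceId: "t-1", spanId: "a", parentSpanId: null, name: "DELETE /cart", startMs: 0, endMs: 120 },
  { traceId: "t-1", spanId: "b", parentSpanId: "a", name: "cart-service.deleteCart", startMs: 10, endMs: 110 },
  { traceId: "t-1", spanId: "c", parentSpanId: "b", name: "db.query", startMs: 20, endMs: 100 },
];

// End-to-end latency is the duration of the root span (the one with no parent).
function traceLatencyMs(traceSpans) {
  const root = traceSpans.find((span) => span.parentSpanId === null);
  return root.endMs - root.startMs;
}

// Walk from a span up to the root and return the span names in call order.
function causalChain(traceSpans, spanId) {
  const byId = new Map(traceSpans.map((span) => [span.spanId, span]));
  const chain = [];
  for (let span = byId.get(spanId); span; span = byId.get(span.parentSpanId)) {
    chain.unshift(span.name);
  }
  return chain;
}

console.log(metric.name, metric.value);
console.log(log.level, log.message);
console.log(traceLatencyMs(spans), causalChain(spans, "c"));
```

Note that the log record carries the same trace ID as the spans. Sharing an identifier like this is one way a log line can be tied back to the request that produced it.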
2. OpenTelemetry and Grafana for Observability
We have an idea of what observability is and what data it is made up of. OpenTelemetry is the answer to getting this data, providing a standardized way of describing what your system is doing. Once we have OpenTelemetry set up, we can export the collected data to Grafana, where we can monitor our application's metrics, logs, traces, and profiles. Observability is not limited to the back end; it can also be configured for the front end using Grafana Faro to collect metrics such as page load, errors, and user sessions.
We have an idea of what observability is and what data it is made up of. This brings up an important question. How can we get this data? Should we manually instrument every single service, layer by layer? No, this would take as much time as writing the code itself. Luckily, there are some awesome open source projects as well as companies that make this a lot easier.
The answer is OpenTelemetry. What do we mean by OpenTelemetry? Let's start off with the name itself. We have open, as in open source, and then telemetry, which is the process of gathering, analyzing, and transmitting data from remote sources to monitor the performance of systems. The word itself comes from the Greek words tele, meaning remote, and metron, meaning measure.
With any app, when you're looking at this kind of data, you have two parts that need to come together. The first is figuring out how to generate and transmit that data. The second is deciding what you're going to do with that data. OpenTelemetry deals with the first part.
Until recently, there hasn't been a standardized way of describing what your system is doing. This problem has been compounded by the wide variety of programming languages and computer systems in use, making it difficult to build standardized observability tooling. Solving this problem is the core of the OpenTelemetry project: it provides a standard for describing what distributed systems are doing, no matter what programming language or computer systems you're using. Today, the OpenTelemetry project can be described as a collection of tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data so we can analyze that data with whatever platform we wish.
Once we have OpenTelemetry set up, we are collecting data and now need to export it somewhere to visualize it. Let's take a look at how you can use Grafana, the leading open source technology for dashboards and visualization, to analyze this data and monitor your application.
To begin with, you need to install the instrumentation libraries. In the case of a Node application, you will need to install both the OpenTelemetry API and auto-instrumentation Node packages. Next, you need to configure your application to export telemetry data. We will be writing a basic bash script that sets up OpenTelemetry and runs our application. To get the authentication environment variables, you will need to set up a free account in Grafana Cloud and configure OpenTelemetry for your Grafana Cloud stack. Then you can choose a service name, let's say this is our cart service, run your application via the shell script, and make requests to the service to send telemetry data to Grafana Cloud.
Within Grafana Cloud, you can now observe your service in Application Observability. Inside Application Observability, you can monitor your service's metrics, logs, traces, and profiles. We haven't mentioned profiles yet: they identify performance bottlenecks in your application's code.
Now that we have this data, what can you do with it? Well, you can configure alerts and on-call to help your team know when something's going wrong, such as when your Node application's CPU usage goes above a certain threshold. Within Application Observability, you can also trigger an incident that automatically includes the important context from your telemetry data. While investigating an incident, you can even run machine learning analysis via Sift investigations, a powerful diagnostic assistant that helps you holistically analyze your system's telemetry during investigations. It does this through checks such as grouping similar error logs, identifying resources that had a recent deployment, identifying Kube crashes, and more.
Observability is not just confined to the back end. You can also configure observability for your front end to collect metrics such as page load, errors, user sessions, custom logs and events, and more. To start capturing telemetry from your front-end applications within Grafana, you can configure Grafana Faro in your JavaScript initialization code.
There we have it: how you can gain valuable insight into your JavaScript applications and improve them beyond relying on console.logs.
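The Grafana Faro setup mentioned for the front end can be sketched as follows, assuming the `@grafana/faro-web-sdk` package is installed and bundled with your app. The collector URL, app name, version, and environment below are all placeholders; the real URL comes from your own Grafana Cloud setup. The helper names `buildFaroOptions` and `startFaro` are hypothetical wrappers added for illustration; only `initializeFaro` comes from the SDK.

```javascript
// Build the options object passed to Faro. Every value supplied below is a placeholder.
function buildFaroOptions({ collectorUrl, appName, appVersion, environment }) {
  return {
    url: collectorUrl,
    app: { name: appName, version: appVersion, environment },
  };
}

// Start Faro only in a browser. Under Node (for example in tests or during
// server-side rendering) this resolves to null and loads nothing.
async function startFaro(options) {
  if (typeof window === "undefined") return null;
  const { initializeFaro } = await import("@grafana/faro-web-sdk");
  return initializeFaro(options);
}

const faroOptions = buildFaroOptions({
  collectorUrl: "https://faro-collector.example.invalid/collect/your-app-key", // placeholder URL
  appName: "cart-frontend", // hypothetical app name
  appVersion: "1.0.0",
  environment: "production",
});

// Call this once, as early as possible in your app's initialization code.
startFaro(faroOptions);
```

Calling this early matters because errors and page-load data that occur before initialization are less likely to be captured. The instance returned in the browser also exposes an `api` object that can be used to send the custom logs and events mentioned in the talk.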
Thank you for your time today.