Guardians of the Applications: Conquering Node.JS App Monitoring

Ever struggled with monitoring in your Node.JS apps? Not anymore! By sharing the good, the bad, and the hair-pulling from our own experiences, I want to help you steer clear of monitoring chaos. We’ll see how truly knowing how your apps work helps you monitor with more focus. This lets you dodge the black holes of checkbox monitoring, so that important metrics and alerts are not swallowed. Additionally, we’ll see how strategic and focused logging, monitoring, and alerting with tools like Graylog, Grafana and Prometheus can supercharge your app’s resilience. Join to uncover how reliability and monitoring patterns and anti-patterns can help improve app quality. You will return armed with invaluable insights that can skyrocket your monitoring game!

This talk was presented at DevOps.js Conf 2024.

FAQ

Monitoring plays a crucial role in preventing app crashes by continuously checking system metrics, logging information, and identifying issues early. This allows teams to address problems before they escalate.

At Infobip, we use Graylog for logging, Grafana for dashboards, Prometheus for metrics, Opsgenie for alerts, and Sentry for user-facing issues.

Monitoring and observability work together by not only tracking system metrics and logs but also providing insights into the root causes of issues. While monitoring alerts you to problems, observability helps you understand why those problems are occurring by offering a clearer view of the system's internal state.

The 'tool obsession' anti-pattern occurs when teams become overly reliant on specific monitoring tools, believing they are a magic solution. This can lead to a lack of a systematic approach and distract from delivering actual value.

Teamwork is vital in monitoring systems because it combines different perspectives from devs, ops, network, and SRE teams. This collaborative approach helps catch problems faster and ensures more comprehensive coverage, reducing the likelihood of blind spots.

'Checkbox monitoring' is when monitoring is set up just to meet a requirement without genuinely addressing the system's needs. This can result in unimportant alerts and untrustworthy data, making it harder to identify and resolve critical issues.

Companies can avoid the 'big dumb metric' anti-pattern by tracking multiple, meaningful performance metrics rather than relying on a single, oversimplified measure. This provides a more accurate and actionable understanding of system health.

To improve the efficiency of monitoring alerts, companies can prioritize critical alerts, adjust thresholds to reduce false positives, connect alerts to impactful metrics like revenue, and optimize the process to ensure the right people handle the right issues.

Customizing monitoring tools to fit specific needs ensures that the metrics and alerts are relevant and useful for the system being monitored. This helps in accurately identifying issues and prevents unnecessary distractions from irrelevant data.

Integrating alerts with communication platforms like Slack helps teams quickly receive and act on notifications, improving response times and collaboration. This ensures that issues are addressed promptly and efficiently.
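The alerting advice in the answers above (adjusting thresholds to cut false positives, and making sure the right people get the right alerts) can be sketched in a few lines. This is a minimal illustration, not the setup used in the talk: `AlertRule`, `shouldFire`, `route`, and the destination names are all hypothetical. The "fire only after several consecutive bad samples" idea is similar in spirit to the `for` clause of a Prometheus alerting rule.

```typescript
type Severity = "critical" | "warning";

// Hypothetical rule shape, for illustration only.
interface AlertRule {
  name: string;
  threshold: number; // fire when the value is above this
  forSamples: number; // ...for this many consecutive samples
  severity: Severity;
}

// Returns true only if the last `forSamples` values all exceed the threshold,
// so a single short spike does not alert anyone.
function shouldFire(rule: AlertRule, samples: number[]): boolean {
  if (rule.forSamples < 1 || samples.length < rule.forSamples) return false;
  return samples.slice(-rule.forSamples).every((v) => v > rule.threshold);
}

// Hypothetical routing: critical alerts page the on-call person,
// warnings go to a team chat channel.
function route(rule: AlertRule): string {
  return rule.severity === "critical" ? "on-call-pager" : "team-chat-channel";
}

const checkoutErrors: AlertRule = {
  name: "checkout error rate",
  threshold: 0.05,
  forSamples: 3,
  severity: "critical",
};

console.log(shouldFire(checkoutErrors, [0.01, 0.2, 0.01])); // one spike
console.log(shouldFire(checkoutErrors, [0.06, 0.07, 0.08])); // sustained breach
console.log(route(checkoutErrors));
```

Requiring a sustained breach trades a slightly slower alert for fewer false positives; how many samples to wait for depends on how quickly the metric is scraped and how costly a missed minute is.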

Ante Tomić
21 min
15 Feb, 2024

Video Summary and Transcription
Monitoring and observability are important for catching bugs before users notice them. The examples of monitoring issues show the confusion and frustration that follow when monitoring is set up without a strategy. Teamwork is essential for effective monitoring, and automation can streamline processes and improve efficiency. Custom monitoring is necessary to prevent hazards, while unnecessary alerts can hurt productivity. Challenges include relying too much on monitoring without addressing root issues and struggling with manual configuration.

1. Introduction to Monitoring and Observability

Short description:

What happens when your apps crash constantly? Customers flee and revenue tanks. We feel powerless. Today I'm going to share with you the battle-tested monitoring methods that can help you prevent disasters. It's important for teams to monitor systems and catch bugs before users notice them. Let's explore the important concepts of monitoring and observability. Monitoring and observability work together to detect issues early and prevent future problems.

What happens when your apps crash constantly? Customers flee and revenue tanks. We feel powerless.

Hi, I'm Ante Tomić, a senior software engineer here at Infobip, and today I'm going to share with you the battle-tested monitoring methods that can help you prevent disasters. So let me first introduce myself. I'm 30 years old and 55% deaf in both ears. In my seven years at Infobip, I've mastered skills like web infrastructure, React, Webpack, micro frontends, automation, monitoring, improving developer experience, and much more. But hey, I'm not just a coder. When I'm not at conferences or mentoring, you'll find me exploring the world, taking photographs, and collecting rubber ducks and magnets. So hang on, because it's going to be a wild ride.

You know how it goes: when Netflix stops working, customers get mad fast and you have a bad day. Outages enrage customers and bosses. That's why it's important for teams to monitor systems. They play detective so they can catch bugs before users notice them.

And you know, catching issues early keeps apps humming and customers happy. So let's explore the important concepts of monitoring and observability. Let's take an example. Little Pedro fell hard and scraped his knee. His little robot assistant monitored how he limped, then cleaned and patched up his wound. The robot investigated why he fell and realized he had untied shoes. Together they cared for the injury and prevented the next one. Monitoring and observability work together here. Monitoring is basically like the little assistant. It keeps watching to see if things are working right. It checks metrics, like whether Pedro limps, and knows immediately that something is wrong. Logging helps monitoring by checking and recording information that can be useful later. But monitoring only alerts you to issues. Monitoring alone can feel like wandering in the dark. You are not sure what is going on. You are unsure about the root cause. Observability is like flipping on a light switch. It lights everything up, and you can see the logs, metrics and traces clearly. And you immediately know why Pedro fell: because of his untied laces.
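To make the "logging records information that can be useful later" point concrete, here is a minimal sketch of structured logging: one JSON object per event, with enough context attached to explain the cause and not just the symptom. The names `LogEntry`, `formatLog`, `requestId`, and `shoesTied` are made up for this illustration, and a real app would typically use an established logging library that ships these lines to a backend such as Graylog.

```typescript
type Level = "info" | "warn" | "error";
type Context = Record<string, string | number | boolean>;

// Hypothetical entry shape, for illustration only.
interface LogEntry {
  timestamp: string;
  level: Level;
  message: string;
  context: Context;
}

// Build one JSON line per event so a log backend can index every field.
// `now` is a parameter so the output is reproducible in tests.
function formatLog(
  level: Level,
  message: string,
  context: Context = {},
  now: Date = new Date()
): string {
  const entry: LogEntry = {
    timestamp: now.toISOString(),
    level,
    message,
    context,
  };
  return JSON.stringify(entry);
}

// The metric says "Pedro is limping"; the logged context says why.
const line = formatLog(
  "warn",
  "user fell",
  { requestId: "req-123", user: "pedro", shoesTied: false },
  new Date(0)
);
console.log(line);
```

With fields like these recorded at the time of the event, the later question "why did it happen?" becomes a search over structured data instead of guesswork.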

2. Examples of Monitoring Issues

Short description:

Mechanics rely on diagnostics so they can precisely fix your car, and observability is basically a very similar thing. Monitoring tells us that something is wrong but not how to retain our users. At Infobip, we use Graylog for logging, Grafana for dashboards, Prometheus for metrics, Opsgenie for alerts, and Sentry for user-facing issues. Poor choices can affect the company, its apps, and their reliability. The first example is ShopPass. The monitoring left them completely confused and frustrated, so they had to change and improve. The second example is Tom, who checks Tickethype's website, ensuring everything is right.

One more example is a blinking check engine light in your car. It does warn you about the issue, but not about the cause. Mechanics rely on diagnostics so they can precisely fix your car, and observability is basically a very similar thing. It looks under the hood of the software and pinpoints the problem so that it can be fixed.

Monitoring, for example, tells us that something is wrong but not how to retain our users. Monitoring tools give our apps superpowers. Like heroes, apps can seem invincible, but they rely on the talent behind the scenes to help spot bugs early. At Infobip, for example, we use Graylog for logging, Grafana for dashboards, Prometheus for metrics, Opsgenie for alerts, and Sentry for user-facing issues. I'm going to show you some real monitoring war stories and use practical examples. Because of limited time, I'm not going to focus so much on boring tool demos. So let's start.
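The talk skips tool demos, but a small sketch may help readers who have not seen what "Prometheus for metrics" looks like on the wire. The class below is a hand-rolled, in-memory counter that renders itself in the Prometheus text exposition format (`# HELP`, `# TYPE`, then one sample per label set). The `Counter` class, the metric name, and the labels are assumptions made for this illustration; a real Node.js app would normally use a Prometheus client library instead, and this sketch does not escape special characters in label values.

```typescript
// Minimal in-memory counter, for illustration only.
class Counter {
  private readonly name: string;
  private readonly help: string;
  private readonly values = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  // Increment the series identified by the given labels.
  inc(labels: Record<string, string> = {}, by = 1): void {
    const key = Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // Render HELP and TYPE lines followed by one sample line per label set.
  render(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} counter`,
    ];
    this.values.forEach((value, key) => {
      lines.push(key ? `${this.name}{${key}} ${value}` : `${this.name} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}

// Hypothetical metric: failed HTTP requests, labelled by route and status.
const httpErrors = new Counter("http_request_errors_total", "Failed HTTP requests.");
httpErrors.inc({ route: "/checkout", status: "500" });
httpErrors.inc({ route: "/checkout", status: "500" });
console.log(httpErrors.render());
```

An app would serve the rendered text from an HTTP endpoint for Prometheus to scrape, and Grafana dashboards or alert rules would then be built on top of the stored series.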

The Greek philosopher Plato once said that a good decision is based on knowledge and not on numbers. Our decisions do affect the company, its apps, and their reliability, so it's important to see how poor choices can cause instability. It is also important to understand how our application works, what its correct behavior is, and what its current performance is, so that we can do proper logging, monitoring, and troubleshooting, and so that we can make sound decisions.

The first example is ShopPass. Their system started to crash as soon as their shopping traffic jumped, and managers were in a panic. They just started buying this, this, and this monitoring tool without any strategy. Everything was disjointed, everyone was confused, and they didn't have any systematic approach. I mean, we've all tried putting out fires without actually seeing the full picture. This is the first anti-pattern, which is called tool obsession: we become so obsessed with certain tools that we lose perspective. It's so easy to think that the latest tool will be the silver bullet, only to end up distracted from delivering actual value. We shouldn't put all our faith in tools, because that can make teams think they are a magic wand that leads to success. Remember, Cinderella's fairy godmother even warned her that spells wear off at midnight. The same thing applies here, because nothing can replace hard work.

So what was the problem for ShopPass? The monitoring left them completely confused and frustrated, because the network showed green but the users still complained. They wasted time trying to decode those contradictions and errors, and they didn't really have any insight into their critical backend processes. So they had to change and improve. And that's exactly what they did: they looked closely at the vital signs and metrics, saw what is important and what keeps them healthy and on track, made sure to cover that, and simplified their tools so that they could look only at what matters most. They made a focused game plan, which allowed them to spot problems early, celebrate progress, and make their products more stable.
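The "network is green but users still complain" situation is easier to see with numbers. The sketch below computes two user-facing vital signs from request samples: the error rate and a nearest-rank percentile of latency. The `RequestSample` shape, the function names, and the sample values are assumptions for illustration, not data from ShopPass.

```typescript
// Hypothetical request sample, for illustration only.
interface RequestSample {
  durationMs: number;
  status: number;
}

// Share of requests that failed with a server error (5xx).
function errorRate(samples: RequestSample[]): number {
  if (samples.length === 0) return 0;
  const failed = samples.filter((s) => s.status >= 500).length;
  return failed / samples.length;
}

// Nearest-rank percentile: the smallest value such that at least
// p percent of the samples are at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

const samples: RequestSample[] = [
  { durationMs: 100, status: 200 },
  { durationMs: 120, status: 200 },
  { durationMs: 110, status: 200 },
  { durationMs: 900, status: 500 },
  { durationMs: 105, status: 200 },
];
const durations = samples.map((s) => s.durationMs);

console.log(errorRate(samples)); // 0.2
console.log(percentile(durations, 50)); // 110
console.log(percentile(durations, 95)); // 900
```

Here the median looks healthy at 110 ms, while the 95th percentile and the error rate expose the one request that a real user would have complained about. A host-level "is the network up" check would miss that entirely.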

For the second example, let's imagine Tom. He checks Tickethype's website, kind of like its doctor, ensuring everything is right: he checks servers, speeds, databases, and especially errors.
