Video Summary and Transcription
Microfrontends follow the microservices paradigm and observability is crucial for debugging runtime production issues. Error boundaries and tracking errors help identify and resolve issues. Automation of alerts improves incident response. Observability can help minimize the time it takes to understand and resolve production issues. Open source tools like OpenTelemetry are an option for catching client-side errors, and error boundaries can be implemented in other frameworks such as Angular and Vue.
1. Introduction to Observability for Micro-frontends
Hi, everyone. My name is Konstantinos, and I'm a software engineer based in London. Today, I'll be talking about observability for micro-frontends. We'll explore what micro-frontends are, their relationship with observability, and why frontend engineers should care about it. We'll also have a demo applying observability to micro-frontends. At DAZN, we heavily invest in micro front ends, creating teams that focus on specific business domains. Let's start with microservices, which were introduced to separate the backend monolith into smaller pieces of logic and data. We applied the same architecture to frontend applications, giving birth to microfrontend apps.
Hi, everyone. My name is Konstantinos, and I'm a software engineer based in London. The topic I would like to share with you today is about observability for micro-frontends.
Quick agenda for today: I'm going to talk about micro-frontends and observability, to understand what they are, whether they fit together, and how they can coexist. Should we, as frontend engineers, care about observability? And finally, a demo, so that we can apply observability to micro-frontends.
Some goals for today. We're going to define the path to observability. This is going to be our starting point and how we can actually grow from there. We're going to see some useful patterns that me and my team found really helpful during our journey to identify runtime issues in production and we're gonna automate how we are getting notified and receiving alerts for production incidents.
So let's crack on. Micro-frontends. First of all, if you're wondering if they exist, I can tell you they do exist. They're not a magic unicorn, they're actually not a tool, not a technology, they're a way to scale software and teams. They help us better structure our teams in order to focus on a business domain and help a team solve and own a specific problem of the business.
At DAZN, where I'm currently working, we apply micro-frontends but let me first give you an overview of it. This video will give you some further context. DAZN is a live and on-demand sports streaming platform, giving sport fans the control and flexibility to watch their sports their way. You don't need a cable or a satellite dish to watch it, so the setup is pretty quick and simple. You can download the DAZN app and watch on multiple devices at home or on the go. We are deploying software to multiple targets like web, mobile, and TVs. At DAZN, we are heavily invested in micro front ends, creating front end teams focusing on a specific business domain.
Let's start with microservices. Some years ago, we realized that we wanted to start building teams around a specific problem of the business. Microservices were introduced as a way to separate the backend monolith into smaller pieces of logic and data. As you can see here, we have the discovery service, the authentication service, and the my account and preferences services. Each of these services focuses on a specific business domain. Sometimes we have an aggregation layer in front of them, which can be a backend for frontend or a GraphQL API. This is what frontend applications use to consume these services if they're not consuming them directly. And at that time, the frontend was still a monolith. And we said, why not apply the same architecture to frontend applications? That is how microfrontend apps were born.
2. Microfrontends and Observability
Microfrontends follow the microservices paradigm: they split the monolithic front-end application into separate, smaller ones. Now we have end-to-end teams with a specific mission, each focusing on a piece of the business. In either case, micro-front-ends have to follow the same principles as microservices. But for the time being, I want us to focus on micro-front-ends as highly observable systems. Observability is a way to democratize debugging. It gives all the team members, especially the most curious ones, the ability to debug production issues. Observability is the condition of being empowered to ask why, giving you the flexibility to dig into the unknown unknowns on the fly.
Microfrontends follow the microservices paradigm: they split the monolithic front-end application into separate, smaller ones. Now we have end-to-end teams with a specific mission, each focusing on a piece of the business. Like here, you can see we have team Discovery, team Authentication, team My Account, and team Preferences, and each team spans the stack. We have front-end, backend, and database engineers focused on this specific mission of the business.
The teams here might feel very well defined, but sometimes the separation is not that clear. If we zoom in on one of these micro-front-ends, we can find some more micro-front-ends, and this is when things start becoming a bit confusing, and separation of concerns can get a bit blurry. Here, for example, we might have a team A responsible for the header and footer, and a team B responsible for the my account details. These are different modules in this micro-front-end, and different teams are responsible for them.
In either case, micro-front-ends have to follow the same principles as microservices. But for the time being, I want us to focus on micro-front-ends as highly observable systems. It's easier to observe a single system than a system split into multiple modules. So how can we provide high observability in the latter, zoomed-in case of a micro-front-end? In order to answer that, let's first try to understand what observability is.
Observability as a term can be quite abstract, so let's try to narrow it down to a more specific definition. Monitoring is not observability. You might have seen dashboards and metrics tracking a live system. Monitoring has been the de facto approach for so long that people tend to think of it as the only way of understanding their systems, instead of just one way to understand them. Monitoring tells you when something is wrong. You need to know in advance what signals you want to monitor: your known unknowns.
In a similar way, intuition is not observability. You might have been in a situation where your team has a production issue, and one of the most experienced engineers in the team is able to debug it, find the root cause, and finally resolve it. Observability is a way to democratize this process. It gives all the team members, especially the most curious ones, the ability to debug these issues. Kalman said: observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. Kalman was an engineer who defined observability in the 60s. Observability has its roots in control theory, and in order to understand this definition we would need to deep dive into linear algebra and formal methods, but we won't do that. It now means different things in different communities and can also be applied to modern software systems. But one important bit of observability is this: observability is the condition of being empowered to ask why, giving you the flexibility to dig into the unknown unknowns on the fly. How can we achieve this? We can do that with high cardinality.
3. Observability and Demo
Cardinality refers to the number of values in a set. High cardinality fields can help identify and narrow down issues. Observability tools are designed for high cardinality data. Frontend teams are responsible for observing their own systems. New Relic is used to log runtime production issues, but it may not provide enough information to identify the root cause. In our demo, we have a host application and modules. We can trigger errors in the modules and observe their behavior. The application is divided into micro frontend host and module repositories.
Cardinality refers to the number of values in a set. In databases, it usually represents the relationship between the data in two different tables, by highlighting how many times a specific entity occurs compared to another. The term high cardinality means that there can be many possible values for a single attribute, a user ID for example. High cardinality fields can help uniquely identify an issue, and they let you narrow down precisely what caused something to go wrong. Observability tools are specifically designed to query against high cardinality data. Without high cardinality data, you lack the observability to detect, isolate, and mitigate the underlying issues impacting your system.
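To make the idea of cardinality concrete, here is a minimal TypeScript sketch that counts the distinct values of each field in a handful of telemetry records. The record shape and the sample values are assumptions made up for illustration; they are not taken from the demo.

```typescript
// Hypothetical telemetry records; the field names are illustrative assumptions.
interface TelemetryRecord {
  appName: string;
  moduleName: string;
  userId: string;
}

// Cardinality of a field = number of distinct values it takes in the data set.
function fieldCardinality(
  records: TelemetryRecord[],
  field: keyof TelemetryRecord,
): number {
  return new Set(records.map((record) => record[field])).size;
}

const sampleRecords: TelemetryRecord[] = [
  { appName: "mfe-apps", moduleName: "header", userId: "u-1" },
  { appName: "mfe-apps", moduleName: "footer", userId: "u-2" },
  { appName: "mfe-apps", moduleName: "header", userId: "u-3" },
  { appName: "mfe-apps", moduleName: "profile", userId: "u-4" },
];

// appName has one distinct value, moduleName three, userId four.
const appNameCardinality = fieldCardinality(sampleRecords, "appName");
const moduleNameCardinality = fieldCardinality(sampleRecords, "moduleName");
const userIdCardinality = fieldCardinality(sampleRecords, "userId");
```

In this sample the user ID is the highest cardinality field, which is why filtering on it narrows an issue down the most.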
But at this point you might be wondering why frontend should even care about observability. Isn't this something that different teams should be responsible for? You've probably seen observability mostly as a topic connected with microservices and distributed systems. At DAZN we have an engineering culture with the motto "build it, ship it, own it" for all of our systems. Frontend teams are responsible for their applications end-to-end, and as part of this we're also responsible for observing our own systems. We have an external SRE team that we can go to for help when we need it, and that gives guidelines to different teams within the company.
We're using New Relic to log runtime production issues, and looking at the New Relic dashboard we see this error, which doesn't say much to us. React in production is minified and avoids sending down the full error message. When we visit the error decoder from the React docs, the information we get is valuable, but it won't help us identify the root of the issue. Even source maps won't help much to trace the issue down to its root, especially when the separation of micro frontends is not that clear. In cases like this, we need to find ways to better observe them. This brings us to the demo for this talk, which I'm going to share with you right away. So in micro frontends we have the terminology of host application and modules. Here our host application is My Account, and it's the parent application where we load and consume different modules. Here, for example, we have the header, the footer, and the profile, which are specific modules that we expose from an internal system.
On the right part of the screen you might have noticed some buttons. These are the component error and the modal error buttons, and the different modules and the host app all have them. If I trigger the modal error from the host application, you will see that a modal is displayed with an error and a specific description. We can dismiss this modal, and we can trigger it again from the header or the footer. Profile is a slightly more complicated module, as it is connected to a third-party API. It loads characters from the Rick and Morty API, so we can fetch some characters on the fly. Here you can fetch different characters, but what happens when the API fails? If I click this button, fetch character error, we get an error from the API and display the same modal as before. As for the way this application has been set up, as you can see here, we have two repositories: the micro frontend host and the micro frontend module. The micro frontend host is the My Account host application, and the micro frontend module exposes the different modules to the application. In the micro frontend host, I use a webpack configuration with Module Federation in order to connect to a remote. The remote that we are connected to is the MFE module, which is served from my localhost.
4. Using Error Boundaries and Tracking Errors
I'm connected to a remoteEntry.js file and using a similar webpack config with webpack version 5. We define the file name as remoteEntry.js, which can be connected from our host application. We expose the source index, which exports the header, profile, footer, and error modal modules. If we trigger the component error button, the application crashes. To resolve this, we wrap the header component with an error boundary, which displays an alternative UI and notifies users. We do the same for the app.js file, using a container error boundary. By defining these error boundaries, we can catch and track errors using the New Relic API. We report the errors caused by the header component and the host application, passing metadata such as module name, module version, error type, component source, and user ID. Tracking errors with the API is a good practice, especially when there are issues with external APIs like the Rick and Morty API.
I'm connected to this remoteEntry.js file, and I'm also using a similar webpack config in the module, with webpack version 5 and Module Federation. I'm defining the file name, remoteEntry.js, that we connect to from our host application, and exposing the source index, which is nothing more than a file that exports the header, profile, footer, and error modal modules.
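The host and module setup described here can be sketched as two option objects of the kind webpack 5's ModuleFederationPlugin accepts. This is a minimal sketch, not the demo's actual configuration: the names `mfeHost` and `mfeModule`, the port 3001, and the exposed key are assumptions, and the objects are declared as plain data so the snippet stands on its own without webpack installed.

```typescript
// Options shaped like those accepted by webpack 5's ModuleFederationPlugin.
// All names, the port, and the paths below are assumptions for illustration.
interface FederationOptions {
  name: string;
  filename?: string;
  remotes?: Record<string, string>;
  exposes?: Record<string, string>;
}

// Module side: publishes remoteEntry.js and exposes the src/index file,
// which re-exports the header, profile, footer, and error modal.
const moduleFederationOptions: FederationOptions = {
  name: "mfeModule",
  filename: "remoteEntry.js",
  exposes: { "./index": "./src/index" },
};

// Host side: points at the module's remote entry served from localhost.
const hostFederationOptions: FederationOptions = {
  name: "mfeHost",
  remotes: {
    mfeModule: "mfeModule@http://localhost:3001/remoteEntry.js",
  },
};

// In a real webpack config these objects would be passed to the plugin:
//   const { ModuleFederationPlugin } = require("webpack").container;
//   plugins: [new ModuleFederationPlugin(hostFederationOptions)]
```

With this shape, the host would load the exposed file with a dynamic import such as `import("mfeModule/index")`.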
So if we go back to our application, you might have noticed that we have some other buttons called component error. If I trigger this button, you will notice that my whole application crashes. If I reload my app and do the same from my header, clicking on component error, the application crashes again. The component error, which throws inside a component of the header module, affects the whole application and makes it crash. How can we resolve that? Our header component is a simple header element defining a title and some buttons, and what we can do is wrap it with an error boundary and export that as the default. The error boundary is just another component, and we pass down the name of the component where the error is happening. If I open this error boundary, you can see that it's nothing more than a simple React component that derives some state when an error occurs, setting the state to say that there is an error. Based on this, we can display an alternative UI instead of the component itself and notify our users. We can do the same thing for our app.js and wrap it with a container error boundary, as here. The container error boundary is a similar React component that derives some state, and we're able to display an alternative UI when an error occurs.
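The flow of an error boundary (derive state from a thrown error, then show an alternative UI instead of the component) can be modelled without any framework. The sketch below is only such a model. It is not React's API: in React the same roles are played by the static `getDerivedStateFromError` method and `componentDidCatch` on a class component. The function names, the string-based "rendering", and the fallback message are assumptions for illustration.

```typescript
// Framework-free model of the boundary logic. Rendering is represented by
// functions that return strings; a real boundary would render components.
interface BoundaryState {
  hasError: boolean;
  message: string | null;
}

const initialBoundaryState: BoundaryState = { hasError: false, message: null };

// Plays the role of getDerivedStateFromError: thrown error in, state out.
function deriveStateFromError(error: Error): BoundaryState {
  return { hasError: true, message: error.message };
}

// Renders the wrapped content, or the alternative UI when it throws.
// `componentSource` is the name of the wrapped component (hypothetical prop).
function renderWithBoundary(
  componentSource: string,
  renderChild: () => string,
): { state: BoundaryState; output: string } {
  try {
    return { state: initialBoundaryState, output: renderChild() };
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    const state = deriveStateFromError(error);
    return {
      state,
      output: `Something went wrong in ${componentSource}: ${state.message}`,
    };
  }
}
```

The point of the model is the containment: a failure inside the wrapped render function produces a fallback for that one component, while the caller keeps running.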
Going back to our application now, if I refresh the app, you can see that when I trigger a component error from the header module, it no longer affects the My Account host application, the My Account microfrontend. We display an alternative UI, notifying us about an error in the header microfrontend module, and we're able to reset it and trigger the component error again without the application crashing. We can do a similar thing in the My Account host application. But in this case, the whole application will go down, because the error affects the host app and the modules that we render inside it. So now that we have defined these error boundaries, how can we use them? When we catch errors in these components, we can track them. We are using the New Relic API on the window object. We load New Relic in our index.html file, and we're able to track the errors that happen inside this component. So when we wrap the header component with this module error boundary and an error happens inside the header component, we're able to catch it in the boundary. There, we use the New Relic API called noticeError to report the error caused by the header component and pass down some metadata: the module name and module version, which, as you can see here at the bottom, we load from our package.json; some further information like an error type, which is error boundary; a component source, which is a prop we pass with the name of the component that we wrap with this boundary; and also the user ID for our user in this session. We are able to do the same thing in the container error boundary. As you can see over here, we're reporting not the same error, but the error that's happening in the host application.
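The reporting step can be sketched as a small helper around the New Relic browser agent's `noticeError(error, customAttributes)` call. This is a minimal sketch under assumptions: the helper name, the attribute keys, and the `"error-boundary"` value are illustrative rather than the demo's exact code, and the agent is passed in as a parameter so the snippet does not depend on a browser `window`.

```typescript
// The slice of the New Relic browser agent API used here.
interface ErrorReportingAgent {
  noticeError(
    error: Error,
    customAttributes?: Record<string, string | number>,
  ): void;
}

// Metadata sent with every boundary error (key names are assumptions).
interface BoundaryErrorContext {
  moduleName: string; // e.g. the "name" field of package.json
  moduleVersion: string; // e.g. the "version" field of package.json
  componentSource: string; // name of the component wrapped by the boundary
  userId: string; // user ID for the current session
}

// Reports an error caught by a boundary, tagged so it can be queried later.
function reportBoundaryError(
  agent: ErrorReportingAgent | undefined,
  error: Error,
  context: BoundaryErrorContext,
): void {
  if (!agent) return; // the agent script may be blocked or not loaded yet
  agent.noticeError(error, { ...context, errorType: "error-boundary" });
}

// In the browser this would typically be called from the boundary as:
//   reportBoundaryError(window.newrelic, error, context);
```

Passing the agent in also makes the helper easy to exercise with a fake agent in unit tests.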
With this, we have another boundary for the errors that happen inside the host application. So what else can we track? As we noticed previously, we use the profile module to fetch some characters from the Rick and Morty API. But what happens when there is an issue with this API? It's a really good practice to track this kind of error.
5. Using New Relic for Error Reporting and Automation
We use the New Relic noticeError API to report errors and pass metadata. We can track errors and actions using NRQL queries in the New Relic dashboard. Automate the process with Terraform and define alert conditions. Trigger errors in the application to receive email notifications. Observability is crucial for debugging runtime production issues. Error boundaries and tracking errors help identify and resolve issues. Automation of alerts improves incident response.
In a similar way, we're going to use the New Relic noticeError API to report this error and pass down some metadata: the module name, the module version, the user ID for our user in this session, and the error type, which is now a different type of error. It's an async error. And the component source, which is the profile component.
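Tracking an async failure follows the same pattern as the boundary case, only around an awaited call. The sketch below is a minimal illustration under assumptions: `loadWithTracking`, the attribute keys, and the `"async"` value are hypothetical names, and both the data-loading call and the reporting callback are injected so the snippet does not depend on `fetch` or on a loaded agent.

```typescript
type AttributeMap = Record<string, string | number>;

interface AsyncErrorContext {
  moduleName: string;
  moduleVersion: string;
  componentSource: string; // e.g. "Profile"
  userId: string;
}

// Builds the custom attributes sent alongside an async failure.
function asyncErrorAttributes(context: AsyncErrorContext): AttributeMap {
  return { ...context, errorType: "async" };
}

// Runs a data-loading call and reports a failure before rethrowing it,
// so the UI can still react (for example by showing its error modal).
async function loadWithTracking<T>(
  load: () => Promise<T>,
  notice: (error: Error, attributes: AttributeMap) => void,
  context: AsyncErrorContext,
): Promise<T> {
  try {
    return await load();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    notice(error, asyncErrorAttributes(context));
    throw error;
  }
}
```

In the browser, `notice` could simply forward to the agent's `noticeError`, and `load` would wrap the call to the third-party API.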
Similarly, we can do the same thing for the error modal, but the error modal is a bit different. It is related to errors, but it's not an error itself. So we don't want to track the action that displays the error modal as an error. That's why we can use a different API from New Relic, addPageAction, defining our own specific action name and passing down metadata similar to what we sent with the errors before, like the module name and the module version, the component source, and the user ID.
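For the modal, the New Relic browser agent's `addPageAction(name, attributes)` call records a custom action instead of an error. A minimal sketch follows; the helper name, the action name `errorModalDisplayed`, and the attribute keys are assumptions for illustration, and the agent is again injected rather than read from `window`.

```typescript
// The slice of the New Relic browser agent API used here.
interface PageActionAgent {
  addPageAction(
    actionName: string,
    attributes?: Record<string, string | number>,
  ): void;
}

interface ModalContext {
  moduleName: string;
  moduleVersion: string;
  componentSource: string;
  userId: string;
}

// Assumed action name; the talk only says it is a custom, team-defined name.
const ERROR_MODAL_DISPLAYED = "errorModalDisplayed";

// Records that the error modal was shown, without counting it as an error.
function trackErrorModalDisplayed(
  agent: PageActionAgent | undefined,
  context: ModalContext,
): void {
  if (!agent) return; // nothing to do when the agent is not available
  agent.addPageAction(ERROR_MODAL_DISPLAYED, { ...context });
}
```

Keeping modal displays as page actions means error counts stay meaningful, while the modal events remain queryable on their own.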
And you might be wondering how we can actually use this data. If we go to the New Relic dashboard, New Relic offers NRQL, which is a query language specific to New Relic. It looks like SQL, and you can use it to query data from New Relic. So here we select all the data from the JavaScriptError table, which is where all the errors that we previously tracked from our application are available, and we can filter by the app name, Microfrontend Apps, and the module name, Microfrontend Host, to find the errors that match these specific criteria. We can be much more specific and also filter by the error type, which can be error boundary, and by the component source, for example footer. Similarly, we can do the same thing with the PageAction table, filtering by the action name, error modal displayed, or by the module name, Microfrontend Module. We can also count the errors or the actions, as we can see over here. But how can we further automate this process, since we don't want to go to the New Relic dashboard and check these queries all the time?
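The queries described above can be written out as NRQL strings. The sketch below builds them with a small helper; `JavaScriptError` and `PageAction` are New Relic browser event types, while the attribute names and values (`moduleName`, `mfe-host`, `errorModalDisplayed`, and so on) are assumptions that must match whatever was sent as custom attributes. The helper does no escaping, so it is only meant for trusted constant values.

```typescript
type NrqlFilters = Record<string, string>;

// Builds "SELECT <select> FROM <eventType> WHERE a = 'x' AND b = 'y'".
// Values are inserted as-is, so pass only trusted constants.
function buildNrql(
  select: string,
  eventType: string,
  filters: NrqlFilters,
): string {
  const where = Object.keys(filters)
    .map((key) => `${key} = '${filters[key]}'`)
    .join(" AND ");
  return `SELECT ${select} FROM ${eventType}` + (where ? ` WHERE ${where}` : "");
}

// All tracked errors coming from the host application.
const hostErrorsQuery = buildNrql("*", "JavaScriptError", {
  appName: "Microfrontend Apps",
  moduleName: "mfe-host",
});

// Narrowed down to boundary errors raised by the footer component.
const footerBoundaryQuery = buildNrql("*", "JavaScriptError", {
  appName: "Microfrontend Apps",
  errorType: "error-boundary",
  componentSource: "Footer",
});

// Number of times the error modal was displayed.
const modalCountQuery = buildNrql("count(*)", "PageAction", {
  actionName: "errorModalDisplayed",
});
```

Each extra filter is one more high cardinality dimension to slice on, which is what makes the custom attributes worth sending.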
What we can do is rely on Terraform. At DAZN we are heavily invested in Terraform and, as you can see here, Terraform helps us define infrastructure in a configuration file. We are able to define a New Relic provider inside Terraform and a New Relic alert policy that is connected to a specific New Relic alert channel, which is going to be the team's email, where the team gets notified when an alert is triggered. Most importantly, we are able to define New Relic alert conditions, which are nothing more than the NRQL queries that we've seen before in our dashboard. That way, we select the number of errors from the JavaScriptError table where the app name is Microfrontend Apps and the module name is Microfrontend Module. When we have more than two errors within 60 seconds, the alert condition is triggered and we get notified. We can similarly define another alert condition for the page action, as you can see over here, with an NRQL query where the app name is Microfrontend Apps, the action name is error modal displayed, and the module name is Microfrontend Module. Again, if we have more than two error modals within 60 seconds, we're going to get notified.
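The threshold rule itself ("more than two matching events within 60 seconds") can be illustrated with a few lines of TypeScript. This is only a simplified model of the rule as stated in the talk. It is not Terraform configuration and not how New Relic evaluates alert conditions internally; the function name and the millisecond timestamps are assumptions.

```typescript
// Simplified model: does the number of events inside the window that ends at
// `nowMs` exceed the threshold? Defaults mirror the talk: > 2 within 60 s.
function exceedsThreshold(
  eventTimesMs: number[],
  nowMs: number,
  windowMs = 60_000,
  threshold = 2,
): boolean {
  const windowStart = nowMs - windowMs;
  const recent = eventTimesMs.filter(
    (time) => time > windowStart && time <= nowMs,
  );
  return recent.length > threshold;
}

// Three errors inside the last minute: the condition is violated.
const burstViolates = exceedsThreshold([1_000, 20_000, 59_000], 60_000);

// Exactly two errors: "more than two" is not satisfied.
const twoErrorsViolate = exceedsThreshold([1_000, 2_000], 60_000);

// Three errors, but all older than the window: no violation.
const oldErrorsViolate = exceedsThreshold([1_000, 2_000, 3_000], 120_000);
```

The strict "greater than" comparison matters: with a threshold of two, the third matching event inside the window is the one that raises the incident.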
So we go back to our application, trigger some errors, and create some traffic to New Relic. After a while, we get a notification: an email from New Relic saying that an incident was raised about this specific issue, with links to the New Relic dashboard to check more details and also the query that triggered the incident. You can also connect New Relic to your own on-call system, for example PagerDuty, which is really helpful if you have on-call teams. And yeah, that's pretty much it. A quick summary. From what we've seen, observability is hard. We really need to foresee the unknown unknowns, and we need to track enough data to help us debug a runtime issue in production. We started by defining some error boundaries and caught the errors in these boundaries, and then we tracked the errors in the failure-prone parts of our application: the module boundaries, the async errors, like when we fetch some data from an API, and the error modals that are related to these kinds of errors. And finally, we also automated some alerts for runtime production incidents. That was all. That is my Twitter handle, so feel free to reach out if you have any questions or want to share some feedback, and thank you very much. Let's have a look at the results.
6. Importance of Observability in Debugging
Finding an issue usually takes more than an hour. It's expected to spend more than an hour debugging such issues. However, observability can help minimize the time it takes to understand and resolve production issues, as well as understand the client experience.
So, well, we have an obvious winner: more than one hour. I already thought so. Finding an issue usually takes more than an hour; it was quite expected, to be honest. I wouldn't expect to be able to find an issue in less than an hour. I can see there are a few other answers. I mean, zero minutes? Of course, that's not even possible. Please explain; I don't know if you're a superhero or something like that. But yeah, definitely. Otherwise, I believe that in most cases you need more than one hour to debug such issues. But observability can definitely help us with that. It can help minimize the time that it takes us to realize and understand what the production issue, the runtime issue, is. And most importantly, to understand how our clients are behaving and what experience they're actually getting. Yeah, I agree.
Dealing with Errors in npm Packages
How do you deal with errors that are part of npm packages? If you have an error in production that affects Windows users only, but you're unable to replicate it, it can be challenging to find the root cause. Wrapping the specific library with a try-catch block and logging the error to a service like Sentry or New Relic can help. However, finding bugs in external packages and dealing with replication issues can still be difficult.
So let's jump into the audience questions. The first one is from Mikluho, and it's a long one. How do you deal with errors that are part of npm packages? For example, I have an error in production, caught by Sentry, that affects Windows users only, but I'm not able to replicate it. I opened an issue for the npm package owners, but they asked for more details because they couldn't replicate it either. So yes, that's a really good question. Usually, when you're using an npm package, it is imported into your codebase, so an error in it will produce a stack trace in your application. When you go to debug the error log, you're going to find that stack trace. And yeah, this is going to be really hard to track down, especially if it's coming from the very root of your system. For example, if it is in the build system, Webpack, Rollup, or any other of these tools, it's going to be quite hard to find out what the issue might be. Otherwise, if you suspect a specific library, what you can do is wrap the calls to this library in a try-catch. That way you're able to catch the error and log it to your service, for example Sentry or New Relic or any other tool that you might be using. So, yeah, in this case, it's definitely hard. You need to go a bit with your feeling, let's say, with your intuition. But this is going to be really hard. Yeah, that's a hard problem to solve. Finding bugs is hard, even more so if they come from external packages, and replication, of course, is a big issue. So I feel your pain.
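The try-catch suggestion from this answer can be sketched as a generic wrapper around a function from the suspected library. This is a minimal sketch under assumptions: `withErrorReporting`, the attribute keys, and the `"third-party"` value are hypothetical, and the reporting callback stands in for whichever service (Sentry, New Relic, or another tool) is in use.

```typescript
// Wraps a function from a suspected library so that any error it throws is
// reported with the library name attached, then rethrown unchanged.
function withErrorReporting<Args extends unknown[], Result>(
  libraryName: string,
  fn: (...args: Args) => Result,
  reportFailure: (error: Error, attributes: Record<string, string>) => void,
): (...args: Args) => Result {
  return (...args: Args): Result => {
    try {
      return fn(...args);
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      reportFailure(error, { errorType: "third-party", library: libraryName });
      throw error;
    }
  };
}
```

Because the error is rethrown, the application behaves as before; the wrapper only adds the library name as an extra dimension to filter on when the issue cannot be reproduced locally.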
This is not a question, but a comment from Ante. Yeah, and they're talking about you. Of course, he killed it with this session. Really great job and useful info. And then one, two, three, four, five, six, clap emojis, so you can take that home. So next question is from Iskkk.
Catch Errors and Implementing Boundaries
OpenTelemetry is an open source tool that can be used to catch errors from the client. New Relic and Sentry also offer freemium versions. New Relic, in particular, provides a powerful query language that allows you to analyze errors in the runtime production environment and identify their source. Moving on to the next question, the approach discussed can be implemented with other JavaScript frameworks like Angular and Vue. While the implementation of error boundaries may differ across frameworks, it is possible to implement similar boundaries in all of them. That concludes the Q&A session. If you have more questions, feel free to join me in the speaker room. Thank you all for your time and see you next week!
OK. Are there any tools you'd recommend for catching errors from the client other than Sentry or New Relic, like any open source tools that are available? Yeah, I have heard of one; I think it is called OpenTelemetry. I haven't used it, to be honest, but it's a tool that I've been looking to try out in some of my side projects. I think OpenTelemetry is open source, but even tools like New Relic and Sentry offer freemium versions. So, for example, you can use them up to a certain number of users. I don't know the specific plans, but you can actually use them for free. What I've seen with New Relic is really powerful, because you are able to use the query language that it offers. For every error that you track in the application, you can query against the multidimensional data, and you are able to narrow down the errors that are coming from the runtime production environment and specify where these issues might be coming from or, for example, which specific users they might be affecting. That is really powerful stuff, yeah.
Thanks. Another compliment for you: it was a really great session. So we can make a whole wall of compliments for Konstantinos. Really happy that people liked the talk.
And then the next question. We don't have a lot of time, so a quick one: cool demo, and I learned a lot. Can I implement the same approach with other JavaScript frameworks like Angular or Vue? Oh yeah, definitely. The async error tracking can be applied in any framework; as long as it's JavaScript, you can catch these errors. The error modals can also be tracked: you can trigger these actions along with your UI. The most difficult part would be the error boundaries, which are framework-specific. For example, Angular, React, and Vue.js each have their own way of implementing boundaries. But most importantly, you're able to implement these kinds of boundaries in all of these frameworks.
All right. Well, good to know. Well, as I mentioned, that's all the time we have for Q&A. But you are going to be in your speaker room. So if people want to ask more questions, they can do so there, right? Yeah, yeah. I'll be waiting for you there. Thank you very much for having me here. It's been a pleasure. And I'll see you next week at our next meeting. Thank you, guys. Bye bye.