Video Summary and Transcription
The Talk discusses the development of the Jam browser extension, which is a bug reporting tool. It explores the challenges of messaging between different execution environments within a browser and the need for message chunking to overcome size constraints. The Talk also explains how the development team rebuilt the system using a TCP/IP network stack approach, which allowed them to solve messaging difficulties similar to networking problems. The benefits of this approach include a smoother rollout, simpler debugging, and a focus on feature development without worrying about messaging constraints.
1. Introduction to Jam Browser Extension
Hi, I'm Cyrus, an engineer at Jam. Today, I'll talk about building a network stack for our browser extension called Jam. Jam is a bug reporting extension that helps you receive detailed bug reports with features like screenshots, network requests, and console logs. It saves time and provides a hassle-free experience. The extension consists of components like background script, pop-up, content script, and host script, all operating within a browser window.
Hi, I'm Cyrus. I'm an engineer at Jam, and this talk is about building a network stack for our browser extension.
Now, if you don't know what a network stack is, or if you do know what a network stack is, and it just sounds weird that we're building one at the application layer inside a browser extension, don't worry, we'll answer all those questions in this talk.
So first, what are we building? What is Jam? Jam is a browser extension for bug reporting. And what that means is we help you receive perfect bug reports every time. Typically, when you receive a bug report from a product manager or a QA, it's not a fun experience because you don't get a lot of details on the bug. You might get a sentence or two of like, this page is broken, and you have to decipher what that means. With Jam, whenever your product manager files a bug, they have to select a screenshot or record the tab or an instant replay. So you can actually visually see what the bug is. And then we automatically bundle in a bunch of other information like the console logs, network requests, page metadata, repro steps (so like, whenever a user clicks a button), the device info, timestamps, and a bunch of other details that you would want to get a picture of what actually happened. When you receive this bug, you can open up the bug, and it's kind of like having the DevTools in your browser open, as if you had the bug right there on your laptop. And you can look at the headers, the request body, the response body, the console logs, whatever. So basically, we save you a bunch of time on back and forth. You don't have to jump on a call. And it's just a headache-free experience.
So in order to do this, we have a browser extension. And this browser extension has a few components inside it. The main component is a background script. And a background script is kind of like a server for a browser extension. Like it's running in the background, but it's local. There are a few other ephemeral components, like the pop-up, the content script, the host script. These all live inside a browser window. And so a browser window is just like, you open up a Chrome window. A Chrome window can have multiple tabs. And it can have a pop-up too. So the pop-up is like, whenever you click on an extension icon, there's this pop-up window. You can interact with it, and then it closes. And then a tab is like whenever you go to Hacker News or Google, or something like that.
2. Execution Environments and Messaging
Inside a tab, there are two execution environments: content script and host script. They communicate differently and have specific constraints. Messaging APIs like window.postMessage and Chrome messaging API are used. Some APIs have size limitations. The host script can only communicate with the content script, requiring message forwarding. Multiple instances of components can pose challenges in addressing messages to the correct content script.
Inside a tab, there are two components, or really two execution environments. One of those is the content script execution environment. And one of those is the host script execution environment. The content script execution environment is like a custom environment where it's isolated from the page itself. So if the page that you're on modifies a window property, the content script doesn't get affected by it. And this is basically for extensions to inject code inside this content script execution environment and not have their scripts get modified by the host page.
And then the host script execution environment is just where the host page's scripts live. All of these components of the browser extension communicate, but they communicate in slightly different ways. Most of them communicate bidirectionally. So you can send messages from the pop-up to the background script and vice versa, and from the content script to the background script and vice versa. But the host script is the one exception. The host script can only communicate with a content script. So if you want to send a message from the background script to the host script and get a response, you have to forward that through the content script. You have to basically set up a handler that will proxy that message to the host script and then proxy the host script's response back to the background script.
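To make the forwarding idea concrete, here is a minimal sketch of a content script acting as a relay between the background script and the host script. It is an illustration only: the names (`Envelope`, `contentHandler`, `sendFromBackground`) are hypothetical and not taken from Jam's code, and plain function calls stand in for the browser's messaging APIs so that the sketch runs outside a browser.

```typescript
// Hypothetical component names; the talk does not give Jam's real identifiers.
type Component = "background" | "content" | "host";

interface Envelope {
  from: Component;
  to: Component;
  body: string;
}

type Handler = (msg: Envelope) => Envelope;

// Stand-in for page-level code running in the host script environment.
const hostHandler: Handler = (msg) => ({
  from: "host",
  to: msg.from,
  body: `host saw: ${msg.body}`,
});

// The content script is the only component with a direct link to the host
// script, so it relays anything addressed to "host" and returns the reply.
const contentHandler: Handler = (msg) => {
  if (msg.to === "host") {
    return hostHandler(msg);
  }
  return { from: "content", to: msg.from, body: `content saw: ${msg.body}` };
};

// The background script can only reach the content script directly.
function sendFromBackground(to: Component, body: string): Envelope {
  return contentHandler({ from: "background", to, body });
}

const reply = sendFromBackground("host", "ping");
```

In a real extension each arrow in this relay would be an asynchronous browser call rather than a direct function call, which is exactly the bookkeeping the rest of the talk is about hiding.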
And this is kind of difficult to deal with when you're just trying to create a feature within your browser extension. Like, you just want to build something, and now you have to think about all these different constraints on sending messages back and forth to power that feature. The constraints are basically these listed right here. We have different messaging APIs that we have to use for all of these components. So between the content script and the host script, we might be using window.postMessage. But then between the content script and the background, we might be using a Chrome messaging API. And that's a different API than between the pop-up and the background script. So there's a few different APIs that we're using here, and we have to keep that in mind. Additionally, some of those APIs have size constraints. So the size of a message is limited. And then, just like before, some of these components can't communicate directly. So when you're dealing with the host script, you have to proxy messages back and forth to it unless you're just messaging from the content script. And you can have multiple instances of a lot of these components, so you can have multiple tabs, you can have multiple pop-ups. And that can present some problems when you're trying to address messages to the right tab's content script.
3. Troubleshooting and Preview Challenges
Messages can be dropped if the pop-up or tab is closed. Messages can also be dropped if a tab's thread is overloaded or if messages are too large. Incognito mode support is crucial for Jam as a bug reporting extension. The user flow for reporting a bug involves selecting a report type, previewing the report, and filing it with an issue tracker. The preview of large instant replays posed challenges due to message size constraints, requiring a hack to share data between the background and content script.
You can also drop messages, or messages can be dropped, rather. So if the pop-up is closed, it's not going to receive that message. Or if you have a tab open, but it gets closed while it's receiving or while it's responding, that message is effectively dropped. Or if a tab's thread is just processing so much because JavaScript is single-threaded, that it can't pull queued messages from the message queue, and it can't respond to them before they time out. So those messages are effectively dropped.
And then this last point here, memory sharing. There's a hack that we can use to message back and forth if the data is too large to fit inside a single message. We'll go over this in a second.
But that's all to say that whenever we started building incognito, we had some trouble. So incognito mode support for Jam is pretty important because we're a bug reporting extension, and QAs like to use incognito mode whenever they're running through certain flows. So we wanted to get this working, but we had some trouble. And I'll explain how. This is basically the general user flow of what a bug reporter goes through as they report a bug with Jam. So first, they might open up the pop-up and select a bug report type. This might be from earlier, like a screenshot, a tab recording, an instant replay, or a desktop recording. And once they select that type, like say they select an instant replay, then we'll show them a preview of that instant replay on the tab that they're on. So they select it, and it instantly appears. And they can decide to trim this instant replay. They can see some of the errors that popped up during their time, like some network request errors or some console log errors. They can file this with Linear, or whatever issue tracker they want, and include a title and select the right team, and then they create the issue. That sounds simple enough. But this preview part was giving us some trouble because preview data for an instant replay is held inside the background. And whenever we want to display it on the page, we have to send that data from the background to the content script. But instant replays can be quite large. And they typically wouldn't fit inside a single message, just because of the size constraints from the messaging APIs that we'd use to send from the background to the content script. So we would use this hack from before to share data from the background to the content script. And this hack is basically like, you create an object URL on one end in the background, and you send that URL, just like a string.
4. Message Chunking and TCP/IP Network Stack
Fetching data on the content script side allows processes to share memory, solving size constraints on messaging APIs. Implementing incognito mode requires message chunking to overcome object URL context differences. Borrowing from the TCP/IP network stack, we rebuilt the system to address messaging difficulties, similar to networking problems such as different link layer protocols, message size constraints, and multiple independent devices.
And then you fetch it on the content script side. And you fetch it just like it's a regular HTTP request, except it's all within your browser, and it basically allows the processes to share an object in memory, essentially. Which is really nice for circumventing issues with size constraints on messaging APIs. But when we were implementing incognito, we realized that this was a problem, because incognito tabs have different object URL contexts. So within your background script and your non-incognito tabs, you have one context, and within your incognito tabs, you have another context. So if you create an object URL from your background and send it to your incognito tab, your incognito tab would try and fetch it and just wouldn't get anything, basically 404.
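The failure mode described here can be pictured with a small simulation. This is not the browser's implementation: `ObjectUrlContext` is a hypothetical stand-in that models each context as its own lookup table, which is enough to show why a URL minted in the regular context resolves to nothing in the incognito one.

```typescript
// Simulation only: each browsing context keeps its own table of object URLs.
// ObjectUrlContext is a hypothetical model, not a browser API.
class ObjectUrlContext {
  private readonly label: string;
  private readonly store = new Map<string, string>();
  private nextId = 0;

  constructor(label: string) {
    this.label = label;
  }

  // Models creating an object URL: register the data, hand back a string.
  createObjectUrl(data: string): string {
    const url = `blob:${this.label}/${this.nextId}`;
    this.nextId += 1;
    this.store.set(url, data);
    return url;
  }

  // Models fetching the URL: only works inside the context that created it.
  resolve(url: string): string | undefined {
    return this.store.get(url);
  }
}

// The background script and regular tabs share one context;
// incognito tabs live in another.
const regularContext = new ObjectUrlContext("regular");
const incognitoContext = new ObjectUrlContext("incognito");

const replayUrl = regularContext.createObjectUrl("instant-replay-data");

const seenByRegularTab = regularContext.resolve(replayUrl);
const seenByIncognitoTab = incognitoContext.resolve(replayUrl);
```

Here `seenByRegularTab` holds the replay data, while `seenByIncognitoTab` is `undefined`, the simulated equivalent of the failed fetch.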
So we had to figure out how to implement message chunking. And basically take this large message, which had our instant replay data inside, and just chunk it up and send it over a series of messages, and then rebuild it on the other side. The problem was, since we had this system that wasn't really built with message chunking in mind, we had to take a step back and rebuild it with that in mind. Luckily, we had one really nice convenience here, which is that the messaging difficulties that we're solving for have already been solved for. They're general networking problems, actually.
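A minimal sketch of the chunk-and-rebuild idea follows. The names (`Chunk`, `splitIntoChunks`, `Reassembler`) and the chunk header fields are assumptions made for illustration, and size is measured in string length rather than bytes to keep the example short; the talk does not describe Jam's actual wire format.

```typescript
// Hypothetical chunk format: every chunk carries enough metadata to be
// reassembled even if chunks arrive out of order.
interface Chunk {
  messageId: string;
  index: number;
  total: number;
  data: string;
}

function splitIntoChunks(messageId: string, payload: string, maxChunkSize: number): Chunk[] {
  if (maxChunkSize <= 0) {
    throw new Error("maxChunkSize must be positive");
  }
  // An empty payload still produces one (empty) chunk so the receiver completes.
  const total = Math.max(1, Math.ceil(payload.length / maxChunkSize));
  const chunks: Chunk[] = [];
  for (let index = 0; index < total; index++) {
    const start = index * maxChunkSize;
    chunks.push({ messageId, index, total, data: payload.slice(start, start + maxChunkSize) });
  }
  return chunks;
}

class Reassembler {
  private readonly pending = new Map<string, Map<number, string>>();

  // Returns the full payload once every chunk has arrived, otherwise undefined.
  accept(chunk: Chunk): string | undefined {
    let parts = this.pending.get(chunk.messageId);
    if (parts === undefined) {
      parts = new Map<number, string>();
      this.pending.set(chunk.messageId, parts);
    }
    parts.set(chunk.index, chunk.data);
    if (parts.size < chunk.total) {
      return undefined;
    }
    this.pending.delete(chunk.messageId);
    let payload = "";
    for (let i = 0; i < chunk.total; i++) {
      payload += parts.get(i) ?? "";
    }
    return payload;
  }
}
```

A sender would pass each element of `splitIntoChunks(...)` through whatever messaging API connects the two components, and the receiver would feed them to a `Reassembler` until `accept` returns the rebuilt payload.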
So you might recognize some similarities here. Having different underlying messaging APIs also happens within the TCP/IP stack: you'll have different link layer protocols, like Ethernet versus Wi-Fi. Message size constraints also happen, where you have a Wi-Fi and Ethernet packet size limit. Components not being able to communicate directly is like routing from your computer to Google.com, which is definitely going to require some hops to forward those packets. Multiple tabs and pop-ups being able to exist independently is like having multiple devices on the internet. You have to address all of them independently. You can't broadcast. Messages being dropped is just like packet loss. Memory sharing not being allowed is just like having separate machines where you can't cheat. You have to serialize your data. You can't just instantly transfer it. This is really nice because the abstraction that we borrowed from was basically just the TCP/IP network stack.
5. Network Stack and TCP/IP Layers
A network stack is a layering of protocols that solve specific problems. Each layer builds on guarantees offered by the previous layers, allowing developers to focus on solving end-user problems without worrying about lower-level details. In the TCP/IP stack, the link layer handles messaging between locally networked devices, the internet layer handles routing and forwarding packets using a routing table, and the application layer provides developer-friendly tools such as HTTP methods and encryption.
A network stack is basically a layering of these different protocols. So each protocol layer is solving a specific problem. So if you want to solve addressing, for example, you might look at the network layer, which is where we think about IP addresses. If you want to think about packet reliability, like making sure that your stream is intact, like you want to send a TCP stream, that would happen at the transport layer. So each layer basically will build on guarantees offered by the previous layers.
So by the time you get all the way up to the application layer and you're making HTTP requests, your HTTP request doesn't have to think about the physical layer. So if there's a user using Ethernet or a user using Wi-Fi, you just don't care about that, which is really nice, because you're solving end-user problems. You're building features, and you don't have to think about these things. We wanted to do that too, so we didn't have to think about messaging difficulties as we were building features within our extension.
So within the TCP/IP stack, what this looks like is at the bottom, you have the link layer, which has Ethernet and Wi-Fi, it's just like messaging between locally networked devices. Above that, at the internet layer, you have IP packets, basically, where the internet layer handles routing and forwarding packets, and it makes use of a routing table. A routing table is basically a quick-lookup table to determine where to route the IP packets based on their address. So this is useful because you might not be directly connected to every machine that you're trying to route to, so a routing table helps you determine what's the next best hop to send it to. And that routing table can be powered by different things, but at a very high level, it's powered by Border Gateway Protocol, or BGP, which is a dynamic routing protocol to determine the cheapest pathway between autonomous systems, which are basically ISPs. And that all feeds into addressing that internet layer.
6. Transport and Application Layers
The transport layer provides reliability features like preventing double-received packets, automatically retrying dropped packets, and handling streaming. The application layer offers developer-friendly tools such as HTTP methods, headers, cookies, encryption, and user-readable addressing. The link layer handles picking the correct messaging API supplied by the browser, while the packet layer handles routing via a simplified routing table. The datagram layer implements chunking for incognito mode, and the application layer utilizes a broker-style pattern for message sending and receiving. These layers address various problems encountered during the project, resulting in significant benefits.
Above that layer sits a transport layer. UDP and TCP are examples of protocol implementations of this layer, and these provide things like reliability, like preventing double-received packets, automatically retrying dropped packets, or handling streaming, like TCP does.
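The two reliability ideas named here, retrying dropped packets and ignoring packets received twice, can be sketched in a few lines. This is a toy model under stated assumptions, not how TCP is implemented: `Segment`, `DedupingReceiver`, and `sendWithRetry` are hypothetical names, the link is a plain function that returns whether an acknowledgement came back, and retries happen immediately rather than after a timeout.

```typescript
interface Segment {
  seq: number; // sequence number used to spot duplicates
  data: string;
}

class DedupingReceiver {
  private readonly seen = new Set<number>();
  readonly delivered: string[] = [];

  // Always acknowledges, but only delivers a given sequence number once.
  receive(segment: Segment): boolean {
    if (!this.seen.has(segment.seq)) {
      this.seen.add(segment.seq);
      this.delivered.push(segment.data);
    }
    return true;
  }
}

// A link may lose the segment or the acknowledgement; false means "no ack".
type Link = (segment: Segment) => boolean;

// Resends until acknowledged; returns how many attempts were needed.
function sendWithRetry(link: Link, segment: Segment, maxAttempts: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (link(segment)) {
      return attempt;
    }
  }
  throw new Error(`segment ${segment.seq} not acknowledged after ${maxAttempts} attempts`);
}
```

The two pieces depend on each other: when an acknowledgement is lost the sender retries a segment the receiver already has, and deduplication is what keeps that retry from being delivered twice.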
And then above the transport layer sits the application layer, which we all know and love, and it's just really easy to work with as a developer. It gives us a bunch of nice developer tools like HTTP methods, or headers, or cookies, or encryption, or user-readable addressing, which is like when DNS translates a website's domain name to the IP address that you're trying to target. And we decided to borrow all of this for our stack, which looks like this.
It's not very creatively named. It's pretty much just the same layer names. But at the bottom, we have the link layer. Just like the link layer in the TCP/IP stack, our link layer handles picking the correct messaging API supplied by the browser. So one example is chrome.runtime.sendMessage. Another example is window.postMessage. We just kind of pick the right messaging API based on the source and the destination.
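One way to picture a link layer like this is a function from (source, destination) to the API that directly connects the pair. The mapping below is an assumption for illustration: the talk names only chrome.runtime.sendMessage and window.postMessage, and chrome.tabs.sendMessage is added here as the usual Chrome API for messaging a tab's content script. The function returns names rather than calling the APIs, so it runs outside a browser.

```typescript
type Endpoint = "background" | "popup" | "content" | "host";

type LinkApi =
  | "chrome.runtime.sendMessage"
  | "chrome.tabs.sendMessage"
  | "window.postMessage";

// Returns the API for a direct link, or undefined when the two endpoints
// are not directly connected and a higher layer has to forward the message.
function pickLink(from: Endpoint, to: Endpoint): LinkApi | undefined {
  const pair = `${from}->${to}`;
  switch (pair) {
    case "content->host":
    case "host->content":
      return "window.postMessage";
    case "background->content":
      return "chrome.tabs.sendMessage";
    case "content->background":
    case "popup->background":
    case "background->popup":
      return "chrome.runtime.sendMessage";
    default:
      return undefined;
  }
}
```

Pairs such as background and host come back as `undefined`, which is the signal that routing has to find an intermediate hop.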
The layer above that is the packet layer, where, just like the internet layer and IP packets, we handle routing via our routing table. Our routing table is much simpler because the structure of our browser extension is pretty well-defined. You don't have to have an algorithm like BGP to determine it. So our routing table is hard-coded. Above the packet layer sits the datagram layer, which is where we ended up implementing the chunking that we needed for incognito. We still made use of object URL fetching, but we basically determine which protocol to use at this layer, like whether we need chunking or whether we can use object URLs, based on whether the sender and receiver are both in the same incognito context.
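A hard-coded routing table and the datagram layer's protocol choice might look roughly like the sketch below. The table contents, the names (`ROUTES`, `route`, `chooseTransfer`), and the decision to send pop-up traffic through the background script are assumptions for illustration. Real addressing would also need to identify which tab or pop-up instance is meant, which is left out here.

```typescript
type NodeId = "background" | "popup" | "content" | "host";

// Hard-coded next hops: ROUTES[current][destination] is where to send next.
const ROUTES: Record<NodeId, Partial<Record<NodeId, NodeId>>> = {
  background: { popup: "popup", content: "content", host: "content" },
  popup: { background: "background", content: "background", host: "background" },
  content: { background: "background", popup: "background", host: "host" },
  host: { content: "content", background: "content", popup: "content" },
};

// Follows next hops until the destination is reached and returns the path.
function route(from: NodeId, to: NodeId): NodeId[] {
  const path: NodeId[] = [from];
  let current = from;
  while (current !== to) {
    const next = ROUTES[current][to];
    if (next === undefined) {
      throw new Error(`no route from ${current} to ${to}`);
    }
    path.push(next);
    current = next;
  }
  return path;
}

type Transfer = "object-url" | "chunked";

// Object URLs only resolve within one context, so they are usable only when
// both ends are on the same side of the incognito boundary.
function chooseTransfer(senderIncognito: boolean, receiverIncognito: boolean): Transfer {
  return senderIncognito === receiverIncognito ? "object-url" : "chunked";
}
```

With this table, `route("background", "host")` yields background, content, host, which is the forwarding through the content script described earlier, now derived from data instead of hand-written handlers.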
And then above the datagram layer is the application layer, which is where we implemented a broker-style pattern, which you can use for sending and receiving messages, and it's very convenient. You can think of it kind of like HTTP. You don't have to think about any of these other layers whenever you're working on the application layer. This basically solved all our problems. So here are our problems and here are our solutions. For different underlying messaging APIs, we abstracted this with the link layer, which handles those. With size constraints for messages, we handled this by chunking at the datagram layer. Components not being able to communicate directly, we handled this by packet forwarding at the packet layer, so we can hop over components as we want to reach the host script. Having multiple tabs or pop-ups, this was handled with independent addressing at the packet layer too. Messages being dropped, we handled this via packet retries at the datagram layer. And memory sharing not being allowed, this is that dynamic switching between chunking versus another protocol at the datagram layer based on whether the sender or receiver is incognito. And that's pretty much it. Looking back at this project, this was somewhere around eight months ago. We had some big benefits from this.
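The talk does not show the broker interface itself, so the following is only a guess at the general shape of a broker-style application layer: handlers register under a topic, and callers send a request to that topic without knowing which component answers or how the message gets there. `Broker`, `subscribe`, and `request` are hypothetical names, and delivery is a direct function call standing in for the lower layers.

```typescript
type TopicHandler = (payload: string) => string;

class Broker {
  private readonly handlers = new Map<string, TopicHandler>();

  // Register the handler that answers requests for a topic.
  subscribe(topic: string, handler: TopicHandler): void {
    this.handlers.set(topic, handler);
  }

  // Send a request and get the response. In a full stack this call would
  // descend through the datagram, packet, and link layers; here it is a
  // direct call so the sketch stays runnable.
  request(topic: string, payload: string): string {
    const handler = this.handlers.get(topic);
    if (handler === undefined) {
      throw new Error(`no handler for topic "${topic}"`);
    }
    return handler(payload);
  }
}

const broker = new Broker();
broker.subscribe("replay/preview", (replayId) => `preview for ${replayId}`);
const preview = broker.request("replay/preview", "replay-42");
```

The point of the shape is that feature code only ever touches `subscribe` and `request`, much as application code touches HTTP without caring about Ethernet or Wi-Fi.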
7. Smooth Rollout and Developer Focus
We maintained the high-level interface of our previous messaging system while restructuring the underlying layers, resulting in a smooth rollout. Debugging became simpler with the separation of concerns at each layer and the ability to write tests. Our main goal was to let developers focus on building features without worrying about message link constraints or forwarding messages. This abstraction was achieved through our network stack.
Our previous system that we used for messaging had a nice interface, and we just kept that high-level interface while restructuring everything underneath it. That gave us a pretty smooth rollout. We didn't have to change any of our application code. We just created this networking stack and then replaced our old messaging system with this, but they had the same end interface, so that rollout was really simple.
We had simpler debugging too here, because each layer has its own thing that it focuses on, so we have the separation of concerns that lets us narrow down where issues are happening, and we can write tests at every layer. Basically, if something goes wrong, we'll find out as our CI runs our test suite.
Really, our main goal with this, aside from building incognito support, was to let our developers focus on building features. And before this, we had to really think, oh, do I need to implement a forwarding handler at the content script layer if I want to send a message to the host script? Now this is all programmatic basically. When you're building a new feature in the Jam extension, our developers don't have to think about message link constraints. They don't have to think about forwarding these messages over the content script. All these problems basically get abstracted by our network stack.
That's it. I hope you enjoyed this talk.