Video Summary and Transcription
WebAssembly is a cost-effective way to distribute computation and allows for code reuse and performance optimization. It can be used for running bioinformatics tools in the browser without setup, but running it on the server or smaller devices may have limitations. WebAssembly is best suited for playgrounds, small-scale simulations, audio and video processing, and upload pre-processing. It offers few benefits outside the browser for server-side applications, but can be useful for running user-provided code and serverless functions.
1. Using WebAssembly for Command Line Tutorials
I'm Robert, a co-founder of OM Genomics Labs. We build software tools for genomic scientists. Bioinformatics tutorials often have challenges with environment setup and data loss. Existing solutions like VMs and cloud infrastructure are time-consuming. Sandbox Bio is a free tool that allows for tutorials without setup. It uses containers and WebAssembly for cost-effectiveness.
Thanks, everyone, for being here. I'm really excited to share with you some of my experience using WebAssembly for powering command line tutorials. So I'm Robert, and I'm a co-founder of OM Genomics Labs. So we build software tools for genomic scientists.
So one of the things that we found problematic in bioinformatics education is that, whether the tutorials are text-based, videos, or even in person, there are a few challenges that are relatively specific to bioinformatics. And one is, it's really hard to set up your environment. And this is especially true in bioinformatics, where a lot of the tools have a lot of weird dependencies. Some of them might not even give you binaries to download, and you have to compile everything from source, which is kind of interesting. And so it's really hard for people to get started, but it's also really scary. If you're defining a variable in a tutorial, and then you want to go and delete that variable and you make a mistake, then you end up deleting all your data, and it's not reversible.
And that's a problem because exploration is typically how most people learn, is by deviating from the tutorial and trying different values for different parameters and seeing where that gets them. And in my opinion, a lot of the existing solutions are not working. And so typically what is done is, let's say at the beginning of a workshop, you might spend some time showing students how to install dependencies on their local machine or spin up VMs in order to install the dependencies in a clean environment. And the problem is that you spend a lot of time either in a workshop or out before the workshop doing setup. And especially for scientists, who, you know, all they want to do is use the command line tools. They don't want to learn how to spin up VMs, set up cloud infrastructure. It's kind of a lot of time to spend on these things.
And this is where Sandbox Bio comes in. So this is a free tool that shows bioinformatics tutorials. And here I'm showing a non-bioinformatics tutorial just so you can see something familiar. So on the left there's the tutorial content and on the right there's an embedded command line interface. And you can type your commands in there and they will execute and there's no setup required, no installation. It just all happens in the browser. So how would you implement something like this? The way most such tutorial websites go is containers or some similar format where a new user comes to the site, you spin up a new small container, you put limits on it, and you know they can send arbitrary commands there. You execute them, show them the result, and then once they haven't been active for a while you turn off the container. And that works. Problem is it's very expensive, especially if you want to make a free tutorial website. This almost never works. A lot of the websites that have used this approach in the past either just slowly limit their free tier more and more or they kind of stop offering the free tier entirely. And that's where WebAssembly is a really interesting solution.
2. WebAssembly Overview and Compiling Code
If you have some tools like awk, written in C, you can compile them to WebAssembly and run them in the browser directly. It's a cost-effective way to distribute computation. WebAssembly is the fourth language that the browser can support, allowing you to take existing code written in languages like C, C++, and Rust and compile it to WebAssembly for the browser. It has been used for code reuse and performance optimization. People are excited about its portability and there are tools like Emscripten that make compilation to WebAssembly easier.
The idea there is if you have some tools like awk, written in C, you can compile them to WebAssembly and run them in the browser directly. So instead of centralizing all the computation on your cloud servers, you're kind of distributing it so that each individual using the site runs the computation in their browser. And so that's a much more cost-effective way to do things because all you're doing now is sending users JavaScript and WebAssembly assets.
WebAssembly has been around for some time since 2017, but it still can be a little confusing to a lot of developers. I did want to spend a bit of time giving a quick overview of WebAssembly and what it is and why it's useful. And so I like to think of WebAssembly as the fourth language that the browser can support. You can do HTML, CSS, JavaScript, and now you have this fourth option, WebAssembly. Although I say it's a language, it's kind of a weird looking language. So this is an example, hello world example, and it looks awful. But thankfully, you don't have to write this by hand.
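To make the "fourth language" idea concrete, here is a minimal sketch of loading a WebAssembly module through the standard `WebAssembly` JavaScript API. The module is not the hello world from the slide: it is a tiny hand-assembled illustration that exports a single `add` function, and the names `addModuleBytes` and `add` are chosen here for the example.

```typescript
// Bytes of a tiny WebAssembly module, hand-assembled for illustration.
// Its text form is roughly:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0  local.get 1  i32.add))
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code section
]);

// Compile and instantiate synchronously, which is fine for a module this small.
const addModule = new WebAssembly.Module(addModuleBytes);
const addInstance = new WebAssembly.Instance(addModule);

// Exports are untyped at this boundary, so the signature is asserted by hand.
const add = addInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```

In a browser, larger modules would normally be loaded with the asynchronous `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming` instead of the synchronous constructors used in this sketch.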
The beauty is that this lets you take existing code written in C, C++, Rust, and other languages and compile it down to WebAssembly for the browser. This has been used primarily for reusing existing code on the web. These are all great examples of tools that are millions of lines of C that the authors did not want to have to rewrite in JavaScript from scratch in order to run in the browser. Another way I've seen people use WebAssembly is for performance reasons. There are some cases in which you can take some slow JavaScript portion of your app and replace it with some compiled optimized WebAssembly. People are also really excited about the portability of WebAssembly. So this idea that you can run it, let's say on a serverless function provider or in your Node backend.
3. Running WebAssembly and Compiling Code
You can run WebAssembly on the server and smaller devices, but whether you should do it is another question. Compiling code to WebAssembly using tools like Emscripten makes it easier and provides powerful capabilities.
But there's also this idea that you can run it, you know, on the server and on smaller devices. And while that is possible, whether you do want to do that or not is another question. And we'll talk about that a little more towards the end.
I also wanted to get a bit more practical here about what compiling something to WebAssembly looks like, because there's a lot of talk about WebAssembly, but I feel like once you see what it looks like, you get a better sense for what it means. And so if you have a C or C++ program, typically these are the kinds of tools that you're using to compile the code to a binary program. And so there's this suite of tools called Emscripten, and that helps you make this compilation to WebAssembly a lot easier than it would be otherwise and gives you a lot of really powerful tools. And so they give you wrappers around these tools so that you can compile things to WebAssembly.
4. Compiling to WebAssembly and Using in Browser
Here's a simple bioinformatics tool written in C. Compiling to WebAssembly with Emscripten's C compiler generates a .js file with glue code that manages the .wasm file. Running WebAssembly isn't always easy, as it requires tweaking and modifying code. The browser-based terminal in sandbox.bio uses WebAssembly and xterm.js to simulate a command line. Storing the file system state in IndexedDB is easy compared to using containers.
So here's a simple bioinformatics tool written in C. If you compile it to binary, you can use GCC and call it a day. If you want to compile it to WebAssembly, you can use Emscripten's C compiler. And this will give you the ability to output a .js file, which will contain some glue code to help you manage the .wasm file that it also generates.
And so how would you run these things? Well, on the binary side, you run it on the command line with the parameters you want. And on the WebAssembly side, Emscripten gives you some utilities to, for example, call the main function with these same parameters as on the command line. But the problem with WebAssembly is that it's not always that easy. So this was a very simple example, but let's say you want to compile grep to WebAssembly, then it looks a bit like a disaster. And none of this, like these modifications and flags I had to tweak, is really intuitive. It's just that you're trying to compile something that was not made for WebAssembly. And so there's a lot of things that WebAssembly doesn't support that you have to kind of tweak or get rid of in the original code. And that's what it ends up looking like.
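As a rough sketch of calling a tool's main function with command-line style parameters: the code below assumes an Emscripten-generated module object that exposes `callMain`, Emscripten's helper for invoking `main()` with an argument list (depending on build settings, it may need to be exported explicitly when compiling). The `ToolModule` interface, the `splitCommand` and `runTool` helpers, the tool name `mytool`, and the stand-in module are all hypothetical and exist only so the example can run without a real compiled tool.

```typescript
// The subset of an Emscripten-generated module object used in this sketch.
// `callMain` receives the arguments without the program name.
interface ToolModule {
  callMain(args: string[]): unknown;
}

// Split a command line into the tool name and its arguments.
// Assumption: whitespace-separated tokens only; quoting and escaping are ignored.
function splitCommand(commandLine: string): { tool: string; args: string[] } {
  const tokens = commandLine.trim().split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) throw new Error("empty command");
  const [tool, ...args] = tokens;
  return { tool, args };
}

// Look up the compiled tool by name and invoke its main() with the arguments.
function runTool(modules: Map<string, ToolModule>, commandLine: string): void {
  const { tool, args } = splitCommand(commandLine);
  const mod = modules.get(tool);
  if (mod === undefined) throw new Error(`unknown tool: ${tool}`);
  mod.callMain(args);
}

// Stand-in for a real Emscripten module: it only records what it was called with.
const seen: string[][] = [];
const fakeTool: ToolModule = {
  callMain: (args) => {
    seen.push(args);
    return 0;
  },
};

runTool(new Map([["mytool", fakeTool]]), "mytool -n 10 input.txt");
console.log(seen[0]); // [ '-n', '10', 'input.txt' ]
```

With a real build, the stand-in would be replaced by the module object that the generated .js glue code provides, and the tool's output would be captured through whatever output hooks that build configures.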
Speaking of which, because I kept running into this issue, I ended up writing a book about it, describing my experience with it and how I usually approach compiling things to WebAssembly. And so back to sandbox.bio. Sandbox.bio uses WebAssembly in order to provide this terminal in the browser. And the way this works is you have this terminal that is powered by xterm.js, so it simulates a command line. But of course it doesn't do any evaluation. So you have to take the user input, parse it into an abstract syntax tree, and then decide the order in which you apply the commands. So in this case, you have a pipe. So this will execute awk and send the result to head, giving you the output.
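The pipe handling described above can be sketched roughly as follows. This is a simplification, not sandbox.bio's implementation: instead of a full abstract syntax tree it parses the input into a flat list of stages, and the `sort` and `head` tools are plain TypeScript stand-ins for what would be WebAssembly-compiled binaries. All names (`Tool`, `Stage`, `parsePipeline`, `runPipeline`) are hypothetical.

```typescript
// A tool receives its arguments and stdin as text and returns stdout as text.
type Tool = (args: string[], stdin: string) => string;

function toLines(text: string): string[] {
  return text.split("\n").filter((line) => line.length > 0);
}

// Stand-in tools; in the setup described in the talk, each of these
// would be a separate program compiled to WebAssembly.
const tools = new Map<string, Tool>([
  ["sort", (_args, stdin) => toLines(stdin).sort().join("\n")],
  [
    "head",
    (args, stdin) => {
      const flag = args.indexOf("-n");
      const count = flag >= 0 ? Number(args[flag + 1]) : 10;
      return toLines(stdin).slice(0, count).join("\n");
    },
  ],
]);

interface Stage {
  tool: string;
  args: string[];
}

// Split "a x | b y" into stages. Assumption: no quoting, no redirection.
function parsePipeline(commandLine: string): Stage[] {
  return commandLine.split("|").map((segment) => {
    const tokens = segment.trim().split(/\s+/).filter((t) => t.length > 0);
    if (tokens.length === 0) throw new Error("empty pipeline stage");
    return { tool: tokens[0], args: tokens.slice(1) };
  });
}

// Run the stages left to right, feeding each stage's output to the next.
function runPipeline(commandLine: string, stdin: string): string {
  let data = stdin;
  for (const stage of parsePipeline(commandLine)) {
    const tool = tools.get(stage.tool);
    if (tool === undefined) throw new Error(`command not found: ${stage.tool}`);
    data = tool(stage.args, data);
  }
  return data;
}

console.log(runPipeline("sort | head -n 2", "pear\napple\nfig\n"));
// apple
// fig
```

A real shell grammar also has to cover variables, background jobs, and redirection, which is the simulation burden the talk goes on to describe.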
The really nice thing about running everything in the browser is that it's really easy to store the state of the file system, at least temporarily, in IndexedDB. And that would be a lot harder if you were using the container approach, where for every user you spin up a container. When you shut it down, you would have to take a snapshot of the state and save it on your cloud bucket, for example. So it would be even more expensive for you to provide that. Whereas in the browser, it's very easy to do. And so what I just described, that's sandbox.bio v1, where every tool that I wanted to support in the tutorial was compiled to WebAssembly. But I didn't want to compile Bash to WebAssembly, because that was really heavy. So I ended up compiling some individual tools, but I had to do some simulation. How do I handle piping? Or running jobs in the background? Or variables? Things like that had to be simulated.
5. Running Debian in Browser with v86
Sandbox.bio v2 runs a whole Debian operating system in the browser using v86, a Rust-based CPU emulator compiled to WebAssembly. Limitations include a maximum of four gigabytes of RAM and support for only 32-bit architecture. Tutorials must use small files to ensure efficient performance.
So it didn't support all the Bash niceties. So sandbox.bio v2, which we released earlier this year, takes a little bit of a different approach. It runs a whole Debian operating system in the browser. And it does that using a tool called v86. It's essentially a CPU emulator, written in Rust, and that gets compiled to WebAssembly. So now you're not compiling the individual tools to WebAssembly, you're compiling the emulator. And then on top of that, what we do is, you know, we define a Dockerfile that has all the bioinformatics tools that we want. And then we can use v86 to kind of have those tools available in the browser.
So in terms of limitations, because you're running this in the browser, you can only use four gigabytes of RAM. And this is a WebAssembly limitation. So there is some work underway to have a 64-bit version of WebAssembly, which will be able to address a lot more RAM. But that's not too big of an issue for tutorial websites, where typically you're not running real analyses.
Another big limitation is that v86 currently only supports a 32-bit architecture. So there were a few tools that require 64-bit architecture that we couldn't get in the sandbox, but most of the tools we were able to. So that was a pretty big improvement for v2, where we didn't have to spend so much time trying to figure out how to compile every tool to WebAssembly.
And the last limitation is that the tutorials have to use small files. And this is because, you know, you're using WebAssembly, and then on top of that there's an emulator. There's a lot of layers over here. And so by using small files, you make sure that it's not too slow to run Linux in the browser. And that's also okay for tutorial websites, because the goal is really to show how you would do something. It's not to run an analysis on your real data.
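The four-gigabyte figure mentioned above follows from 32-bit addressing: WebAssembly memory is allocated in 64 KiB pages, and a 32-bit address space covers 65,536 of them. A small sketch of that arithmetic, with hypothetical helper names (`pagesNeeded`, `fitsIn32BitMemory`):

```typescript
const WASM_PAGE_BYTES = 64 * 1024; // WebAssembly memory grows in 64 KiB pages
const MAX_PAGES_32BIT = 2 ** 32 / WASM_PAGE_BYTES; // 65,536 pages
const MAX_BYTES_32BIT = MAX_PAGES_32BIT * WASM_PAGE_BYTES; // 4 GiB

// Number of pages required to hold the given number of bytes.
function pagesNeeded(bytes: number): number {
  return Math.ceil(bytes / WASM_PAGE_BYTES);
}

// Whether a workload of this size can fit in a 32-bit WebAssembly memory at all.
function fitsIn32BitMemory(bytes: number): boolean {
  return pagesNeeded(bytes) <= MAX_PAGES_32BIT;
}

const GiB = 1024 ** 3;
console.log(MAX_BYTES_32BIT / GiB); // 4
console.log(fitsIn32BitMemory(1 * GiB)); // true
console.log(fitsIn32BitMemory(6 * GiB)); // false
```

This is only the theoretical ceiling; the memory a given browser or device will actually grant may be lower.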
6. Lessons Learned and Use Cases for WebAssembly
When using WebAssembly, it's important to consider the amount of computation involved. Too little computation may result in impractical usage, while too much computation may require running tasks on the cloud. The sweet spot for WebAssembly lies in playgrounds, small-scale simulations, audio and video processing, and upload pre-processing. It's best suited for reusing existing code and bringing it to the web, rather than replacing containers. WebAssembly is a niche technology that has its specific use cases.
Okay, what are some of the lessons learned? So I've been using WebAssembly since 2018 for sandbox.bio, but also for other tools and applications where WebAssembly was really useful. I wanted to talk about the opposite side of it, which is when I wouldn't use WebAssembly. And the first place is if you're doing too little or too much computation. I know that that sounds kind of meaningless, but let's talk about what those look like.
So too little computation. The best example for that is if you're using a library that lets you design your front-end UI in Rust, and it gets compiled to WebAssembly. Look, I get it. That's very cool, but it's not really practical. You end up embedding a lot of HTML and weird-looking JavaScript into this, and it's not clear to me what the benefit is. Your application will definitely not be faster. I mean, if your UI is that slow, you kind of need to rethink what you're doing. And what you might want to do is look at the slow parts only and replace those with WebAssembly instead of using WebAssembly for the whole application, which makes less sense. Because there's also going to be a lot of overhead, especially mental overhead, as you're trying to figure out how you're compiling this stuff to WebAssembly. And also, probably other people on your team who use JavaScript don't know how to use Rust. That's an extra amount of overhead that I don't think is really worth it. But it is cool, so I'll give you that.
On the other end of the spectrum is too much computation. So if you're using more than four gigs of RAM, for example, or if you need something that needs to run at 100% CPU for minutes or even tens of seconds, you may want to rethink whether that's the right user experience that you want to provide your user. And maybe it just makes sense to run this task on the cloud.
On the other hand, what I've found to be the sweet spot is when you do things like playgrounds, like Sandbox Bio does, small-scale simulations, audio and video processing, and upload pre-processing. And this is one that I've used several times, where if you ask users to upload very large datasets, while the data is being uploaded, you can run some sanity checks on the data. And you can do this in JavaScript too, but in my case, a lot of the tools for doing this sanity check were written in C for bioinformatics, and so I basically brought them to the web with WebAssembly. But so in general, in my opinion, WebAssembly is really best for reusing existing code that isn't written in JavaScript and you want to bring it to the web, or just sprinkling it in for a little bit of improvements here and there. That's when I found it to be most powerful. And it's also worth saying too, WebAssembly is not the kind of tool that every developer should be using. It only makes sense in a small subset of cases, and if it doesn't make sense for your use case, I would just say don't bother. It's a niche technology, that's for sure.
Another reason I wouldn't use WebAssembly is trying to replace containers. And if you look at what is happening in the world of WebAssembly right now, you'll see a lot of excitement about using WebAssembly outside the browser. In some cases, some people will say that WebAssembly is going to destroy Docker containers, which honestly I'm a bit confused by because I'm not sure what that means.
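To illustrate the upload pre-processing idea mentioned above, here is a minimal sketch of a sanity check that runs chunk by chunk while data streams in. It is written in plain TypeScript rather than as a C tool compiled to WebAssembly, and it assumes the input is FASTQ, a common sequencing format with four lines per record (a header starting with "@", the sequence, a separator starting with "+", and a quality string of the same length as the sequence). The `FastqChecker` class is hypothetical.

```typescript
// Validates FASTQ text incrementally, so it can run on each uploaded chunk.
class FastqChecker {
  private carry = ""; // partial last line left over from the previous chunk
  private pending: string[] = []; // lines of the record being assembled
  private lineNumber = 0;
  records = 0;
  readonly errors: string[] = [];

  // Feed the next chunk of text; chunks may end in the middle of a line.
  push(chunk: string): void {
    const lines = (this.carry + chunk).split("\n");
    this.carry = lines.pop() ?? "";
    for (const line of lines) this.addLine(line);
  }

  // Call once after the last chunk to flush leftovers.
  finish(): void {
    if (this.carry.length > 0) this.addLine(this.carry);
    this.carry = "";
    if (this.pending.length > 0) {
      this.errors.push("input ends in the middle of a record");
    }
  }

  private addLine(line: string): void {
    this.lineNumber += 1;
    this.pending.push(line);
    if (this.pending.length < 4) return;

    const [header, sequence, separator, quality] = this.pending;
    const start = this.lineNumber - 3; // line number of the record's header
    if (!header.startsWith("@")) {
      this.errors.push(`line ${start}: header must start with "@"`);
    }
    if (!separator.startsWith("+")) {
      this.errors.push(`line ${start + 2}: separator must start with "+"`);
    }
    if (sequence.length !== quality.length) {
      this.errors.push(`line ${start + 3}: quality length differs from sequence length`);
    }
    this.records += 1;
    this.pending = [];
  }
}

const checker = new FastqChecker();
checker.push("@read1\nACGT\n+\nII"); // chunk boundary falls inside a line
checker.push("II\n@read2\nAC\n+\nI\n");
checker.finish();
console.log(checker.records); // 2
console.log(checker.errors); // [ 'line 8: quality length differs from sequence length' ]
```

The sketch assumes Unix line endings and does no validation of the characters themselves; an existing C validator brought over with WebAssembly would typically cover far more cases.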
7. WebAssembly for Server-side Applications
The benefits of using Docker containers are the ability to have a small reproducible package with multiple tools. However, using WebAssembly for this purpose is complex and offers few benefits outside the browser. On the server, you have more language options and deployment choices, making WebAssembly less necessary. It can be useful for running arbitrary user-provided code and in serverless functions. For more insights on WebAssembly, check out the ChangeLog podcast, episode 570. Visit levelupwasm.com for my book and free articles and videos.
To me, the benefit of a Docker container is that you're able to have a small reproducible package where you can add a bunch of disparate tools in one place. With WebAssembly, that's really not easy. I mean, you saw how complicated compiling grep to WebAssembly is. I can't imagine compiling, I don't know, Nginx or Postgres. I'm sure that's doable, but the question is always why? What is the benefit of doing that? And to me, there's very few benefits to using WebAssembly outside the browser, and let me tell you why.
When you're in the browser, you don't really have a choice, right? You can use JavaScript or you can use WebAssembly. That's all the choices you have. That's not true on the server. You can use whatever language you want on the server, unlike the browser. You can run it on bare metal or in a VM. You can run it on serverless functions. You can run it on a Kubernetes cluster. You can run it on your MacBook if you really wanted to. The point is that you have so many options on the server that the question then becomes, why would WebAssembly be the right solution? And so I think in most cases, it probably isn't. There is a small subset of cases where it might make sense. So if you're trying to run arbitrary code that the user provides you, for example, if you have a plugin architecture where users can give you JavaScript to extend your application on the backend, WebAssembly is great for that because it's a sandbox. It's very limited with what you can do in there. And so I think that might make sense. But that's probably not what most developers need to do anyway, but that's a good application. And also I found it useful in serverless functions. If you want to run code in a language that isn't yet supported, for example, then that might be a good application there.
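The sandboxing point for user-provided code can be illustrated with a small sketch: a WebAssembly module can only call the host functions it is explicitly handed as imports. The module below is a hand-assembled stand-in for a user plugin, not a real plugin system; it imports one function, `host.log`, and exports `run`, which calls it with 42. The names `pluginBytes`, `host.log`, and `run` are made up for this example.

```typescript
// Hand-assembled module, roughly:
//   (module
//     (import "host" "log" (func (param i32)))
//     (func (export "run") i32.const 42 call 0))
const pluginBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32) -> (), () -> ()
  0x02, 0x0c, 0x01, 0x04, 0x68, 0x6f, 0x73, 0x74, // import section, module name "host"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // field name "log", a function of type 0
  0x03, 0x02, 0x01, 0x01, // one local function of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

// The host decides exactly which capabilities the plugin receives.
const received: number[] = [];
const plugin = new WebAssembly.Instance(new WebAssembly.Module(pluginBytes), {
  host: {
    log: (value: number) => {
      received.push(value);
    },
  },
});

(plugin.exports.run as () => void)();
console.log(received); // [ 42 ]

// Without the import it asks for, the plugin cannot even be instantiated.
let rejected = false;
try {
  new WebAssembly.Instance(new WebAssembly.Module(pluginBytes), {});
} catch {
  rejected = true;
}
console.log(rejected); // true
```

The plugin has no file, network, or process access unless the host passes in functions that provide it, which is the property that makes this setup attractive for running untrusted extensions.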
If you want to hear more of my thoughts on WebAssembly on the server, you can check out the ChangeLog podcast, episode 570. All right, that's all I had today. If you found this useful, you can check out my book at levelupwasm.com. I like to take a practical approach to WebAssembly, so I'll tell you when it's useful and when you should stay away from it. And if you're interested, I have a whole bunch of free articles and videos from past talks that you can also look at. All right, thank you very much.