Going Live from a Browser...with Another Browser


There are other ways to proxy live video from a browser to an RTMP endpoint, but what if we wanted to interact with that stream first? And not just interact by writing obtuse ffmpeg filters, but with some good ol' HTML and CSS? Let's do that! We'll talk about how you can allow your streamers to go live directly from their browser using headless Chrome and ffmpeg.

This talk was presented at JSNation Live 2020. Check out the latest edition of this JavaScript conference.

FAQ

What does Mux do?
Mux offers online video infrastructure services for developers, including an API for live broadcasting.

What's the difference between live chat and live broadcast?
Live chat involves direct communication between browsers with low latency, suitable for talking synchronously. Live broadcast is one-to-many communication, where one input stream is broadcast to many viewers, typically with higher latency and without the expectation of direct communication back to the streamer.

What technologies power live broadcasts?
Live broadcasts are primarily powered by RTMP for ingesting live content and HLS for delivering it. HLS serves video files via ordinary HTTP requests, which makes the broadcast easy to scale.

What is WebRTC and how does Mux use it?
WebRTC is a technology that enables direct browser-to-browser communication with low latency. Mux uses WebRTC for live chat and can use server-side WebRTC implementations for more complex broadcasting scenarios.

Can WebRTC be converted to RTMP directly in the browser?
No, it's currently not feasible, because browser technologies can't reach low enough into the network stack to do it.

What is GetUserMedia?
GetUserMedia is part of the WebRTC API that allows access to the device's camera and microphone. Content captured via GetUserMedia can be broadcast to a server using WebSockets and then encoded into RTMP for live broadcasting.

What are the drawbacks of broadcasting via headless Chrome?
Using Chrome in headless mode for broadcasting requires running one Chrome instance per stream, which can be resource-intensive and complex to orchestrate. It's not the most common approach due to these challenges.

Can users go live directly from the browser?
Yes, Mux enables users to go live directly from the browser without needing to download third-party software like OBS.

Matt McClure
8 min
18 Jun, 2021


Video Summary and Transcription

This video explains how to go live from the browser via another browser using WebRTC and RTMP. Live chat involves low-latency communication between browsers using WebRTC, while live broadcast uses RTMP and HLS for one-to-many streaming. The video discusses the limitations of converting WebRTC to RTMP directly in the browser and suggests using GetUserMedia to capture media and broadcast it via WebSockets to a server, which then encodes it into RTMP. Another approach involves using Chrome in headless mode and the MediaRecorder API, although this method is resource-intensive. A more efficient method is using a full Docker container to capture and stream the screen with FFmpeg, avoiding the MediaRecorder API. This method offers more reliable streaming and flexibility in manipulating the stream. For more details, you can refer to Nick's talk from All Things RTC.

1. Introduction to Live Broadcast and WebRTC

Short description:

Hey everybody, my name is Matthew McClure. I'm one of the co-founders of Mux, and we do online video infrastructure for developers. Today we're talking about going live from the browser via another browser. Live chat and live broadcast are different in terms of communication and technology. Live chat uses WebRTC for low-latency synchronous communication between browsers, while live broadcast uses RTMP and HLS for one-to-many streaming. We can't turn WebRTC into RTMP in the browser, but we can use a server-side WebRTC implementation. However, this approach may not be the easiest or most flexible for video processing on the server side.

Hey everybody, my name is Matthew McClure. I'm one of the co-founders of Mux, and we do online video infrastructure for developers. So one of our features is an API for live broadcasting, and that's where we get a ton of questions from developers on how to help their customers go live. They're in a world where they want to build an application in the browser and let the user log in and immediately go live, without needing to download third-party software like OBS or something like that to do it. Totally makes sense.

But today we're not talking about just going live from the browser, we're talking about going live from the browser via another browser. This is also probably a bad idea for most use cases, but when you need this kind of thing, it can be a really great path forward. We covered something similar, another path to do this, at React Summit, so we're going to quickly recap some of those high-level concepts just to get on the same page. If you want more information, you might want to check out that talk as well; you can find it on YouTube.

So a common misconception is that live broadcast is the same as live chat. With live chat, you have two browsers, or a few browsers, that can communicate directly with each other with sub-500 milliseconds of latency, so they can talk synchronously. Live broadcast, on the other hand, is one-to-many: you have one input stream going out to many viewers, and that can be 50 to a million viewers. Latency can be 10 seconds or more, and that's fine, because there's not really an expectation of being able to communicate back to the streamer. Because of those constraints, the same technology really doesn't work very well for both. Live chat is typically powered by browser technologies like WebRTC, or proprietary implementations, that let the streamers communicate directly so latency stays as low as possible. Live broadcast, on the other hand, is powered by technologies like RTMP and HLS. RTMP is an old Flash-era protocol that has become the de facto standard for ingesting live content into a server, which then transcodes that content and broadcasts it out via HLS. We won't get into the specifics of HLS, but for our purposes, it allows you to download video via GET requests in the browser, and you can scale it as you would any other file transfer, which is really nice.
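To make the HLS half concrete, here's a minimal sketch of playing back an HLS broadcast in the browser. It assumes the hls.js library and a hypothetical playlist URL; Safari can play HLS natively without it.

```ts
import Hls from "hls.js";

// Hypothetical playback URL; any HLS playlist works the same way.
const src = "https://stream.example.com/live/playlist.m3u8";
const video = document.querySelector<HTMLVideoElement>("video")!;

if (Hls.isSupported()) {
  // hls.js fetches the playlist and segments with plain HTTP GETs,
  // so a broadcast scales like any other static file delivery.
  const hls = new Hls();
  hls.loadSource(src);
  hls.attachMedia(video);
} else if (video.canPlayType("application/vnd.apple.mpegurl")) {
  // Safari supports HLS natively on the video element.
  video.src = src;
}
```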

Okay, so let's just take WebRTC and turn that into RTMP in the browser, is probably what you're thinking. Unfortunately, no: we can't get quite low enough in the network stack in a browser to be able to do it, so even in our modern world of WASM and all these other goodies, we just can't quite get there. But let's talk about what technologies we can access. Everything we're talking about here involves a server in some way, but the first option is to take WebRTC and use a server-side WebRTC implementation. If you'd asked me a year ago, I'd have said this is crazy, but it's gotten a lot better. Projects like Pion have come a really long way; it's a pure Go implementation. So this actually isn't that crazy anymore, but it's certainly not the easiest way to get this done. And if you want to do anything interesting with the video on the server side via client-side technologies, this approach would leave you out in the cold a little bit.

2. Running WebRTC on a Server via Chrome

Short description:

One option is to take WebRTC and run it on a server via Chrome, though that means operating Chrome itself. An alternative approach is to use GetUserMedia to capture the microphone and camera, broadcast that to a server via WebSockets, and encode it into RTMP. This involves running one Chrome per input or output stream, which can be resource-intensive, but open source projects like Jitsi have used this method for one-to-many or few-to-many broadcasts. Another approach is to use the chrome.tabCapture API, which shares internals with the MediaRecorder API. This allows for running Chrome in headless mode, providing easier multi-tenant access and browser features, but it still relies on the MediaRecorder API.

So, to fix that last issue, what if we just took WebRTC and ran it on a server via Chrome? It can be done, but the problem is that now you're running Chrome. Or we can take GetUserMedia, which is the part of the WebRTC APIs that lets you access the microphone and camera, broadcast that to a server via WebSockets, and then encode it into RTMP.
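Sketched out, that capture-and-ship flow might look something like the following; the ingest endpoint, codec choice, and one-second timeslice are assumptions for illustration, not Mux's actual API.

```ts
// Grab the camera and microphone, let MediaRecorder encode them, and
// ship each encoded chunk to a server over a WebSocket. The server is
// expected to pipe those chunks into ffmpeg for RTMP.
async function goLive(): Promise<void> {
  // Hypothetical ingest endpoint.
  const socket = new WebSocket("wss://ingest.example.com/stream");

  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  });

  // MediaRecorder hands us WebM-encoded chunks as they become available.
  const recorder = new MediaRecorder(stream, {
    mimeType: "video/webm;codecs=vp8,opus",
  });

  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(event.data);
    }
  };

  // Emit a chunk every second; smaller timeslices lower latency but
  // add per-message overhead.
  socket.addEventListener("open", () => recorder.start(1000));
}
```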

So, you might be thinking, how does that work? Let's go back to this headless Chrome thing. If you want more information on that one, you can go watch the other talk I mentioned. So: WebRTC to server-side WebRTC via headless Chrome. Kind of cool. You can have a chat, one-to-one or few-to-few, have headless Chrome join that chat, and broadcast it via RTMP. Really interesting. You'd want to hide that Chrome participant from the other chatters, but that Chrome can then lay out the chat interface however it wants, add overlays, anything like that, right there.
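As a sketch of what launching that hidden Chrome participant might look like with Puppeteer; the room URL and flags here are assumptions, and real setups usually need more configuration (fake media devices, GPU flags, and so on).

```ts
import puppeteer from "puppeteer";

// Launch a headless Chrome that joins the chat as a hidden participant.
// The room URL is hypothetical.
async function launchBroadcaster(): Promise<void> {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      "--autoplay-policy=no-user-gesture-required",
      "--use-fake-ui-for-media-stream", // auto-accept camera/mic prompts
    ],
  });

  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 720 });

  // This page can lay out the chat however it likes: overlays, lower
  // thirds, branding, all with plain HTML and CSS.
  await page.goto("https://chat.example.com/room/abc?hidden=true");
}
```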

So what about the downsides? You have to run one Chrome per input stream or output stream, and all the orchestration that comes with that. If you use Chrome as your everyday browser, you might notice it's resource-intensive; that also applies on the server. The bigger issue, though, is that it's not the most beaten path. A lot of people are doing this, they're just not talking about it. The exception is open source projects like Jitsi, which, if you're not familiar, is an open source Zoom competitor. That's how they do one-to-many, or few-to-many, broadcasts.

So there are a few paths to get this done, each with its own tradeoffs. One is to take this getUserMedia-style approach and broadcast to a server over WebSockets. You might be thinking, wait, why are we talking about this again? It's not actually getUserMedia this time: now we're going to use chrome.tabCapture, but it exposes a very similar API under the hood, with the same internals as the MediaRecorder API we'd use in that implementation. So we take the same WebRTC process we have in the browser, have headless Chrome join the chat, call the tabCapture API, and broadcast that via WebSocket to a server that encodes it into RTMP and hands it to the rest of the workflow. Those can be on the same server, but that's the high level. The pros are that you can run Chrome in headless mode, which means much easier multi-tenant access, you get all these browser features, and we can use the fancy WebSocket workflow. The downside is it still uses the MediaRecorder API, which is kind of a disaster.
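For completeness, the server half that both WebSocket approaches hand off to might look roughly like this. This is a sketch assuming Node, the ws package, and ffmpeg installed on the box; the RTMP endpoint and encoder flags are illustrative.

```ts
import { spawn } from "node:child_process";
import { WebSocketServer } from "ws";

// One ffmpeg process per incoming connection: read encoded WebM chunks
// from stdin and push them out over RTMP.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  const ffmpeg = spawn("ffmpeg", [
    "-i", "pipe:0",           // WebM chunks arrive on stdin
    "-c:v", "libx264",        // transcode video to H.264
    "-preset", "veryfast",
    "-c:a", "aac",            // transcode audio to AAC
    "-f", "flv",              // RTMP carries an FLV container
    "rtmp://ingest.example.com/live/STREAM_KEY", // hypothetical endpoint
  ]);

  socket.on("message", (chunk) => {
    ffmpeg.stdin.write(chunk as Buffer);
  });

  socket.on("close", () => ffmpeg.stdin.end());
});
```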