Video Summary and Transcription
The talk covers the use of HTML5 video APIs and Canvas for creating interactive video experiences. It demonstrates how to manipulate video frames with Canvas to add effects like grayscale and text overlays. The video also explores real-time object detection in live streams using TensorFlow and HLS technology for adaptive bitrate streaming. Practical applications include enhancing user interactivity on web platforms by rendering video frames to Canvas elements, which avoids double streaming and sync issues. The talk shows how these technologies are applied on Mux's marketing website, making it more engaging with interactive elements. The speaker emphasizes the potential of these tools for real-time video manipulation and interactive video experiences directly in the browser.
1. Introduction to Canvas and HTML5 Video APIs
Hello, everyone, at React Summit. Today, we're going to talk about the Canvas and HTML5 video APIs and some cool stuff you can do with them. I'm Dylan Jhaveri from Mux, where we provide Video for Developers. We focus on creating easy-to-use APIs for video. If you're interested, let's chat.
Hello, everyone, at React Summit. I'm very excited to be talking to you here today. We're going to be talking about the Canvas and HTML5 video APIs, and some cool stuff that we found that you can do with them.
So quick intro, I'm Dylan Jhaveri, I work at Mux. If you have not heard of Mux, Mux is Video for Developers. Maybe you know of Stripe, Stripe is payments for developers, or you know Twilio, which is phone calls and text messages for developers. We like to be like those companies, where we're built first with developers in mind and try to make great, easy-to-use APIs, but we do this for video.
I'm not going to be talking too much more about Mux today, but if you are interested, come talk to me. I'd love to chat with you.
2. Introduction to React App and Player Component
Let's start with a simple demo of a React app using the player component and canvas. The player component is a video element that uses HLS technology for video streaming.
Cool, so now to jump into some code. So I have this CodeSandbox set up. CodeSandbox is a great tool, by the way. It's become one of my favorite pieces of software. I think there's some CodeSandbox folks here at this conference, so shout-out to you all. I love this product. And I'll be sharing this after so you can fork it, play with the code, do things yourself.
And let's just start out with a really simple demo. So this is a very straightforward React app. We have a few different routes, the five different examples I'm going to show, and we're using React Router and React DOM. And let's start with the first one, a simple demo. So right here we have simple.js. This is the component that we're rendering. We have this player component and then we have this canvas. And right now you can't see the canvas on the page, but that's what we'll be manipulating and doing some fun stuff with as we go along.
So real quickly, let's just take a look at this player component. So this player component is... it's really just a video element. But if you're familiar with video... how many of y'all have done video on the internet? Video streaming, video on demand or live streaming, anything like that. You might have used the video element before, and maybe you've used an MP4 file, and that can kind of work. But when you really want to do video streaming properly, what you need to do is use something like HLS. HLS is a technology that lets you download video in segments, at different bitrates and quality levels, according to the user's bandwidth. So that's something Mux does for you. We're not going to get too deep into that, but that's what we're using here on this video player.
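The talk doesn't show the player internals at this point, but roughly, "a video element with some extra JavaScript for HLS capabilities" can be sketched like this, assuming the hls.js library (an assumption; the talk doesn't say which HLS library the Mux player component uses):

```jsx
// Player.js — a minimal sketch, not the actual Mux player component.
import React, { useEffect, useRef } from "react";
import Hls from "hls.js";

export default function Player({ src, onPlay }) {
  const videoRef = useRef(null);

  useEffect(() => {
    const video = videoRef.current;
    if (video.canPlayType("application/vnd.apple.mpegurl")) {
      // Safari plays HLS natively, so just point the element at the playlist.
      video.src = src;
    } else if (Hls.isSupported()) {
      // Everywhere else, hls.js downloads the segments at the right bitrate
      // and feeds them to the video element via Media Source Extensions.
      const hls = new Hls();
      hls.loadSource(src);
      hls.attachMedia(video);
      return () => hls.destroy();
    }
  }, [src]);

  // The play event fires when playback begins; hand the video element
  // to the parent so it can copy frames onto a canvas.
  return (
    <video ref={videoRef} controls onPlay={() => onPlay(videoRef.current)} />
  );
}
```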
3. Exploring HTML5 Video Element and Canvas
So this is the HTML5 video element with extra JavaScript for HLS capabilities. When the play event fires, the onPlayCallback is called. The video is duplicated on a canvas element below. The code uses the video element and a canvas context to manipulate and draw images onto the canvas. The drawImage function copies each frame from the video element to the canvas. Let's take it one step further and look at the filter example.
So this is... It's really just the HTML5 video element. And then we're attaching some extra JavaScript to give it some HLS capabilities. And then when the play event fires, that play event is when the playback begins on the video, and we're gonna call this onPlayCallback.
So let's jump back into the component that's rendering this page. Zoom in a little bit here. Make sure you can see that. So right here, we have the player, onPlayCallback. And when that fires, see what happens. What we see is this video is playing in the video element. And then it's being duplicated on this canvas element right below.
Let's jump into some of this code. So when onPlay is called, we grab the video element and we create this context, this context ref. What this is, is sort of a handle onto the canvas element. We can then call functions on that context that allow us to manipulate that canvas element and change how it's displayed, and that's our hook into manipulating the actual canvas itself. So onPlay, we call requestAnimationFrame, which calls updateCanvas. And what that's going to do is just call this one-liner, drawImage; we pass that video element into it, and this tells the canvas to draw that image onto the canvas. These are the coordinates where to start drawing, and these are the dimensions to draw. And this is actually called recursively: every time this runs, we call requestAnimationFrame again, and the callback calls updateCanvas again. So you can see what's happening: we're basically copying that video element down onto the canvas right below it. So that's how that works. Quick recap of what we did there: video element, copy each frame, draw them onto the canvas. Pretty simple, right?
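Putting that together, here's a minimal sketch of the component and the recursive draw loop described above (the component names and the placeholder URL are illustrative, not the exact sandbox code):

```jsx
// Simple.js — a sketch of the "copy each frame to a canvas" demo.
import React, { useRef } from "react";
import Player from "./Player";

export default function Simple() {
  const canvasRef = useRef(null);
  const ctxRef = useRef(null);

  const onPlay = (video) => {
    // A handle onto the canvas: the 2D drawing context.
    ctxRef.current = canvasRef.current.getContext("2d");

    const updateCanvas = () => {
      // Copy the current video frame onto the canvas:
      // (0, 0) is where to start drawing, then the width and height.
      ctxRef.current.drawImage(
        video,
        0,
        0,
        canvasRef.current.width,
        canvasRef.current.height
      );
      // Recurse: schedule the next copy on the next animation frame.
      requestAnimationFrame(updateCanvas);
    };

    requestAnimationFrame(updateCanvas);
  };

  return (
    <>
      {/* Placeholder HLS playlist URL, not a real stream. */}
      <Player src="https://example.com/video.m3u8" onPlay={onPlay} />
      <canvas ref={canvasRef} width={640} height={360} />
    </>
  );
}
```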
So now let's take this one step further. Let's go to this filter example. So what the filter does... let's push play. Okay, same kind of thing, but you can see something else is going on here. What we're doing is the same kind of callback, updateCanvas.
4. Manipulating Canvas and Video Frames
We can manipulate and work with raw image data from the canvas. By iterating through the image data and adjusting color values, we can achieve effects like grayscale. Additionally, we can add text on top of the canvas, allowing for real-time modifications. This opens up possibilities for interactive video manipulation using browser APIs. Let's explore more examples, including grabbing individual frames from a video and manipulating them. The video we're using is Big Buck Bunny, a popular example in the video streaming community.
And what we do is we draw that image onto the canvas, then we extract the image data off the canvas. And now we have raw image data that we can actually manipulate and work with. We're going to iterate through that image data and mess with the color values: if we average out the red, green, and blue values, that's going to give us this grayscale effect. So we're actually manipulating the image from the video one frame at a time, and then putting it back, redrawing it onto the canvas. And you can see it has that effect. And you can see this canvas is always staying synced with the frame of video that the video element is rendering.
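A minimal sketch of that grayscale step, assuming it runs inside the same kind of updateCanvas loop as above (the helper name is illustrative):

```js
// Draw a frame, convert it to grayscale, and put it back on the canvas.
function drawGrayscaleFrame(ctx, video, width, height) {
  // 1. Copy the current video frame onto the canvas.
  ctx.drawImage(video, 0, 0, width, height);

  // 2. Pull the raw pixels back off the canvas: a flat RGBA array.
  const frame = ctx.getImageData(0, 0, width, height);
  const data = frame.data;

  // 3. Average red, green, and blue for each pixel to get a gray value,
  //    then write it back into all three channels (alpha untouched).
  for (let i = 0; i < data.length; i += 4) {
    const gray = (data[i] + data[i + 1] + data[i + 2]) / 3;
    data[i] = data[i + 1] = data[i + 2] = gray;
  }

  // 4. Redraw the manipulated frame onto the canvas.
  ctx.putImageData(frame, 0, 0);
}
```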
Okay. Pretty cool, right? So let's look at the steps we did there, where we took this a little bit further. Instead of just drawing each frame onto the canvas, after we do that, we're extracting the frame, manipulating the colors into grayscale, and then redrawing it back onto the canvas. Okay. So now we have a few more examples. Let's see what else we can do. It's going to get better and better each time.

Layla, this is my coworker Phil's dog. And let's look at this example. So now in the updateCanvas function, we draw the image, and then we're just going to call this fillText method on the canvas context. So what we're doing there is we're actually just adding text on top of the canvas. So we're rendering the video image onto the canvas, and then just adding text on top. Now you can imagine this could get pretty useful, right? If we have a video playing in this video element and we draw it onto the canvas, then we can do all these cool things like add text in real time, frame by frame, on the client side in the browser, all with these browser APIs. So that's where we're adding her name, as sketched below. Let's see what else we can do.
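The text overlay is just one extra call after the frame is drawn; a minimal sketch (the label, font, and position are illustrative):

```js
// Draw a frame, then paint a text label on top of it.
function drawFrameWithLabel(ctx, video, width, height) {
  ctx.drawImage(video, 0, 0, width, height);

  // fillText paints on top of whatever is already on the canvas,
  // so the label rides along with every copied frame.
  ctx.font = "48px sans-serif";
  ctx.fillStyle = "white";
  ctx.fillText("Layla", 20, height - 20);
}
```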
Okay. So now let's get into this one. This is called classify. So what we've looked at is that we can grab individual frames from the video in real time, draw them onto a canvas, and before we draw them onto the canvas, we can manipulate them, right? So what else can we do? When we have a raw frame of a video, let's think about what else we can do. So this video, if you don't recognize it, is Big Buck Bunny. It's sort of the canonical hello-world video example in the video streaming community. I've watched this video way too many times, and it makes a good example.
5. Real-time Object Detection and Use Cases
In this classify demo, we run machine learning object detection on each video frame, drawing rectangles around detected objects. We use the TensorFlow Coco-SSD model to detect objects in real time. By extracting image data from the canvas, we can map predictions and draw boxes with labels on the video. Although not perfect for animated content, it can detect real-life objects accurately. This opens up possibilities for real-time object detection in live video streams. Let's explore more use cases.
So I'm gonna use this for the purposes of this classify demo. And let's just push play here. What's happening is that on every frame of the video, we're running some machine learning object detection functionality on the image frame, and then we're drawing a rectangle around the detected object onto the frame. And right now it thinks this is a person; go a little further, now it thinks it's a bird. So we're actually detecting, frame by frame, what's going on with the objects in this video.
So let's take a look at the code. We draw the image onto the context, we extract the image data. And this is the same image data where we were manipulating the colors, but we have this extra call here, which is model.detect, and we pass in that image data. So model is something that comes from this TensorFlow Coco-SSD model, which is a TensorFlow model that will do object detection on images. It's made to work with images. And when we pass in this image data that we've extracted from the canvas, it's going to run the object detection and send us back an array of what they call predictions, okay? So now once we have an array of predictions, we can pass those into this outline stuff function that's going to map those predictions. Each one has the X, Y coordinates and the width and height of a bounding box. And then we can actually just draw those boxes, with the labels, directly onto that canvas element that we're already using to render the video.

So you can see it thinks it's a bird, still thinks it's a bird. And dog, we saw there was a dog there for a second; here it thinks that is a sports ball. So, you know, it's not the most accurate object detection for this animated content. Now it's a sheep, it kind of looks like a sheep, but we're actually able to do some pretty cool stuff. And remember, this is happening in real time. A lot of times when you're doing image detection on a video, you would do that out of band, on a server, once the video is finalized. But imagine this was a live stream, right? If we're dealing with a live stream of video, we'd be able to actually run this on the client and detect objects in real time. And, you know, the sky's the limit there, and we can do all kinds of things with the detection that we're doing.

Let's look at one more example of the classification. Let's pull up Layla, Phil's dog, again, and you can see here TensorFlow on a real live video. It gets the type of dog; it's actually pretty good at detecting real-life things. Animated things, animated giant bunnies, maybe not so much, but a dog it can get. So, to really quickly review what we did there: the key part to pay attention to is that once we get images into a canvas, we can actually extract that raw image data. And then this red circle, where we're doing live object detection, you could replace that with anything, right? Manipulate the colors, add text overlays. And then we can redraw those back onto the canvas with all the canvas APIs that are available. So that's what we did there, and there's a sketch of that loop below. Now, let's take a quick look at some real-world use cases of this.
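A minimal sketch of that detection loop using the @tensorflow-models/coco-ssd package, where each prediction carries a bounding box, a class label, and a confidence score (the drawing style and function structure are illustrative, not the exact demo code):

```js
// classify sketch: run Coco-SSD on each frame and draw boxes + labels.
import "@tensorflow/tfjs";
import * as cocoSsd from "@tensorflow-models/coco-ssd";

export async function startClassifyLoop(ctx, video, width, height) {
  const model = await cocoSsd.load();

  const updateCanvas = async () => {
    ctx.drawImage(video, 0, 0, width, height);

    // The same raw image data we used for the color manipulation...
    const imageData = ctx.getImageData(0, 0, width, height);
    // ...now handed to the model, which returns an array of predictions,
    // each shaped like { bbox: [x, y, width, height], class, score }.
    const predictions = await model.detect(imageData);

    // Draw each bounding box and its label back onto the canvas.
    for (const { bbox: [x, y, w, h], class: label, score } of predictions) {
      ctx.strokeStyle = "red";
      ctx.lineWidth = 2;
      ctx.strokeRect(x, y, w, h);
      ctx.fillStyle = "red";
      ctx.font = "16px sans-serif";
      ctx.fillText(`${label} ${Math.round(score * 100)}%`, x, y > 16 ? y - 4 : y + 16);
    }

    requestAnimationFrame(updateCanvas);
  };

  requestAnimationFrame(updateCanvas);
}
```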
6. Enhancing Marketing Website with Interactive Video
We recently did a design refresh on our marketing website, adding an API demo in the top hero section. Previously, we had a single video, but this time we wanted to make it more interactive. By using a strategy of copying frames from a video element and rendering them to canvas elements, we achieved the desired effects. This approach eliminated the need for double streaming, avoided playback sync issues, and allowed for a more interactive experience. If you're interested in video, let's chat!
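A minimal sketch of the frame-copying strategy the summary describes, elaborated in the transcript below (the function and element names are illustrative, not Mux's actual marketing-site code):

```js
// One hidden video element is the only network stream; every frame is
// copied to each canvas, so the canvases can never drift out of sync.
function mirrorVideoToCanvases(video, canvases) {
  const contexts = canvases.map((canvas) => canvas.getContext("2d"));

  const update = () => {
    contexts.forEach((ctx, i) => {
      ctx.drawImage(video, 0, 0, canvases[i].width, canvases[i].height);
    });
    requestAnimationFrame(update);
  };
  requestAnimationFrame(update);
}

// e.g. the "browser" and the "device" in the hero section are each a canvas:
// mirrorVideoToCanvases(videoElement, [browserCanvas, deviceCanvas]);
```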
We at Mux actually used this on our marketing website recently. So we recently did a design refresh on our marketing website, and we have this API demo in this top hero section, and you can see what's going on here. Previously on our marketing site, before this iteration, we had a similar sort of API demo, but it was all one video. So you can imagine if all of this here was just one video, with this device and the browser popping out; that worked pretty well, but we wanted to make it better this time.
What we were thinking is that you'll notice that as I'm hovering over this, it pops out. If I hover over the browser, the browser pops out and I can copy text here. I can interact with it. That's what we want to do. Let's say a developer comes here and wants to copy this text; we just wanted to make it more interactive. We also have these bleeding colors in the back, and we want those to bleed outside the bounds of this element, into the top header and into the bottom. And if this was just a static video, we wouldn't be able to get that effect.
So, the way we were able to pull this off: I have a Storybook example here. The way we were actually able to do this is through the strategy that I described. So let's inspect these elements. Okay, we inspect these elements, and you can see that this right here is a canvas. Let me replay this. And then we see that this right here is another canvas. And then if we look further down here in the DOM, we can actually see that there's a video element. So this is the video element that is streaming the video, and we're copying the frames of that video and rendering them to these two canvas elements in real time.

So what are the benefits of that strategy? Alternatively, we could pull off the same design by having this browser be one video element and this device be another video element. And that would work okay, except the downside is, number one, we're double streaming the same video, which is going to double the bandwidth for the user. More video data being downloaded seems unnecessary and repetitive. Number two is that the two videos could get out of sync, right? Like, if one video buffers because you're on a slow connection and the other one isn't buffering yet, you get this playback sync issue, so we'd probably have to write some JavaScript that keeps the playheads aligned and in sync, and that seems kind of buggy, not a great solution. So what we did is apply the strategy of taking this video element, grabbing the frames from that video element, and rendering them to the canvases. That way, these two canvases will always stay in sync and we're only downloading the video once. It works well.

And let's play this one more time. That's the solution we came to. So you'll notice now I can hover over this, hover over this, and the devices pop out and it's more interactive. I can copy code. And now this video, this is a happy birthday video for React Summit. It's a video I found online of kids crying when they blow out their birthday candles, and it's kind of funny. So happy birthday, React Summit. I'm excited to be here, excited to talk with you all. And if you have anything to talk about video, I'd love to chat. If you're adding video to your product, building video, doing cool things, please chat with me, and thanks for having me. Find me on Twitter, @dylanjha, and that is the end.