Video Summary and Transcription
Today's talk discusses the use of video sprites to optimize video playback. Video sprites, similar to CSS sprites, let the client select and display a specific part of a combined image. By combining multiple videos into one and showing only the desired one, feeds stay synchronized, and viewers can choose between different camera angles in live events. The implementation divides the video into quadrants and lets viewers select which quadrant to watch, guaranteeing synchronized feeds and a shared audio stream.
1. Introduction to Video Sprites
Today I want to talk about using video sprites, similar to CSS sprites, to optimize video playback. A sprite is an image with multiple images in it, allowing the client to choose which parts to display. This technique was widely used in the early aughts for optimizing web buttons. Video game sprites, like Mario's states, are a common example. By combining multiple videos into one and selecting the desired one, we can enhance video synchronization and enable viewers to choose different camera angles in live events such as concerts and sports.
Hi everyone. Today I want to talk about something a little hacky I've been thinking about lately. The idea is that we want to use video sprites in the same way a lot of you might have used CSS sprites back in the day, and we'll get into what I mean by all of this. I'm Matt McClure, one of the co-founders of a company called Mux, where I run the Developer Experience org. In a nutshell, we build awesome video infrastructure for developers, so if you're looking for a video API, check us out.
So, okay, taking a step back: what is a sprite? In a nutshell, it's an image with a bunch of images in it. The client gets this big combined image and then picks and chooses which parts of it to show. If you're relatively new to the web, you might not have seen this widely used, but it was a really common optimization technique in the early aughts: if you had a button with hover, depressed, and active states, you would send one button image and then use your CSS background position to decide which part of that image to show. If you got started back then, you might remember this.
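As a rough sketch of the button trick described above, here's the idea driven from JavaScript for illustration. The sprite layout, file name, and sizes here are hypothetical, not from the talk: a 100x60 image with the normal state in the top half and the hover state in the bottom half.

```javascript
// Hypothetical sprite layout: each state is a 100x30 slice, stacked vertically.
const SPRITE_STATES = { normal: 0, hover: 30 }; // y-offset in px of each state

function spriteOffset(state) {
  // background-position shifts the sprite up so only one slice is visible
  return `0 -${SPRITE_STATES[state]}px`;
}

function makeSpriteButton(button) {
  button.style.width = '100px';
  button.style.height = '30px';
  button.style.backgroundImage = "url('button-sprite.png')"; // hypothetical asset
  button.style.backgroundPosition = spriteOffset('normal');
  button.addEventListener('mouseenter', () => {
    button.style.backgroundPosition = spriteOffset('hover');
  });
  button.addEventListener('mouseleave', () => {
    button.style.backgroundPosition = spriteOffset('normal');
  });
}
```

One image download serves every button state, which is what made this a popular optimization before HTTP/2 multiplexing.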
To show this a little more concretely, a lot of people know about this from video game sprites: all of Mario's states are in one image, and the client viewport shows just one state of Mario at a time. So you might be wondering what in the world this has to do with video. The gist is that the same technique works: you can send a few videos combined into one and then show just the one you care about in the player. As for why in the world we would do this, I'd say use your imagination. There are a bunch of possible examples: sporting events, concerts, and so on.
2. Feed Synchronization and Viewer Selection
To synchronize feeds and allow viewers to choose between different camera angles in live events, a solution is to combine all the feeds at a local encoder box and send them as one video. The video is divided into quadrants, and viewers can select which quadrant they want to watch, ensuring synchronized feeds and a shared audio stream.
But the biggest example that comes to mind, and what we see most from customers wanting to do this kind of thing, is feed synchronization, particularly being able to pick between different feeds in a synchronized way.

Let's say you're streaming live concerts, live music. You've got a bunch of different cameras streaming a bunch of different angles: one's on the drummer, one's on the singer, one's on the audience. A producer on site typically decides which of those feeds to show at any given time, so they might do a nice little transition from the drummer to the audience, and so on. That producer then sends a single feed to an encoder service or whatever else that looks like (I'm using Mux's example here for obvious reasons), and that service broadcasts it to all of your end viewers.

Then those viewers start saying, "Actually, I just want to watch the drummer all the time, and I hate the transitions this producer is doing." They want the power to pick which feed they watch. So you decide, okay, how can we go about building this out? You start thinking: I'll send every camera directly to that encoder or broadcast service, and then every viewer can get all the feeds, in this example, three feeds. This is where things really get hairy. Now you've got three different live streams that people can watch, but how do you switch between them? Do people just click another feed? Each one might be a few seconds off from the others, so it can be tough to synchronize the audio in the client, or honestly, next to impossible to do well.

So one solution is to send one video again, like you were doing before, but instead of that video being produced, you combine all the feeds at that level and send them along. In this example, all the cameras go into that one encoder box locally.
That box lays them out in four quadrants and sends the combined feed to the encoder/broadcast service, which sends it out to all the viewers. From there, each viewer can pick which one they want. Now you're guaranteed that your feeds are synced, you only have to worry about one audio stream shared between all of them, and you only show the quadrant of the video that the viewer has selected at any given time.
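The quadrant selection described above boils down to a little geometry: given a quadrant index, compute which rectangle of the combined video to crop. This is a minimal sketch with illustrative names (the function and its row-major numbering are assumptions, not from the talk's repo):

```javascript
// Map a quadrant index (0..3, row-major: 0=top-left, 1=top-right,
// 2=bottom-left, 3=bottom-right) to the source rectangle to crop
// from the combined video.
function quadrantRect(index, videoWidth, videoHeight) {
  const w = videoWidth / 2;   // each quadrant is half the combined width
  const h = videoHeight / 2;  // ...and half the combined height
  const col = index % 2;
  const row = Math.floor(index / 2);
  return { sx: col * w, sy: row * h, sw: w, sh: h };
}

// For a 1920x1080 combined feed, quadrant 3 (bottom-right):
// quadrantRect(3, 1920, 1080) → { sx: 960, sy: 540, sw: 960, sh: 540 }
```

Switching feeds is then just swapping the index, so the underlying video (and its shared audio) never changes, which is why the feeds can't drift out of sync.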
3. Implementing Video Sprites
To implement this, grab the coordinates of the desired video feed: the source X, source Y, source width, and source height define the area to crop, and each quadrant is half the video width and half the video height. Update the canvas by drawing the video image and passing those coordinates, so that only the desired quadrant is displayed, and call requestAnimationFrame continuously for smooth playback.
So how might this work, detail-wise? All of this code is on GitHub, and I'd suggest checking it out there. But at a high level, you want to grab the coordinates of the feed you want to show. I have the feeds named 0 through 3, and you lay those coordinates out in an array of source X, source Y position, source width, and source height: what you want to crop. So 0, 0 is the top left; half the video width over is the top right quadrant; and so on and so forth. Since these are quadrants, each one is half the video height and half the video width. Then you use these when you're updating the canvas. When you draw to the canvas, you pass in the video image along with the coordinates we just grabbed, which says, okay, only draw the top right quadrant of the video into the canvas. Then you call requestAnimationFrame over and over and over again.
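A minimal sketch of that render loop might look like the following. This is an assumed shape, not the talk's actual code (which is on GitHub): it uses the nine-argument form of drawImage, where the first four numbers are the source crop and the last four are the destination rectangle on the canvas.

```javascript
// Draw one quadrant of a combined <video> into a <canvas>, repainting
// every frame. `selected` is the feed index, 0 through 3.
function playQuadrant(video, canvas, selected) {
  const ctx = canvas.getContext('2d');
  const w = video.videoWidth / 2;   // each quadrant is half the video width
  const h = video.videoHeight / 2;  // ...and half the video height
  const quadrants = [
    [0, 0, w, h],   // feed 0: top left
    [w, 0, w, h],   // feed 1: top right
    [0, h, w, h],   // feed 2: bottom left
    [w, h, w, h],   // feed 3: bottom right
  ];

  function draw() {
    const [sx, sy, sw, sh] = quadrants[selected];
    // Crop (sx, sy, sw, sh) from the source video and paint it so it
    // fills the whole canvas.
    ctx.drawImage(video, sx, sy, sw, sh, 0, 0, canvas.width, canvas.height);
    requestAnimationFrame(draw); // keep repainting every frame
  }
  requestAnimationFrame(draw);
}
```

To let the viewer switch feeds, you'd change `selected` (for example, restart the loop with a new index when they click a feed); because every quadrant comes from the same decoded video, the switch is frame-accurate.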