And in this case, the palm tree moves from the top left to the bottom right. So you can start predicting where it is going to be in the next image, and then store only the delta relative to that prediction. This prediction-plus-delta step is what a video codec does. The most popular video codec is H.264, also known as AVC: same thing, different name, like JavaScript and ECMAScript. There are also AV1, VP8 and VP9.
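To make "predict, then store the delta" concrete, here is a minimal sketch on one-dimensional "frames". It assumes a frame is a list of integers and that the only motion is one horizontal shift for the whole frame. Real codecs estimate motion per block and in two dimensions, so treat `shift`, `best_motion`, `encode_delta` and `decode_delta` as hypothetical helpers for illustration only.

```python
# Toy motion-compensated delta coding on 1-D frames.
# Assumption: one global horizontal shift per frame (real codecs use per-block 2-D motion).

def shift(frame, dx, fill=0):
    """Move every pixel dx positions to the right, padding uncovered pixels with `fill`."""
    n = len(frame)
    out = [fill] * n
    for i, v in enumerate(frame):
        j = i + dx
        if 0 <= j < n:
            out[j] = v
    return out

def best_motion(prev, cur, max_dx=3):
    """Try every shift in [-max_dx, max_dx] and keep the one with the smallest residual."""
    def cost(dx):
        pred = shift(prev, dx)
        return sum(abs(c - p) for c, p in zip(cur, pred))
    return min(range(-max_dx, max_dx + 1), key=cost)

def encode_delta(prev, cur):
    """Predict `cur` from `prev`, then keep only the motion and the leftover difference."""
    dx = best_motion(prev, cur)
    pred = shift(prev, dx)
    residual = [c - p for c, p in zip(cur, pred)]
    return dx, residual

def decode_delta(prev, dx, residual):
    """Redo the same prediction and add the residual back."""
    pred = shift(prev, dx)
    return [p + r for p, r in zip(pred, residual)]

prev = [0, 9, 9, 0, 0, 0, 0, 0]  # "palm tree" near the left
cur = [0, 0, 0, 9, 9, 0, 0, 0]   # same tree, moved right by 2
dx, residual = encode_delta(prev, cur)
print(dx, residual)  # prints: 2 [0, 0, 0, 0, 0, 0, 0, 0]
```

Because the prediction is exact here, the residual is all zeros, which is what makes delta frames so much cheaper to store than full images.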
Again, the video codec reduces the size drastically. So this is what our setup looks like now. We have two types of frames: keyframes, like the first frame here, which are compressed on their own with something like JPEG, and delta frames, which is every other frame in this picture.
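The keyframe/delta-frame layout can be sketched as a list of packets plus a decoder that remembers the last frame it produced. This is a toy, not any real container or codec API: the `("key", ...)` / `("delta", ...)` packet format, `encode` and `Decoder` are hypothetical, plain frame differencing stands in for real prediction, and the keyframe is stored raw where a real codec would compress it.

```python
# Toy keyframe + delta-frame stream.
# Assumptions: packets are ("key", pixels) or ("delta", residual); differencing replaces real prediction.

def encode(frames):
    """First frame becomes a keyframe; each later frame is stored as a difference from the previous one."""
    packets = [("key", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        packets.append(("delta", [c - p for c, p in zip(cur, prev)]))
    return packets

class Decoder:
    """Stateful decoder: each delta frame is applied on top of the last decoded frame."""

    def __init__(self):
        self.last = None

    def decode(self, packet):
        kind, data = packet
        if kind == "key":
            self.last = list(data)
        elif self.last is None:
            raise ValueError("delta frame received before any keyframe")
        else:
            self.last = [p + d for p, d in zip(self.last, data)]
        return list(self.last)  # return a copy so callers cannot corrupt the decoder state

frames = [[10, 10, 10], [10, 11, 10], [10, 12, 11]]
packets = encode(frames)
dec = Decoder()
decoded = [dec.decode(p) for p in packets]

try:
    Decoder().decode(packets[1])  # a delta frame on its own cannot be decoded
except ValueError as e:
    print(e)
```

The decoder has to keep state between calls and see the packets in the right order; feeding it a delta frame first fails because there is nothing to apply the delta to.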
And now, to decode the video, it's no longer "just give me one image and I can decode it." You need to start with the keyframe. Then, to decode the second frame, you need the decoded keyframe: you make the prediction of where things are going to be next, and you apply the delta. So now we need a stateful API, and one that receives frames in a specific order.
But this is only part of the picture, because the people doing video compression wanted to do even better. One thing they realized is that this prediction doesn't only work going forward; it also works backwards. You can start from the end, do the prediction and the encoding, look at which direction gives the biggest savings, and pick whichever makes the result smallest overall. This is where bidirectional frames, or B-frames, come in. In this example, frame 5 is a B-frame, which means that to decode it, you first need to decode frames 4 and 6. And to decode frame 6 you need frame 7, for 7 you need 8, for 8 you need 9, and likewise in the other direction.
So to play the video back in order, you have to send the frames in a different order. This is where we get two notions of time. The first is the presentation timestamp (PTS), the time at which a frame is shown as the movie plays. The second is the decoding timestamp (DTS), the order in which frames have to be sent to the decoder so that each one is decoded after the frames it depends on. And this gives us our first falsehood: that time only goes forward.
So now that we've seen the first thing that breaks, let's go back to the API, the real API this time. We need some kind of load-video API that gives us all the frames, and then we want a decoder API.
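To make the PTS/DTS split concrete, here is a toy sketch that takes frames in presentation order, each listing the PTS values of the frames it references, and computes an order in which every reference is decoded first. The group of pictures (I/P/B types and their references) is made up for illustration, and `Frame` and `decode_order` are hypothetical names; a real encoder picks the frame types and writes the timestamps itself.

```python
# Toy decode-order computation for a group of pictures with B-frames.
# Assumptions: references form a DAG (no cycles), and DTS is simply the index in decode order.

from dataclasses import dataclass, field

@dataclass
class Frame:
    pts: int                                   # presentation timestamp
    kind: str                                  # "I", "P" or "B"
    refs: list = field(default_factory=list)   # PTS values this frame needs decoded first

def decode_order(frames):
    """Depth-first: before emitting a frame, emit every frame it references."""
    by_pts = {f.pts: f for f in frames}
    done, order = set(), []

    def visit(f):
        if f.pts in done:
            return
        for r in f.refs:
            visit(by_pts[r])
        done.add(f.pts)
        order.append(f)

    for f in sorted(frames, key=lambda f: f.pts):
        visit(f)
    return order

gop = [
    Frame(0, "I"),
    Frame(1, "B", [0, 3]),
    Frame(2, "B", [0, 3]),
    Frame(3, "P", [0]),
    Frame(4, "B", [3, 6]),
    Frame(5, "B", [3, 6]),
    Frame(6, "P", [3]),
]
for dts, f in enumerate(decode_order(gop)):
    print(f"DTS={dts} PTS={f.pts} {f.kind}")
```

Reading the output in DTS order, the PTS goes 0, 3, 1, 2, 6, 4, 5: the decoder is fed frames that jump ahead and then back in presentation time, even though playback itself is strictly in order.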