What is PoseNet? Let's talk a little bit about what PoseNet is. PoseNet lets you do pose detection in the browser. It comes from Google, and it's very, very interesting. It uses TensorFlow.js behind the scenes, so you can use it directly within your websites. It can be used with your webcam, with video within a canvas, or on still pictures, so any kind of image that you want to show it.
It can detect one or more people, so you can build a skeleton to display one person's pose, or you can focus on many people in a video or an image. One of the use cases is sports. For example, to speak with an American idiom, if I slid into first base when playing baseball and twisted my ankle, and there's a video of that, you could pinpoint exactly where the movement went wrong and maybe avoid doing that another time. Personally, I'd like to try this to show yoga poses and what the best positioning and placement is for your hands, your feet, and your whole body. All it does is give you estimates of the positions of 17 key body points. You yourself have to draw the skeleton on a canvas, so it's not completely out of the box, but you can go and do whatever you like with the data that's sent back to you.
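To make the "you draw the skeleton yourself" point concrete, here is a minimal sketch of how that could look. It is an illustration under assumptions, not PoseNet's own drawing code: the `Keypoint` shape mirrors the kind of data PoseNet sends back, while the `BONES` list, the `skeletonSegments` and `drawSkeleton` helpers, and the 0.5 score threshold are hypothetical names and values chosen for the example.

```typescript
// Hypothetical minimal types for the data sent back for each body point.
interface Point { x: number; y: number }
interface Keypoint { part: string; score: number; position: Point }

// Assumed list of body-point pairs to join with a line (illustrative only).
const BONES: [string, string][] = [
  ["leftShoulder", "rightShoulder"], ["leftHip", "rightHip"],
  ["leftShoulder", "leftElbow"], ["leftElbow", "leftWrist"],
  ["rightShoulder", "rightElbow"], ["rightElbow", "rightWrist"],
  ["leftShoulder", "leftHip"], ["rightShoulder", "rightHip"],
  ["leftHip", "leftKnee"], ["leftKnee", "leftAnkle"],
  ["rightHip", "rightKnee"], ["rightKnee", "rightAnkle"],
];

// Return the line segments whose two end points were both detected
// with at least `minScore` confidence.
function skeletonSegments(keypoints: Keypoint[], minScore = 0.5): [Point, Point][] {
  const byPart: Record<string, Keypoint> = {};
  for (const k of keypoints) byPart[k.part] = k;

  const segments: [Point, Point][] = [];
  for (const [a, b] of BONES) {
    const ka = byPart[a];
    const kb = byPart[b];
    if (ka && kb && ka.score >= minScore && kb.score >= minScore) {
      segments.push([ka.position, kb.position]);
    }
  }
  return segments;
}

// The small slice of a 2D canvas context that the drawing step needs.
interface LineContext {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  stroke(): void;
}

// Draw one straight line per confident segment.
function drawSkeleton(ctx: LineContext, keypoints: Keypoint[], minScore = 0.5): void {
  for (const [from, to] of skeletonSegments(keypoints, minScore)) {
    ctx.beginPath();
    ctx.moveTo(from.x, from.y);
    ctx.lineTo(to.x, to.y);
    ctx.stroke();
  }
}
```

In a page, you would pass the 2D context of your canvas as `ctx`. Splitting the segment calculation from the drawing keeps the geometry easy to check without a browser.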
An example output of PoseNet looks like this. It's basically saying, we think this is the left eye. It sends back a score saying it's 99% sure that this is a left eye, along with the X and Y Cartesian coordinates of that point within the image or the video. Now, if you're running this against a video, it's going to give you this data set for every frame. So that is a pretty powerful piece of software.
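For readers without the slide, here is a rough sketch of what that kind of output could look like and how you might read it. The field names (`score`, `keypoints`, `part`, `position`) follow the shape the TensorFlow.js PoseNet package documents, as far as I know; the numbers are made up for illustration, and `describeKeypoint` and `confidentKeypoints` are hypothetical helpers, not part of the library.

```typescript
// Assumed shape of one estimated pose; sample values are invented.
interface Keypoint {
  part: string;                        // e.g. "leftEye"
  score: number;                       // confidence between 0 and 1
  position: { x: number; y: number };  // pixel coordinates in the image or frame
}
interface Pose { score: number; keypoints: Keypoint[] }

const samplePose: Pose = {
  score: 0.93,
  keypoints: [
    { part: "leftEye", score: 0.99, position: { x: 253.4, y: 76.3 } },
    { part: "rightEye", score: 0.98, position: { x: 221.1, y: 75.9 } },
    { part: "leftAnkle", score: 0.12, position: { x: 260.0, y: 410.5 } },
  ],
};

// Turn one keypoint into the sentence the model is effectively saying.
function describeKeypoint(k: Keypoint): string {
  const percent = Math.round(k.score * 100);
  return `${k.part}: ${percent}% confident at (${k.position.x}, ${k.position.y})`;
}

// Keep only the body points the model is reasonably sure about.
function confidentKeypoints(pose: Pose, minScore = 0.5): Keypoint[] {
  return pose.keypoints.filter((k) => k.score >= minScore);
}

console.log(describeKeypoint(samplePose.keypoints[0]));
// prints "leftEye: 99% confident at (253.4, 76.3)"
```

With a video you would get one such object per person per frame, so filtering by score early keeps the rest of your code from reacting to points the model was only guessing at.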
Whenever I'm talking about machine learning in the browser, or any kind of machine learning built into web and mobile apps, I also like to talk about what's going on behind the scenes. You can take a look at some of the research papers that are referenced in these slides. PoseNet is actually built on top of PersonLab models, and PersonLab was a new way of determining people's stances and actions in a video or photo. Previously, we would look for people in a video or a photo by drawing bounding boxes: I'm pretty sure that this is a person, I'm pretty sure that this is another person. But PoseNet and PersonLab allow you to take a bottom-up, parts-first approach to determining what a person is and what exactly they're doing within an image. So instead of top-down boxes, it's a bottom-up, parts-first, box-free, fully convolutional determination. It predicts the relative positions of those 17 key points: the nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. There's no mouth. It starts from the most confident detection and works outward based on trained body poses. So let's dig even a little bit deeper than that. What's going on behind the scenes here? Well, let's talk about PersonLab and how this thing is working in the background.
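As a small illustration of the 17 key points and of the idea of starting from the most confident detection, here is a toy sketch. It is not the PersonLab decoder: it only lists the part names as I understand PoseNet to name them and orders some invented detections by score. The `Detection` type and the `byConfidence` and `rootDetection` helpers are hypothetical.

```typescript
// The 17 body points, using the part names I believe PoseNet reports.
const PART_NAMES = [
  "nose", "leftEye", "rightEye", "leftEar", "rightEar",
  "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
  "leftWrist", "rightWrist", "leftHip", "rightHip",
  "leftKnee", "rightKnee", "leftAnkle", "rightAnkle",
] as const;

type PartName = typeof PART_NAMES[number];

// Hypothetical candidate detection: a part, a confidence, and a location.
interface Detection { part: PartName; score: number; x: number; y: number }

// Order candidates from most to least confident without mutating the input.
function byConfidence(detections: Detection[]): Detection[] {
  return detections.slice().sort((a, b) => b.score - a.score);
}

// The detection a parts-first approach would begin from in this toy version.
function rootDetection(detections: Detection[]): Detection | undefined {
  return byConfidence(detections)[0];
}

// Invented sample values for illustration.
const candidates: Detection[] = [
  { part: "leftWrist", score: 0.41, x: 120, y: 300 },
  { part: "nose", score: 0.97, x: 200, y: 80 },
  { part: "leftShoulder", score: 0.88, x: 170, y: 150 },
];

console.log(PART_NAMES.length);                // prints 17
console.log(rootDetection(candidates)?.part);  // prints "nose"
```

The real model does much more than sort, since it has to link each next part to the current one using learned offsets, but the ordering step gives a feel for why a clearly visible nose or shoulder anchors the rest of the pose.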