Using MediaPipe to Create Cross Platform Machine Learning Applications with React

This talk gives an introduction to MediaPipe, an open-source collection of machine learning solutions that runs machine learning models on low-powered devices and helps integrate those models with mobile applications. It gives creative professionals a set of dynamic tools that make machine learning easy to apply, so they can create powerful and intuitive applications with little or no prior machine learning knowledge. We then see how MediaPipe can be integrated with React, giving easy access to machine learning use cases when building web applications.

This talk has been presented at React Advanced Conference 2021, check out the latest edition of this React Conference.

FAQ

Shivay Lamba is a Google Summer of Code mentor at MediaPipe and a speaker at React Advanced.

Shivay Lamba's talk at React Advanced focuses on using MediaPipe to create cross-platform machine learning applications with React.

MediaPipe is Google's open-source cross-platform framework that helps build different kinds of perception pipelines, enabling the use of multiple machine learning models in a single end-to-end pipeline.

MediaPipe is important for web applications because it allows for the integration of machine learning models to enhance functionalities like face detection, hand tracking, and object detection across various platforms.

Common use cases of MediaPipe include face detection, hand tracking, selfie segmentation, face mesh, hair segmentation, object detection and tracking, human pose detection, and holistic tracking.

MediaPipe enables cross-platform development by allowing developers to build a solution once and deploy it across multiple platforms such as Python, JavaScript, Android, and iOS.

Examples of MediaPipe in real-world applications include face mesh used in AR Lipstick try-on on YouTube, augmented reality movie filters, and Google Lens translation.

Face Mesh in MediaPipe is a solution that provides 468 facial landmarks, which can be used to create applications like AR filters and virtual makeup.

You can integrate MediaPipe with React by using various NPM modules provided by the MediaPipe team, such as face mesh, face detection, hand tracking, and selfie segmentation, and incorporating them into your React application code.
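As a minimal sketch of what that integration can look like, here is a React component built on the published @mediapipe/face_mesh and @mediapipe/camera_utils packages. The component name, CDN path, and option values are illustrative assumptions, not the talk's exact code:

```jsx
import { useEffect, useRef } from 'react';
import { FaceMesh } from '@mediapipe/face_mesh';
import { Camera } from '@mediapipe/camera_utils';

// Hypothetical component name; adapt the results handling to your own UI.
function FaceMeshView() {
  const videoRef = useRef(null);

  useEffect(() => {
    const faceMesh = new FaceMesh({
      // Tell the library where to load its WASM and model assets from.
      locateFile: (file) =>
        `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`,
    });
    faceMesh.setOptions({
      maxNumFaces: 1,
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5,
    });
    faceMesh.onResults((results) => {
      // One array of landmarks per detected face.
      console.log(results.multiFaceLandmarks);
    });

    // Pump webcam frames into the solution.
    const camera = new Camera(videoRef.current, {
      onFrame: async () => faceMesh.send({ image: videoRef.current }),
      width: 640,
      height: 480,
    });
    camera.start();

    return () => {
      faceMesh.close(); // release WASM resources on unmount
    };
  }, []);

  return <video ref={videoRef} autoPlay playsInline />;
}

export default FaceMeshView;
```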

More information about MediaPipe can be found on the MediaPipe documentation site at docs.mediapipe.dev and on the GitHub repository under Google's organization.

Shivay Lamba
21 min
25 Oct, 2021


Video Summary and Transcription

MediaPipe is a cross-platform framework that helps build perception pipelines using machine learning models. It offers ready-to-use solutions for various applications, such as selfie segmentation, face mesh, object detection, hand tracking, and more. MediaPipe can be integrated with React using NPM modules provided by the MediaPipe team. The demonstration showcases the implementation of face mesh and selfie segmentation solutions. MediaPipe enables the creation of amazing applications without needing to understand the underlying computer vision or machine learning processes.

1. Introduction to MediaPipe and Machine Learning

Short description:

Hello, everyone. I'm Shivay Lamba. I'm currently a Google Summer of Code mentor at MediaPipe, and I'm going to be talking at React Advanced. So excited to be speaking at React Advanced on the topic of using MediaPipe to create cross-platform machine learning applications with React. Machine learning is literally everywhere today, and it's important to use it in web applications as well. MediaPipe is Google's open source cross-platform framework that helps build perception pipelines using machine learning models. It can process audio, video, image-based data, and sensor data, and includes features like end-to-end acceleration.

Hello, everyone. I'm Shivay Lamba. I'm currently a Google Summer of Code mentor at MediaPipe, and I'm going to be talking at React Advanced. So excited to be speaking at React Advanced on the topic of using MediaPipe to create cross-platform machine learning applications with React.

So a lot of this talk is going to center around machine learning, MediaPipe, and how you can integrate MediaPipe with React to create really amazing applications.

So without wasting any further time, let's get started.

The first thing, of course: today, machine learning is literally everywhere. Look at any kind of application and you'll see machine learning being used there, whether it's education, healthcare, fitness, or even mining. You'll find applications of machine learning in each and every industry known to humankind.

That makes it all the more important for machine learning to be used in web applications too. And as more and more web applications come to market, we are seeing many more machine learning use cases on the web as well.

And let's actually look at a few examples. Over here we can see face detection happening on Android. Then you can see hands being detected in this iPhone XR image. Then you can see the Nest Cam, which everyone knows as a security camera. Then you can see some web effects, where this lady has facial effects applied to her face in the browser. And you can also see the Raspberry Pi and other similar microcontroller-class devices that run on the edge.

And what do all of these have in common? That's the question. The thing that is common to all of these is MediaPipe.

So what exactly is MediaPipe? MediaPipe is essentially Google's open-source, cross-platform framework that helps you build different kinds of perception pipelines. What that means is that we are able to build or use multiple machine learning models and run them in a single end-to-end pipeline to, let's say, build something. And we'll also look at some of the common use cases very soon.

And it has previously been used widely in a lot of research-based products at Google. But now it has been made upstream, and everyone can use it, since it's an open-source project. It can process any kind of audio, video, or image-based data, and also sensor data. And it helps primarily with two things: one is dataset preparation for different kinds of machine learning pipelines, and the other is building end-to-end machine learning pipelines themselves. Some of the features included within MediaPipe are end-to-end acceleration, because everything is actually happening on-device.
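To make the pipeline idea concrete, here is a minimal sketch of driving one such pipeline from JavaScript with the published @mediapipe/face_detection package. The CDN path, option values, and frame loop are illustrative assumptions; the point is that a single object wraps the whole on-device pipeline:

```js
import { FaceDetection } from '@mediapipe/face_detection';

// One object wraps the whole perception pipeline; inference runs on-device.
const faceDetection = new FaceDetection({
  // Tell the library where to fetch its WASM and model assets.
  locateFile: (file) =>
    `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection/${file}`,
});

faceDetection.setOptions({
  model: 'short', // short-range model, suited to faces close to the camera
  minDetectionConfidence: 0.5,
});

// Results arrive asynchronously after each frame passes through the pipeline.
faceDetection.onResults((results) => {
  for (const detection of results.detections) {
    console.log(detection.boundingBox); // normalized [0, 1] coordinates
  }
});

// Feed frames in; everything between input and results is handled for you.
// Assumes a playing <video> element is on the page.
const video = document.querySelector('video');
async function processFrame() {
  await faceDetection.send({ image: video });
  requestAnimationFrame(processFrame);
}
processFrame();
```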

2. MediaPipe Solutions and Real-World Examples

Short description:

MediaPipe is a cross-platform framework that offers ready-to-use solutions for various applications. Some of the solutions include selfie segmentation, face mesh with 468 facial landmarks, hair segmentation, object detection and tracking, face detection, hand tracking, human pose detection and tracking, holistic tracking, and 3D object detection. These end-to-end solutions are popular and have real-world applications in AR, movie filters, Google Lens, and augmented faces. A live perception example demonstrates hand tracking using landmarks to denote the key points of the hand.

Then secondly, you just have to build it once, and it can be used from Python, JavaScript, Android, and iOS. So you build it once and you can use it on different types of platforms. That is why we are calling it a cross-platform framework.

And then these are ready-to-use solutions. You just have to import them and integrate them into your code, and they are very easy to use. And the best part about it is that it is open-source, so all the different solutions and their codebases can be found in the MediaPipe repository under Google's organization on GitHub.

Now, looking at some of the most commonly used solutions. One of the best known is selfie segmentation, which is also used in Google Meet for the virtual backgrounds and the blurring effect: it uses a segmentation mask to detect only the humans in the scene, so it can extract just the information needed for them. Then we have Face Mesh, which gives you 468 facial landmarks that you can use to make a lot of interesting applications, for example AR filters or virtual makeup. Then we have hair segmentation, which lets you segment out just the hair. Then we have standard computer vision algorithms like object detection and tracking that you can use to detect specific objects. We have face detection, and we also have hand tracking, which can track your hands so you can, for example, use hand-based gestures to control your web application. Then we have full human pose detection and tracking, which you could use to create some kind of fitness or dance application that tracks you. Then we have holistic tracking, which tracks your entire body: your face, your hands, your whole pose. It's a combination of human pose, hand tracking, and the face mesh. Then we have more advanced object detection, like 3D object detection, that can detect bigger objects like a chair, shoes, or a table. And there are a lot more solutions that you can go ahead and look at. These are all end-to-end solutions that you can directly implement; that is why MediaPipe solutions are so popular.
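As an example of how little glue code one of these ready-to-use solutions needs, here is a minimal background-blur sketch built on the published @mediapipe/selfie_segmentation and @mediapipe/camera_utils packages, in the spirit of the Google Meet effect mentioned above. The canvas compositing approach and option values are assumptions for illustration:

```js
import { SelfieSegmentation } from '@mediapipe/selfie_segmentation';
import { Camera } from '@mediapipe/camera_utils';

// Assumes a <video> (webcam) and a <canvas> element are on the page.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

const selfieSegmentation = new SelfieSegmentation({
  locateFile: (file) =>
    `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
});
selfieSegmentation.setOptions({ modelSelection: 1 }); // 1 = landscape model

selfieSegmentation.onResults((results) => {
  ctx.save();
  ctx.clearRect(0, 0, canvas.width, canvas.height);

  // Keep only the pixels the mask classifies as "person"...
  ctx.drawImage(results.segmentationMask, 0, 0, canvas.width, canvas.height);
  ctx.globalCompositeOperation = 'source-in';
  ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);

  // ...and paint a blurred copy of the frame behind them.
  ctx.globalCompositeOperation = 'destination-over';
  ctx.filter = 'blur(8px)';
  ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);

  ctx.restore();
});

// Pump webcam frames through the pipeline.
new Camera(video, {
  onFrame: async () => selfieSegmentation.send({ image: video }),
  width: 640,
  height: 480,
}).start();
```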

And just to look at some real-world examples where it's actually being used: we just spoke about the face mesh solution, which you can see here powering the AR lipstick try-on on YouTube. Then we have AR-based movie filters that can be used directly in YouTube. Then there are Google Lens surfaces where you can see augmented reality taking place. And it's used not only for these augmented reality experiences but also for other kinds of inference, like Google Lens translation, which also uses MediaPipe pipelines under the hood. And you can see augmented faces, which again is based on the face mesh. So let's look at a very quick live perception example of how it actually works. For this, we're going to be looking at hand tracking. Essentially, we take an image or a video of your hand and we're able to put these landmarks on it. What are landmarks? Landmarks are the dots that you see superimposed on your hand; they denote, you could say, the different joints and edges of your hand. So this is what the example is going to look like. And how would that simple perception pipeline look? Essentially, first you'll take your video input.
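Sketched in code, that hand-tracking pipeline might look like the following, using the published @mediapipe/hands, @mediapipe/camera_utils, and @mediapipe/drawing_utils packages to superimpose the landmarks on a canvas. The option values and canvas setup are illustrative assumptions rather than the talk's exact demo:

```js
import { Hands, HAND_CONNECTIONS } from '@mediapipe/hands';
import { Camera } from '@mediapipe/camera_utils';
import { drawConnectors, drawLandmarks } from '@mediapipe/drawing_utils';

// Assumes a <video> (webcam) and a <canvas> element are on the page.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

const hands = new Hands({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
});
hands.setOptions({
  maxNumHands: 2,
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
});

hands.onResults((results) => {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);
  // Superimpose the 21 landmarks per hand and the connections between them.
  for (const landmarks of results.multiHandLandmarks ?? []) {
    drawConnectors(ctx, landmarks, HAND_CONNECTIONS, { color: '#00FF00' });
    drawLandmarks(ctx, landmarks, { color: '#FF0000', radius: 3 });
  }
});

// Take the video input and feed each frame into the pipeline.
new Camera(video, {
  onFrame: async () => hands.send({ image: video }),
  width: 640,
  height: 480,
}).start();
```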