Web Apps of the Future With Web AI

Web AI is the practice of running machine learning models directly in the browser using JavaScript, WebAssembly, and WebGPU. This approach offers significant benefits such as enhanced privacy, low latency, and the ability to operate offline. TensorFlow.js lets developers deploy models for tasks such as object recognition, text toxicity detection, and face mesh tracking in real-time applications. Practical examples include background blurring in video conferencing and remote physiotherapy using pose estimation models. Web AI also improves accessibility by automatically generating captions for images that lack alt text. Popular options include YOLO for object detection and the MediaPipe LLM Inference API for language tasks. Books like 'Deep Learning with JavaScript' and 'Learning TensorFlow.js' are recommended for beginners.

From Author:

AI is everywhere, but why should you care, as a web developer? Join Jason Mayes, Web AI Lead at Google, who will get you on track by demystifying common terminology ensuring no one is left behind, and then take you through some of the latest machine learning models, tools, and frameworks you can use right in the browser via JavaScript to help you bring your creative web app ideas to life for almost any industry you may be working in. By moving AI to the client side, there is no reliance on the server after the page load, bringing you benefits such as privacy, low latency, offline solutions, and lower costs which will be of growing importance as the field develops. This talk is suitable for everyone with a curiosity for web and machine learning, so come along and learn something new to put in your web engineering toolkit for 2024.

This talk was presented at JSNation 2024. Check out the latest edition of this JavaScript conference.

FAQ

Jason Mayes is the Web AI Lead at Google.

Benefits of using Web AI include enhanced privacy, the ability to run offline, low latency, lower costs, and a frictionless user experience.

Yes, Web AI can operate offline on the device itself, making it possible to perform tasks even in areas with low or no connectivity after the page has loaded.

Practical examples include remote physiotherapy using browser-based pose estimation models, product placement verification in supermarkets, background blurring in video conferencing, and real-time facial feature recognition for augmented reality.

TensorFlow.js is a JavaScript library for training and deploying machine learning models in the browser and on Node.js.

Popular Web AI models include object recognition, text toxicity detection, selfie depth estimation, face mesh, hand tracking, and large language models.

Web AI is the art of using machine learning models client-side in a web browser, running on your own device's processor or graphics card using JavaScript and surrounding web technologies like WebAssembly and WebGPU for acceleration.

Web AI runs machine learning models on the client side in the web browser, using the device's processor or graphics card, whereas Cloud AI executes models on the server side and requires an active internet connection to access the server's API.

No, Web AI can work in any browser that supports WebAssembly or WebGPU, allowing it to run on a wide range of devices including mobile phones.

Web AI can improve accessibility by using models to automatically fill captions for images that lack alt text, among other applications.

Jason Mayes
32 min
13 Jun, 2024

Comments

  • Francisco Baptista
    Great keynote Jason!! TeamSportz is expanding our use of pose estimation to deliver exercises to help athletes recover from injuries. We should talk

Video Transcription

1. Introduction to Web AI in JavaScript

Short description:

I'm Jason Mayes, web AI lead at Google. Start investigating machine learning on the client side in JavaScript to gain superpowers in your next web app. Web AI is the art of using ML models client-side in a web browser, different from cloud AI. AI will be leveraged by all industries in the future. Upskill in this area now for unique benefits in JavaScript.

I'm Jason Mayes, web AI lead here at Google. Today I've come to you as a fellow JavaScript engineer to share with you a story about why you should start investigating machine learning on the client side in JavaScript to gain superpowers in your next web application.

First, let's formally define what I mean by web AI, which is a term I coined back in 2022 to stand out versus the cloud-based AI systems that were popular back then. Web AI is the art of using machine learning models client-side in a web browser, running on your own device's processor or graphics card, using JavaScript and surrounding web technologies like WebAssembly and WebGPU for acceleration. This is different from cloud AI, whereby the model executes on the server side and is accessed via some sort of API instead, which means you need an active internet connection at all times to talk to that API and use its advanced capabilities.
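As a rough illustration of what "surrounding web technologies for acceleration" can mean in practice, the sketch below checks which acceleration path an environment exposes. `navigator.gpu` is the standard WebGPU entry point and `WebAssembly` is the standard global; the `pickBackend` helper and its return labels are hypothetical names used only for this example, not part of any library.

```javascript
// Hypothetical helper: report the fastest execution path the environment offers.
// The labels 'webgpu', 'wasm', and 'cpu' are illustrative names only.
function pickBackend(env = globalThis) {
  // WebGPU is exposed as navigator.gpu in browsers that support it.
  const hasWebGPU = Boolean(env.navigator && env.navigator.gpu);
  // WebAssembly is a namespace object with an instantiate() function.
  const hasWasm =
    typeof env.WebAssembly === 'object' &&
    env.WebAssembly !== null &&
    typeof env.WebAssembly.instantiate === 'function';

  if (hasWebGPU) return 'webgpu'; // run on the graphics card
  if (hasWasm) return 'wasm';     // accelerated execution on the processor
  return 'cpu';                   // plain JavaScript fallback
}

console.log(pickBackend());
```

A real application would normally let its machine learning library choose the backend; the point here is only that the check happens on the user's device, with no server involved.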

As web developers and designers, we have the privilege of working across industries when we work with our customers. In a similar manner, artificial intelligence is likely to be leveraged by all of those industries in the future to make them more efficient than ever before. In fact, in a few years' time, customers will expect AI features in their next product to keep up with everyone else who is already doing it. So now is the perfect time to upskill in this area as you can get unique benefits when doing this on-device in JavaScript.

2. Advantages of Client-side AI in Web Applications

Short description:

Privacy: No data needs to be sent to the server for classification, protecting user's personal data. Ability to run offline on the device itself. Low latency enables real-time model execution. Lower cost by running AI directly in the browser. Frictionless experience for end users. Reach and scale of the web. Growing usage of client-side AI libraries. Real-world example of video conferencing solution with background blur. Cost savings of using client-side AI in video segmentation.

What are those? Well, first up is privacy. No data from things like the camera, the microphone, or even text for that matter needs to be sent to the server for classification, which protects the user's personal data. A great example of this is shown here by IncludeHealth, who use browser-based pose estimation models to perform remote physiotherapy without sending any imagery to the cloud. Instead, only the resulting range of motion and statistics from the session are sent, allowing the patient to perform the check-up from the comfort of their own house.
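To make the privacy point concrete, here is a minimal sketch of how range-of-motion statistics could be derived on the device so that only a small summary is ever uploaded. It is not the implementation used in the example above. The keypoint format (`{ x, y }` objects), the joint names (`shoulder`, `elbow`, `wrist`), and both function names are assumptions made for illustration.

```javascript
// Angle in degrees at joint b, formed by the points a-b-c.
// Keypoints are assumed to be { x, y } objects (hypothetical format).
function jointAngleDegrees(a, b, c) {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const cross = v1.x * v2.y - v1.y * v2.x;
  return Math.abs(Math.atan2(cross, dot)) * (180 / Math.PI);
}

// Reduce per-frame keypoints to the only data that would leave the device.
// Assumes at least one frame; each frame has shoulder, elbow, and wrist points.
function summarizeSession(frames) {
  const angles = frames.map((f) => jointAngleDegrees(f.shoulder, f.elbow, f.wrist));
  return {
    frames: angles.length,
    minAngle: Math.min(...angles),
    maxAngle: Math.max(...angles),
  };
}

const summary = summarizeSession([
  { shoulder: { x: 0, y: 0 }, elbow: { x: 1, y: 0 }, wrist: { x: 2, y: 0 } }, // arm straight
  { shoulder: { x: 0, y: 0 }, elbow: { x: 1, y: 0 }, wrist: { x: 1, y: 1 } }, // elbow bent
]);
console.log(summary);
```

The camera frames and raw keypoints stay in the browser; only the small `summary` object would need to be sent to a server.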

You also have the ability to run offline on the device itself, so you can even perform tasks in areas of low or no connectivity at all after the page load. Now, you might be wondering why a web app would need to do all that stuff offline. Well, in this great example by Hugo Zanini, he performs a product placement verification task using a web app in supermarkets for a retail customer he was working with. We all know how bad the Wi-Fi connections are in supermarkets. He leveraged TensorFlow.js right in the browser, so the app can work entirely offline and then sync the data back later on when it has connectivity again.
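The "work offline, sync later" pattern described above can be sketched as a small queue that holds results locally and uploads them when a connection is available. This is an illustrative sketch, not the code from the supermarket example. The `OfflineQueue` class and the `send` callback are hypothetical, and a production app would also persist the queue (for example in IndexedDB) so that it survives a page reload.

```javascript
// Hypothetical offline-first queue: results are kept locally and uploaded
// only when sending succeeds.
class OfflineQueue {
  constructor(send) {
    this.send = send; // async function that uploads one record (assumed)
    this.pending = [];
  }

  add(record) {
    this.pending.push(record);
  }

  // Try to upload everything; keep whatever fails for the next attempt.
  // Returns the number of records still waiting.
  async flush() {
    const batch = this.pending;
    this.pending = [];
    for (const record of batch) {
      try {
        await this.send(record);
      } catch {
        this.pending.push(record);
      }
    }
    return this.pending.length;
  }
}

// In a browser, a flush could be triggered when connectivity returns:
//   window.addEventListener('online', () => queue.flush());
```

The model predictions themselves never depend on the network; only the upload of results waits for a connection.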

Next is low latency, which can enable you to run many models in real time, as you don't have to wait for the data to be sent to the cloud and then get an answer back again. As such, our body pose and segmentation models, for example, can run at over 120 frames per second on a laptop with a mid-range GPU, with great accuracy, as you can see on this slide. You also get lower costs, as you don't need to hire and keep running expensive cloud-based GPUs 24/7, which means you can now run generative AI directly in the browser, like this large language model on the left-hand side, without breaking the bank. And we're seeing production-ready web apps benefit from significant cost savings too, like the advanced video conferencing features such as background blurring shown on the right.

And even better, you can offer a frictionless experience for your end users, as no install is required to run a web page. Just go to a link and it works. In fact, Adobe did exactly that here with Adobe Photoshop Web, enabling anyone anywhere to use their favourite creative features on almost any device. When it comes to the object selection tool shown on this slide, embracing client-side machine learning can provide Adobe's users with a better experience by eliminating cloud server latency, resulting in faster predictions and a more responsive interface. And on that note, it also means you can leverage the reach and scale of the web itself, with over six billion browser-enabled devices capable of viewing your creation. So whether you're levelling up your next YouTuber livestream by becoming a different persona, capturing detailed facial movements to drive a game character using nothing more than a regular webcam, all client-side in the browser, or exploring the latest in generative AI, where you can even run diffusion models in the web browser at incredible speeds thanks to new browser technologies like WebGPU, now enabled by default in Chrome and Chromium-based browsers, things are about to get really exciting with regard to what we can expect from a web app in the future.

So even if you're not yet using client-side AI, I want to illustrate how fast this is growing and why you should take a look. I've only got statistics for Google's web AI libraries, so worldwide usage is probably higher than this, but in the past two years alone, we've averaged 600 million downloads per year of TensorFlow.js and MediaPipe-based web models and libraries, bringing us to over 1.2 billion downloads in that time for the first time ever, and we're on track to be even higher in 2024 with our usage continuing to grow. So now it's really time to be part of this growth yourselves. In fact, we've seen this steady growth since 2020 as more and more developers just like you have started to use web AI in production use cases. And speaking of real-world examples, let's take a deeper dive into a typical video conferencing solution.

There goes my notifications. Many of these services provide background blur or background replacement in video calls for privacy. So let's crunch some hypothetical numbers for the value of using client-side AI in a use case like this. First, a webcam typically produces video at 30 frames per second. So assuming the average meeting is about 30 minutes in length, that's 54,000 frames you have to process every single meeting. Now, if you have a popular service, you might have a million meetings per day, which means 54 billion segmentations every single day. Now, even if we assume a really ultra-low cost of just $0.0001 (one hundredth of a cent) per segmentation, that would still be $5.4 million a day that you would have to spend on the cloud, which is around $2 billion a year just for those GPU costs.
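The arithmetic above can be checked with a few lines of JavaScript. This is a back-of-the-envelope sketch using the hypothetical figures from the talk; the function and parameter names are invented for illustration, and the per-segmentation cost is taken as $0.0001, which is the value that yields the stated total of $5.4 million a day.

```javascript
// Back-of-the-envelope cloud cost for per-frame video segmentation.
function cloudSegmentationCost({ fps, meetingMinutes, meetingsPerDay, dollarsPerSegmentation }) {
  const framesPerMeeting = fps * meetingMinutes * 60;
  const segmentationsPerDay = framesPerMeeting * meetingsPerDay;
  const dollarsPerDay = segmentationsPerDay * dollarsPerSegmentation;
  return {
    framesPerMeeting,
    segmentationsPerDay,
    dollarsPerDay,
    dollarsPerYear: dollarsPerDay * 365,
  };
}

const estimate = cloudSegmentationCost({
  fps: 30,                        // typical webcam frame rate
  meetingMinutes: 30,             // assumed average meeting length
  meetingsPerDay: 1e6,            // hypothetical popular service
  dollarsPerSegmentation: 0.0001, // hypothetical ultra-low cloud cost
});

// 54,000 frames per meeting and 54 billion segmentations per day,
// giving roughly $5.4 million per day and about $1.97 billion per year.
console.log(estimate);
```

Running the same segmentation on the user's own device moves that per-frame cost off the service's cloud bill.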

QnA