AI is everywhere, but why should you care as a web developer? Join Jason Mayes, Web AI Lead at Google, who will get you on track by demystifying common terminology so no one is left behind, and then take you through some of the latest machine learning models, tools, and frameworks you can use right in the browser via JavaScript to bring your creative web app ideas to life in almost any industry you may be working in. By moving AI to the client side, there is no reliance on the server after the page loads, bringing benefits such as privacy, low latency, offline solutions, and lower costs, which will grow in importance as the field develops. This talk is suitable for everyone with a curiosity for web and machine learning, so come along and learn something new to put in your web engineering toolkit for 2024.
Web Apps of the Future With Web AI
FAQ
Jason Mayes is the Web AI Lead at Google.
Popular Web AI models include object recognition, text toxicity detection, selfie depth estimation, face mesh, hand tracking, and large language models.
Practical examples include remote physiotherapy using browser-based pose estimation models, product placement verification in supermarkets, background blurring in video conferencing, and real-time facial feature recognition for augmented reality.
TensorFlow.js is a JavaScript library for training and deploying machine learning models in the browser and on Node.js.
Benefits of using Web AI include enhanced privacy, the ability to run offline, low latency, lower costs, and a frictionless user experience.
Yes, Web AI can operate offline on the device itself, making it possible to perform tasks even in areas with low or no connectivity after the page has loaded.
Web AI is the art of using machine learning models client-side in a web browser, running on your own device's processor or graphics card using JavaScript and surrounding web technologies like WebAssembly and WebGPU for acceleration.
Web AI runs machine learning models on the client side in the web browser, using the device's processor or graphics card, whereas Cloud AI executes models on the server side and requires an active internet connection to access the server's API.
No, Web AI can work in any browser that supports WebAssembly or WebGPU, allowing it to run on a wide range of devices including mobile phones.
Web AI can improve accessibility by using models to automatically fill captions for images that lack alt text, among other applications.
1. Introduction to Web AI in JavaScript
I'm Jason Mayes, web AI lead at Google. Start investigating machine learning on the client side in JavaScript to gain superpowers in your next web app. Web AI is the art of using ML models client-side in a web browser, different from cloud AI. AI will be leveraged by all industries in the future. Upskill in this area now for unique benefits in JavaScript.
I'm Jason Mayes, web AI lead here at Google. Today I've come to you as a fellow JavaScript engineer to share with you a story about why you should start investigating machine learning on the client side in JavaScript to gain superpowers in your next web application.
First, let's formally define what I mean by web AI, a term I coined back in 2022 to distinguish it from the cloud-based AI systems that were popular back then. Web AI is the art of using machine learning models client-side in a web browser, running on your own device's processor or graphics card, using JavaScript and surrounding web technologies like WebAssembly and WebGPU for acceleration. This is different from cloud AI, whereby the model executes on the server side and is accessed via some sort of API instead, which means you need an active internet connection to talk to that API at all times to use its advanced capabilities.
As web developers and designers, we have the privilege of working across industries when we work with our customers. In a similar manner, artificial intelligence is likely to be leveraged by all of those industries in the future to make them more efficient than ever before. In fact, in a few years' time, customers will expect AI features in their next product to keep up with everyone else who is already doing it. So now is the perfect time to upskill in this area as you can get unique benefits when doing this on-device in JavaScript.
2. Advantages of Client-side AI in Web Applications
Privacy: No data needs to be sent to the server for classification, protecting users' personal data. Ability to run offline on the device itself. Low latency enables real-time model execution. Lower cost by running AI directly in the browser. Frictionless experience for end users. Reach and scale of the web. Growing usage of client-side AI libraries. Real-world example of video conferencing solution with background blur. Cost savings of using client-side AI in video segmentation.
What are those? Well, first up is privacy: no data from things like the camera, the microphone, or even text needs to be sent to a server for classification, which protects the user's personal data. A great example of this is shown here by Include Health, who use browser-based pose estimation models to perform remote physiotherapy without sending any imagery to the cloud. Instead, only the resulting range of motion and statistics from the session are sent, allowing the patient to do the check-up from the comfort of their own home.
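To make the privacy pattern concrete, here is a minimal sketch of reducing a session to statistics on the device so that only those numbers, never the camera frames, are sent to the clinician. It is not Include Health's implementation: it assumes a pose model has already produced one joint angle in degrees per frame, and `SessionSummary` and `summarizeSession` are hypothetical names.

```typescript
// Hypothetical payload: the only data that would leave the device.
interface SessionSummary {
  frames: number;
  minAngleDeg: number;
  maxAngleDeg: number;
  rangeOfMotionDeg: number;
}

// Reduce per-frame joint angles (computed on-device from pose keypoints)
// to aggregate statistics. No imagery is kept or returned.
function summarizeSession(anglesDeg: readonly number[]): SessionSummary {
  if (anglesDeg.length === 0) {
    throw new Error("No frames were analyzed in this session");
  }
  let min = Infinity;
  let max = -Infinity;
  for (const angle of anglesDeg) {
    if (angle < min) min = angle;
    if (angle > max) max = angle;
  }
  return {
    frames: anglesDeg.length,
    minAngleDeg: min,
    maxAngleDeg: max,
    rangeOfMotionDeg: max - min,
  };
}

// Example: six frames of elbow angles from one exercise session.
const summary = summarizeSession([42, 75, 118, 131, 96, 40]);
console.log(summary);
```

Only the serialized summary (for example via `JSON.stringify(summary)`) would then be uploaded.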
You also have the ability to run offline on the device itself, so you can perform tasks in areas with low or no connectivity at all after the page loads. Now, you might be wondering why a web app would need to do all that offline. Well, in this great example by Hugo Zanini, he performs a product placement verification task in supermarkets using a web app for a retail customer he was working with. We all know how bad Wi-Fi connections are in supermarkets, so he used TensorFlow.js right in the browser, which lets the app work entirely offline and then sync the data back once it has connectivity later on.
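The "work offline, sync later" pattern can be sketched as a small queue that accepts results at any time and uploads them in a batch when a connection is available. This is an illustrative assumption about how such an app might be structured, not Hugo Zanini's code: `OfflineSyncQueue`, `ShelfCheck`, and the injected `send` function are hypothetical, and a production app would likely persist the queue (for example in IndexedDB) rather than keep it in memory.

```typescript
// Hypothetical result produced by the on-device model for one shelf.
interface ShelfCheck {
  shelfId: string;
  productFound: boolean;
  checkedAt: number; // epoch milliseconds
}

// Buffers results locally and uploads them when connectivity allows.
class OfflineSyncQueue<T> {
  private pending: T[] = [];

  constructor(private readonly send: (batch: T[]) => Promise<void>) {}

  // Always succeeds, online or offline.
  add(item: T): void {
    this.pending.push(item);
  }

  get size(): number {
    return this.pending.length;
  }

  // Tries to upload everything queued so far. If sending fails (e.g. still
  // offline), the items stay queued so a later call can retry.
  async flush(): Promise<boolean> {
    if (this.pending.length === 0) return true;
    const batch = this.pending.slice();
    try {
      await this.send(batch);
    } catch {
      return false;
    }
    // Remove only the items that were sent; newer items added meanwhile stay.
    this.pending.splice(0, batch.length);
    return true;
  }
}
```

In a browser, one option is to call `queue.flush()` from a `window.addEventListener("online", ...)` handler as well as after each new result.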
Next is low latency, which enables you to run many models in real time, as you don't have to wait for the data to be sent to the cloud and for an answer to come back. As such, our body pose and segmentation models, for example, can run at over 120 frames per second with great accuracy on a mid-range laptop with a GPU, as you can see on this slide. You've also got lower cost, as you don't need to rent and keep running expensive cloud-based GPUs 24/7, which means you can now run generative AI directly in the browser, like this large language model on the left-hand side, without breaking the bank. And we're seeing production-ready web apps benefit from significant cost savings too, like the example on the right for advanced video conferencing features such as background blurring.
And even better, you can offer a frictionless experience for your end users, as no install is required to run a web page: just go to a link and it works. In fact, Adobe did exactly that with Photoshop on the web, enabling anyone anywhere to use their favourite creative features on almost any device. For the object selection tool shown on this slide, embracing client-side machine learning gives Adobe's users a better experience by eliminating cloud server latency, resulting in faster predictions and a more responsive interface. On that note, it also means you can leverage the reach and scale of the web itself, with over six billion browser-enabled devices capable of viewing your creation. So you could be levelling up your next YouTube livestream by becoming a different persona, or capturing detailed facial movements to drive a game character using nothing more than a regular webcam, client-side in the browser. And with the latest in generative AI, you can even run diffusion models in the web browser at incredible speeds thanks to new browser technologies like WebGPU, now enabled by default in Chrome and Chrome-based browsers. Things are about to get really exciting with regard to what we can expect from a web app in the future.
So even if you're not yet using client-side AI, I want to illustrate how fast this is growing and why you should take a look. I only have statistics for Google's web AI libraries, so worldwide usage is probably higher than this, but in the past two years alone, we've averaged 600 million downloads per year of TensorFlow.js and MediaPipe web models and libraries, passing 1.2 billion downloads in that time for the first time ever, and we're on track to be even higher in 2024 as usage continues to grow. In fact, we've seen this steady growth since 2020 as more and more developers just like you have started to use web AI in production use cases, so now is really the time to be part of this growth yourselves. And speaking of real-world examples, let's take a deeper dive into a typical video conferencing solution.
There go my notifications. Many of these services provide background blur or background replacement in video calls for privacy. So let's crunch some hypothetical numbers for the value of using client-side AI in a use case like this. First, a webcam typically produces video at 30 frames per second. Assuming the average meeting is about 30 minutes long, that's 54,000 frames you have to process in every single meeting. If you run a popular service, you might have a million meetings per day, which means 54 billion segmentations every single day. Now, even if we assume an ultra-low cost of just $0.0001 (a hundredth of a cent) per segmentation, that would still be $5.4 million a day that you would have to spend on the cloud, which is around $2 billion a year just for those GPU costs.
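The back-of-the-envelope estimate can be checked in a few lines. All inputs are the talk's hypothetical figures (30 fps, 30-minute meetings, one million meetings a day, $0.0001 per segmentation), not real cloud prices.

```typescript
// Hypothetical inputs from the talk's example.
const framesPerSecond = 30;
const meetingMinutes = 30;
const meetingsPerDay = 1_000_000;
const dollarsPerSegmentation = 0.0001; // one hundredth of a cent

const framesPerMeeting = framesPerSecond * meetingMinutes * 60; // 54,000
const segmentationsPerDay = framesPerMeeting * meetingsPerDay; // 54 billion
const dollarsPerDay = segmentationsPerDay * dollarsPerSegmentation; // ~$5.4 million
const dollarsPerYear = dollarsPerDay * 365; // ~$1.97 billion

console.log({ framesPerMeeting, segmentationsPerDay, dollarsPerDay, dollarsPerYear });
```

Note that the unit matters: at 0.0001 cents (rather than dollars) per segmentation, the daily bill would be $54,000.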
3. AI Models for Web Applications
Performing background blurring and running other models on the client side via web AI reduces costs. Thousands of AI models can be run in the browser today. Popular ones include object recognition for creating pet cameras, text toxicity detection for pre-filtering comments, and 3D data extraction for lighting effects and augmented reality. Also, face mesh models for real-time lip coloring and hand tracking models for gesture recognition and touchless interfaces.
By performing the background blurring on the client side via web AI, that cost can go away. And don't forget, you can pull in other models too, like background noise removal, to further improve the meeting experience for your users while reducing your production costs even further.
So what types of AI models actually exist out there that you can run in the browser today? Clearly, not everyone here is going to be building video conferencing apps. There are thousands of models out there, many of which can be used in just a few lines of JavaScript, and new models come out every single month from the wider web AI community. I can't cover all of these in this talk, but let's go through a few popular ones and think about how you could apply them to the industries you're currently working in.
First up, we've got object recognition. This tells you not only where in the image an object is located but also how many exist, which is much more powerful than image recognition alone, which tells you that something exists but not where or how many. With a model like this, you can very easily create something like a pet camera that notifies you when your dog has been naughty and eaten the treats while you're out of the house. I made this example in about one day of coding using off-the-shelf models capable of recognising 90 common objects, like dogs and treat bowls. As you can see, it works pretty well: the dog in this video is caught red-handed trying to steal the bowl of food as soon as I leave. You can then send an alert to your devices using your regular web technologies, and all of this is done without having to stream that video 24/7 to some cloud server for interpretation. Instead, you just send the clip at the moment the dog intersects the bowl.
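The "send the clip when the dog intersects the bowl" trigger reduces to an axis-aligned bounding-box overlap test on each frame's detections. This sketch assumes detections shaped like `{ class, score, bbox: [x, y, width, height] }` in pixels, similar to the predictions of the TensorFlow.js COCO-SSD model; `shouldAlert` and its 0.6 score threshold are illustrative choices, not part of any library.

```typescript
// Assumed detection shape: bbox is [x, y, width, height] in pixels.
interface Detection {
  class: string;
  score: number;
  bbox: [number, number, number, number];
}

// True when two axis-aligned boxes overlap (touching edges do not count).
function boxesIntersect(a: Detection["bbox"], b: Detection["bbox"]): boolean {
  const [ax, ay, aw, ah] = a;
  const [bx, by, bw, bh] = b;
  return ax < bx + bw && bx < ax + aw && ay < by + bh && by < ay + ah;
}

// Hypothetical trigger: any confident "dog" overlapping any confident "bowl".
function shouldAlert(detections: Detection[], minScore = 0.6): boolean {
  const confident = detections.filter((d) => d.score >= minScore);
  const dogs = confident.filter((d) => d.class === "dog");
  const bowls = confident.filter((d) => d.class === "bowl");
  return dogs.some((dog) => bowls.some((bowl) => boxesIntersect(dog.bbox, bowl.bbox)));
}
```

Each frame's detections would be passed to `shouldAlert`, and only when it returns true would the app upload the short clip.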
4. Advanced AI Models for Web Applications
Text toxicity detection model can filter toxic comments in web chats or commenting systems. 3D data extraction from 2D images enables lighting effects and augmented reality. Face mesh model recognizes 468 facial landmarks and can be used for real-time lip coloring. Hand tracking models enable gesture recognition and touchless interfaces. Pose detection, body segmentation, and specialized models for upper body focus are also available. Large language models can be brought to the client side for fast processing.
What about text toxicity detection? Here, you can pass a sentence to the model and it will classify whether it's toxic or not. It can even figure out what type of toxicity it might be, such as an insult or a threat. Something like this could be used to pre-filter comments in a web chat or commenting system on your website, automatically holding potentially toxic comments for review by a moderator.
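A pre-filter like this can be sketched as a function that turns per-label predictions into a publish-or-hold decision. The prediction shape assumed here (one entry per label such as "insult" or "threat", each with a `match` that is true, false, or null when the model is unsure) mirrors what the TensorFlow.js toxicity model's `classify()` returns, but treat it as an assumption and check the model's documentation; `moderate` and the hold-when-unsure policy are illustrative choices.

```typescript
// Assumed prediction shape: one entry per toxicity label, with one result
// per input sentence. `match` is true/false, or null when the model is unsure.
interface LabelPrediction {
  label: string;
  results: { probabilities: ArrayLike<number>; match: boolean | null }[];
}

type ModerationDecision = "publish" | "hold";

// Hold a comment for a human moderator if any label matched, or,
// when holdWhenUnsure is set, if the model was unsure for any label.
function moderate(
  predictions: LabelPrediction[],
  sentenceIndex = 0,
  holdWhenUnsure = true,
): { decision: ModerationDecision; reasons: string[] } {
  const reasons: string[] = [];
  for (const p of predictions) {
    const match = p.results[sentenceIndex]?.match;
    if (match === true || (holdWhenUnsure && match === null)) {
      reasons.push(p.label);
    }
  }
  return { decision: reasons.length > 0 ? "hold" : "publish", reasons };
}
```

Returning the matched labels as `reasons` lets the moderator see why a comment was held.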
We can even extract 3D data from 2D images with this selfie depth estimation model. You pass it a portrait image as input, and it predicts for every pixel in that image how far away it thinks that pixel is, allowing you to understand the 3D make-up of someone's upper body. If you can do that, you can do some really cool things like image relighting, as you see here: I can cast light rays around my body for a more realistic effect than was possible before, which can be great for augmented reality or lighting effects on images.
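As a simplified illustration of depth-aware lighting (not the method used in the demo), the sketch below treats each pixel's predicted depth as a z coordinate and computes a brightness multiplier from its distance to a virtual point light. It assumes a depth map normalized to [0, 1] with larger values meaning closer to the camera; that convention, `PointLight`, `relight`, and the `ambient` and `falloff` defaults are hypothetical, so check your model's actual output format.

```typescript
// Assumptions (hypothetical, check your model's documentation):
// - depth has one value per pixel, row-major, normalized to [0, 1]
// - larger values mean closer to the camera
interface PointLight {
  x: number; // 0..1 across the image
  y: number; // 0..1 down the image
  depth: number; // 0..1, same convention as the depth map
}

// Returns a per-pixel brightness multiplier in (ambient, 1].
function relight(
  depth: Float32Array,
  width: number,
  height: number,
  light: PointLight,
  ambient = 0.2,
  falloff = 8,
): Float32Array {
  if (depth.length !== width * height) {
    throw new Error("depth map size does not match width * height");
  }
  const gain = new Float32Array(depth.length);
  for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
      const i = row * width + col;
      // Pixel center in normalized coordinates, with depth as the third axis.
      const dx = (col + 0.5) / width - light.x;
      const dy = (row + 0.5) / height - light.y;
      const dz = depth[i] - light.depth;
      const distSq = dx * dx + dy * dy + dz * dz;
      // Inverse-square-style falloff blended with a constant ambient term.
      gain[i] = ambient + (1 - ambient) / (1 + falloff * distSq);
    }
  }
  return gain;
}
```

Multiplying each pixel's RGB values by its gain brightens the parts of the body nearest the light and leaves distant background closer to the ambient level.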
Next, we've got the face mesh model. This is just three megabytes in size and can recognize 468 facial landmarks on the human face. As you can see, it runs at well over 130 frames per second on my laptop at home, and we're seeing companies like ModiFace, which is part of L'Oreal, use this technology in real production web apps. The thing to note here is that the lady on the right-hand side is not wearing any lipstick. Instead, they're using our face mesh model combined with WebGL shaders to color in her lips in real time based on the color swatch she has chosen at the bottom of the web app.
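The production app uses WebGL shaders, but the core per-pixel operation can be sketched on the CPU as an alpha blend between the camera pixel and the chosen swatch color inside a lip mask. This assumes you have already rasterized a lip mask from the face mesh's lip landmarks (that step is not shown), and `tintLips` and its `strength` parameter are hypothetical names.

```typescript
type RGB = [number, number, number];

// Tint pixels inside a lip mask toward a chosen color.
// pixels: RGBA bytes, e.g. ImageData.data from a canvas.
// mask: per-pixel coverage in [0, 1], rasterized from the lip landmarks.
// strength: maximum opacity of the tint.
function tintLips(
  pixels: Uint8ClampedArray,
  mask: Float32Array,
  color: RGB,
  strength = 0.5,
): void {
  if (pixels.length !== mask.length * 4) {
    throw new Error("mask must have one entry per pixel");
  }
  for (let p = 0; p < mask.length; p++) {
    const alpha = Math.min(Math.max(mask[p], 0), 1) * strength;
    if (alpha === 0) continue;
    const i = p * 4;
    for (let c = 0; c < 3; c++) {
      // Linear blend; Uint8ClampedArray rounds and clamps on assignment.
      pixels[i + c] = pixels[i + c] * (1 - alpha) + color[c] * alpha;
    }
    // The alpha channel (pixels[i + 3]) is left unchanged.
  }
}
```

A soft-edged mask (values between 0 and 1 near the lip outline) avoids a hard, painted-on boundary.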
We've also got models that can track your hands in 2D and 3D, again at over 120 frames per second on a mid-range laptop with a dedicated graphics card. A model like this could be used for gesture recognition or even touchless interfaces for human-computer interaction. With a bit of extra training, you could even do simple sign language detection if you wanted to. Pretty handy.
Continuing on the human body theme, you can also estimate pose in 2D and 3D: 33 key points at over 150 frames per second on a machine with a dedicated graphics card. This model comes in three forms, lite, full, and heavy, which let you select a trade-off between accuracy and speed depending on your needs.
What about body segmentation? A model like this lets you differentiate the human body from the background of the scene. If you can do that, you can blur the body for privacy, like we do in Street View, or apply effects to the background instead for a stylistic effect. The choice is yours. There are also specialized versions of this model that focus on the upper body, as you can see here, which are better suited to video call situations.
Next up, you can even bring large language models to the client side. The MediaPipe LLM Inference API lets you choose from four popular models, depending on your needs and use case, and they can run really fast, several times faster than the average human can read.
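As one example of gesture recognition on top of hand tracking, a "pinch" can be detected by comparing the thumb-tip to index-tip distance with the size of the hand. The sketch assumes MediaPipe-style output of 21 landmarks per hand in normalized image coordinates (index 0 is the wrist, 4 the thumb tip, 8 the index fingertip, 9 the middle finger's base knuckle); `isPinching` and its 0.25 threshold are illustrative and would need tuning.

```typescript
// One landmark in normalized image coordinates (x, y in [0, 1]).
interface Landmark {
  x: number;
  y: number;
  z: number;
}

// Indices in the common 21-point hand landmark layout.
const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;
const MIDDLE_FINGER_MCP = 9;

// 2D distance in the image plane (z is ignored for simplicity).
function distance2d(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// A pinch is the thumb and index tips being close relative to hand size,
// so the same threshold works whether the hand is near or far from the camera.
function isPinching(hand: Landmark[], threshold = 0.25): boolean {
  if (hand.length < 21) return false;
  const handSize = distance2d(hand[WRIST], hand[MIDDLE_FINGER_MCP]);
  if (handSize === 0) return false;
  return distance2d(hand[THUMB_TIP], hand[INDEX_FINGER_TIP]) / handSize < threshold;
}
```

Normalizing by hand size is the key design choice: a raw distance threshold would fire too easily when the hand is far from the camera.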
5. Using AI Models with JavaScript
Import the MediaPipe LLM inference API, define the hosted language model, load and use the model, generate a response, and utilize the models for various tasks. Explore different models for different situations using Visual Blocks, a framework for faster prototyping with AI-powered ideas.
You might be wondering at this point how hard this is to actually use. It's pretty straightforward. It fits on a single slide, so let's walk through this code.
First, you import the MediaPipe LLM Inference API, as you can see here. Next, you define where your large language model is hosted. You would have downloaded the model from one of the links on the previous slides and hosted it on your own CDN.
Now you can define a new asynchronous function that will load and use the model. Inside this, you specify your fileset URL, which defines the MediaPipe runtime to use. This example uses the default one that MediaPipe provides and hosts for you, which is safe to use. However, if you prefer, you can download this file and host it on your own server for privacy reasons.
Now you can use the fileset URL from the prior line to initialise MediaPipe's FilesetResolver, which actually downloads and sets up the runtime for the generative AI task you're about to perform. Next, you load the model by calling LlmInference.createFromModelPath, passing it the fileset and the model URL you defined above. As the model is a large file, you must wait for it to finish loading, after which it returns the loaded model, which you can assign to a variable called llm.
Now that you've got the model loaded, you can use it to generate a response for some input text, as shown on this slide, and store it in a variable called answer. With that, you can log the answer, display it on screen, or do something else useful with it. That's pretty much it. Now just call the function above to kick off the loading process and wait for the results to be printed.
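The steps walked through above can be sketched as follows. To keep the example self-contained and testable outside a browser, the two MediaPipe classes are passed in as a parameter rather than imported; in a web app you would import `FilesetResolver` and `LlmInference` from the `@mediapipe/tasks-genai` package. The runtime URL follows the jsDelivr pattern used in MediaPipe's examples, the model URL is a hypothetical placeholder for wherever you host your downloaded model, and `runLlm` is an illustrative name, so check the current MediaPipe documentation for exact options.

```typescript
// In a browser app, something like:
//   import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";
//   const answer = await runLlm({ FilesetResolver, LlmInference }, "Hello!");
//   console.log(answer);

// Minimal shapes of the MediaPipe GenAI classes this sketch relies on.
interface LlmModel {
  generateResponse(prompt: string): Promise<string>;
}
interface GenAiLibrary {
  FilesetResolver: { forGenAiTasks(wasmPath: string): Promise<unknown> };
  LlmInference: {
    createFromModelPath(fileset: unknown, modelPath: string): Promise<LlmModel>;
  };
}

// Runtime files hosted on a public CDN; you can self-host these instead.
const GENAI_WASM_URL = "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@latest/wasm";
// Hypothetical location of the model you downloaded and put on your own CDN.
const MODEL_URL = "https://example.com/models/my-llm.bin";

async function runLlm(
  lib: GenAiLibrary,
  prompt: string,
  modelUrl: string = MODEL_URL,
): Promise<string> {
  // 1. Download and initialize the WebAssembly runtime for GenAI tasks.
  const genai = await lib.FilesetResolver.forGenAiTasks(GENAI_WASM_URL);
  // 2. Load the (large) model file; this await can take a while.
  const llm = await lib.LlmInference.createFromModelPath(genai, modelUrl);
  // 3. Generate a response for the given input text.
  return llm.generateResponse(prompt);
}
```

In a real app you would load the model once and reuse `llm` for later prompts, since step 2 is by far the slowest.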
Now, the key takeaway here is that while there are some scary-sounding names like FilesetResolver, anyone here could take these ten or so lines of code, run them, and then build around them with their own JavaScript knowledge for their own creative ideas, even if they're not an AI expert yet. So do start playing with these things today.
Now, you can imagine turning something like this into a browser extension whereby you could highlight any text on a web page, right-click, and convert a lengthy blog post into a form suitable for social media, or define a word you don't understand, all in just a few clicks, instead of going to a third-party website to do so. In fact, I did exactly that in this demo, again made in just a few hours on the weekend, entirely in JavaScript, client-side in the browser. So there are so many ideas waiting to be created here, and we're at the very beginning of a great adventure with much more waiting to be discovered.
In fact, people are already using these models to do more advanced things, like talking to a PDF document to ask questions about its contents without having to read it all yourself, as shown in this demo by Nico Martin. This is a great time-saver, and it's a really neat use of large language models combined with surrounding RAG (retrieval-augmented generation) techniques: the sentences that matter are extracted from the PDF and then given to the LLM as context, so that its answer is actually meaningful. Again, this is all working locally on your device.
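The retrieval step can be illustrated with a deliberately simple stand-in: score each sentence by how many words it shares with the question, keep the top few, and put them in the prompt as context. This is not how the demo works internally; production RAG systems usually rank passages with embedding similarity instead, and the word-overlap scoring, `retrieve`, `buildRagPrompt`, and the prompt wording are all illustrative assumptions.

```typescript
// Lowercase word tokens, ignoring punctuation and very short words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 2);
}

// Pick the k sentences sharing the most words with the question,
// keeping document order to break ties.
function retrieve(sentences: string[], question: string, k = 3): string[] {
  const questionWords = new Set(tokenize(question));
  return sentences
    .map((sentence, index) => ({
      sentence,
      index,
      score: tokenize(sentence).filter((w) => questionWords.has(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, k)
    .map((s) => s.sentence);
}

// Build the prompt that would be sent to the on-device LLM.
function buildRagPrompt(sentences: string[], question: string, k = 3): string {
  const context = retrieve(sentences, question, k);
  return [
    "Answer the question using only the context below.",
    "Context:",
    ...context.map((s) => `- ${s}`),
    `Question: ${question}`,
  ].join("\n");
}
```

The resulting string would then be passed to the on-device LLM as its input, which keeps the prompt short even for a long PDF.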
Okay. So you've got all these AI models that make you feel like a superhero, but how can they actually help you? Well, by selecting the right models for the right situation, you can provide your customers with superpowers themselves when you apply those models to their industries. In fact, at Google, we often need to explore what models to use for a given task. So we've created a system called Visual Blocks to do that in a more efficient way. It's a framework I worked on with my team that allows you to go from an AI-powered idea to a working prototype faster than ever before, built around JavaScript web components, so it's super easy to extend by anyone who knows JavaScript.
6. Building Prototypes with Visual Blocks
A framework to go from AI-powered idea to prototype faster, chaining powerful blocks in Visual Blocks. Explore client-side models in Visual Blocks contributed by Hugging Face. Future possibilities with large language models in Visual Blocks. Examples of using AI and JavaScript in products and services. Utilizing AI models to guide predictions with TensorFlow.js. Control animatronics using face mesh technology. Improving pharmacy backrooms with Roboflow. AI-generated duet in Mirror Exercise.
And even better, once you make a building block to do one thing, you can then chain them together with other powerful blocks to bring your idea to life without getting lost in the code.
And it's not just us using this. The examples shown on this slide were actually contributed by Hugging Face, who have been adding new client-side models via Transformers.js, and those are now available to explore within Visual Blocks in just a few clicks too, which is great if you're new to the AI space in the browser.
You can try them out for yourself at the link shown: just click their names and they'll launch in Visual Blocks, ready to use immediately. And here, you can see a glimpse into the future, where a research team at Google used a large language model to create a Visual Blocks pipeline that solves a problem I had, like creating an augmented reality effect.
I just explain what I want in plain English, and it makes the start of a prototype for me, all within Visual Blocks. Then I can edit that as I need to punch it up and deliver it to a client.
That's really incredible stuff. Now let's head on over to see what the web AI community have been up to. These are people just like you but already using machine learning and JavaScript in their products and services to give you a taste of what's possible.
Fundamentally, what we're trying to do is build tools to allow anybody to be their true self. There are a lot of people who feel uncomfortable on camera. Here is a much more detailed side-by-side view of my face being captured alongside the face of the 3D character. What's happening is that it's tracking the individual facial blend shapes and then relaying them back into Unreal Engine to render them.
I work as a radiologist, and for the last four years I've been really interested in using artificial intelligence for segmentation. A really nice thing with TensorFlow.js, I think, is that you can interact with the AI models. It doesn't have to make fully automatic predictions; you can actually guide the models to get the result that you want.
We're going to see an animatronic talking backpack that uses face mesh, and it was sort of a comedian-in-the-wild experiment that respected social distancing. It does what you see: it puts all these tracking points on a face, and it can track it with and without glasses. It has an incredibly high frame rate, even in JavaScript and even in the browser. So wouldn't it be great if you could use this technology to control animatronics?
This is an example of how Cardinal Health is using Roboflow to improve the backrooms of pharmacies. They have a division of their company that works with pharmacists, where one pharmacist can manage several locations remotely over a video stream. This runs in the browser on an iPad. Previously it was just a video chat between the pharmacist and the technician, and then they used Roboflow to add a pill counting feature, so the pharmacists are augmented by the computer vision model and don't have to count from scratch. They can use the model to estimate and then adjust up or down where the model has not gotten an exact count.
The video that I'd love to show here is a piece that I constructed, which I call Mirror Exercise. It's an AI-generated duet with myself. What you're seeing here in the figure in black is my real motion capture data from a couple of data-taking sessions in the studio, and in blue you see a dancing accompaniment that is generated by the model. In each case, the model is seeing a certain segment of my movements and is creating a slight variation on it in some sense.
7. Exploring AI in Web Applications
A hand-tracking engine called Yoha allows you to quickly sketch on a whiteboard. Teemo is a visual workflow engine for building complex solutions without coding. Expanding into computer vision to deliver care directly to patients' homes. Many AI models are useful for real business use cases. AI in the browser is in its early stages, enabling businesses to reimagine web experiences. A hybrid approach can be used to run models on client devices. Share your custom results using the hashtag webAI. We're working on a website for web developers using AI in JavaScript. Join the adventure of shaping the future with web AI.
It's a hand-tracking engine called Yoha. And so what it does is it processes the video feed of your webcam and it detects the location of your hand in this video feed. And the goal of it is that it allows you to quickly sketch something roughly similar to how you would do it on a whiteboard.
Teemo is a visual workflow engine that allows you to build very complex solutions in just minutes. And we do that through a drag and drop interface which helps you define any kind of process without the need for coding. So this is the final version of the program. The video's a bit sped up so things happen fast but you can see that as water is taken out from the container the graph reflects the changes in the level of the liquid which is what we wanted.
We wanted to go beyond the machine and even beyond the clinic, and that's really when we decided to expand our platform into computer vision and deliver care directly into patients' homes on their own devices. We're a medical device that can run on any device, fully web-based, and you can see these patients doing different exercises, upper body and lower body, with real-time feedback counting their reps. We track their range of motion, gather all of that, and send it back to the clinician for review to help modify their home exercise plan accordingly.
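Rep counting and range-of-motion tracking can both be built on a joint angle computed from three pose keypoints (for example shoulder, elbow, and wrist). The sketch below is not this company's implementation; `jointAngle`, `RepCounter`, and the 60 and 150 degree thresholds are hypothetical choices, and using two thresholds provides hysteresis so that jitter near a single cut-off is not counted as extra reps.

```typescript
interface Point {
  x: number;
  y: number;
}

// Angle in degrees at joint b, between the segments b->a and b->c,
// e.g. shoulder (a), elbow (b), wrist (c) for an elbow curl.
function jointAngle(a: Point, b: Point, c: Point): number {
  const toA = Math.atan2(a.y - b.y, a.x - b.x);
  const toC = Math.atan2(c.y - b.y, c.x - b.x);
  let deg = Math.abs(((toA - toC) * 180) / Math.PI);
  if (deg > 180) deg = 360 - deg;
  return deg;
}

// Counts one rep per full "extended, flexed, extended" cycle and tracks
// the range of motion seen so far.
class RepCounter {
  reps = 0;
  minAngle = Infinity;
  maxAngle = -Infinity;
  private flexed = false;

  constructor(
    private readonly flexedBelow = 60,
    private readonly extendedAbove = 150,
  ) {}

  update(angleDeg: number): void {
    this.minAngle = Math.min(this.minAngle, angleDeg);
    this.maxAngle = Math.max(this.maxAngle, angleDeg);
    if (!this.flexed && angleDeg < this.flexedBelow) {
      this.flexed = true;
    } else if (this.flexed && angleDeg > this.extendedAbove) {
      this.flexed = false;
      this.reps++;
    }
  }

  get rangeOfMotion(): number {
    return this.maxAngle - this.minAngle;
  }
}
```

Each video frame's keypoints would go through `jointAngle` and then `update`, and only `reps` and `rangeOfMotion` would need to be sent to the clinician.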
All right, so some really cool examples there and notice not one of them used Gen AI. So my point is that there are plenty of other AI models out there that are really useful for real business use cases beyond just Gen AI as well, so do keep that in mind. And all of those are running client-side in the browser, even the talking backpacks, so you can talk to hardware and other things too. So from browser extensions that supercharge your productivity to features within the web app itself, we're at the start of a new era that can really enhance your web experience and the time is now to start exploring those ideas.
In fact, AI in the browser is in its early stages right now, but as hardware continues to improve, we will see more models ported to run in the browser on device, enabling businesses to reimagine what you can actually do on a web page, especially for industry- or task-specific situations. Right now, one could envision a hybrid approach: if a client machine is powerful enough, you download and run the model there, and only fall back to cloud AI when the device is not able to run the model, which might be true for older devices at this point in time. Over time, though, more and more compute will be performed on device, so your return on investment should improve as time goes by when implementing an approach like this.
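A minimal sketch of that hybrid approach, assuming you already have a local runner (such as an in-browser model wrapped in a function) and a cloud runner that calls your own API. `runHybrid` and its parameters are hypothetical; a real `canRunLocally` check might test for WebGPU support, available memory, or a quick benchmark.

```typescript
// A function that turns an input prompt into model output.
type Runner = (input: string) => Promise<string>;

interface HybridResult {
  output: string;
  ranOn: "device" | "cloud";
}

// Run on-device when the client can handle it; otherwise, or if local
// inference fails, fall back to the cloud runner.
async function runHybrid(
  input: string,
  canRunLocally: () => Promise<boolean>,
  runLocal: Runner,
  runCloud: Runner,
): Promise<HybridResult> {
  if (await canRunLocally()) {
    try {
      return { output: await runLocal(input), ranOn: "device" };
    } catch {
      // e.g. out of GPU memory on a lower-end device: fall through to cloud.
    }
  }
  return { output: await runCloud(input), ranOn: "cloud" };
}
```

Recording `ranOn` makes it easy to measure how much traffic has moved on device, which is what drives the improving return on investment described above.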
On that note, if you make something custom, we would love to see your results, so please use the hashtag webAI on social so we can start to share things as a community to see what everyone's up to. When you win, we can help share your wins as well. On that note, we're working on a website to provide guidance specifically for web developers choosing to use AI in JavaScript. This site allows you to understand key AI concepts so you can discover opportunities to use popular models to be more productive than ever. Feel free to bookmark this site as we will continue to publish content here over the year.
Now, I would like to end by saying there are very few opportunities in one's life to be at the beginning of a new era like this. I, for one, remember when the internet came out for the first time, and web AI has the same feel, only magnified 100x. I hope in this talk I've been able to give you a glimpse into the future and show why you should start investigating web AI today, because everyone here in this room has the chance of a lifetime to shape the future of this fast-growing space, and I hope you will join me on this adventure. On that note, if you want to go deeper into the subject area, you can check out my course on the Google Developers YouTube channel to learn how to use web AI in your next creation. All you need to know is JavaScript, no PhD required, so invite a friend and go learn together at goo.gle. With that, that's a wrap. Do tag me in anything you make, and if you have any suggestions on anything I've said in this talk today, you can connect with me on social using the links on the slide.
QnA
Privacy and Model Usage in Web AI
Google does not benefit from your data. Large language models can be about 1.3GB to download, but caching can reduce future downloads. Chrome is exploring options to bake models into the browser itself. Open source alternatives are available for transfer learning, such as YOLO for object detection.
Thank you, and see you next time!
The first one might be a little bit cheeky. It says how is Google going to benefit when they don't get your data? If we are doing this all on device ... Yes, so, all of this is happening on device, so we are not benefiting from your data in that situation. Everything is completely private. The models are open to use and are available on our GitHub to inspect if you want to check the code that is running behind the scenes, so you're in full control of the data if you choose to send it to some cloud server or not. That's up to you, but you will be under the terms and conditions of that cloud service that you're using, whoever that might be.
I feel like every time I hear of a Google thing that works in the browser, people always come back with "if it works for the web, it also works for Google", right? I see. Cool. A question that has raced to the top here: if you are loading an LLM locally in the browser, how heavy is your site going to become? Yes, that is a great question. Obviously, large language models are large, so you're looking at about a 1.3 gigabyte download, which you do once; you can then use caching to avoid doing that again in the future. So if something provides long-lived utility to the user, like a Chrome extension or something that is used on your site regularly, that could warrant the download. We're also exploring options to bake these things into the browser itself within Chrome, which we announced at Google I/O this year. That would mean that once a model has been downloaded, it could work for multiple sites instead of having to be cached on every single domain that needs it. So we're investigating this, and we welcome suggestions on how you would like to see these APIs evolve in the future.
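The "download once, then cache" approach can be sketched as a cache-first loader. In a browser, the cache would typically come from the Cache Storage API (`await caches.open("models")`) and the fetcher would be `fetch`; here both are described by small interfaces so the logic can also run with in-memory stand-ins. `loadModelBytes`, the interface names, and the cache name are illustrative assumptions.

```typescript
// The subset of Response, Cache, and fetch that this sketch needs.
interface BodyLike {
  ok: boolean;
  clone(): BodyLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface CacheLike {
  match(url: string): Promise<BodyLike | undefined>;
  put(url: string, response: BodyLike): Promise<void>;
}
type Fetcher = (url: string) => Promise<BodyLike>;

// Returns the model bytes, downloading them only when they are not cached.
async function loadModelBytes(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher,
): Promise<{ bytes: ArrayBuffer; fromCache: boolean }> {
  const cached = await cache.match(url);
  if (cached) {
    return { bytes: await cached.arrayBuffer(), fromCache: true };
  }
  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error(`Model download failed: ${url}`);
  }
  // Store a clone, because a response body can only be read once.
  await cache.put(url, response.clone());
  return { bytes: await response.arrayBuffer(), fromCache: false };
}
```

On a second visit, the same call resolves from the cache without touching the network.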
Is there a way to get it, in fact? Is it coming in Chrome 128? Gemini Nano in Chrome 128? Was that what was announced? I think they announced Gemini Nano inside Chrome, exactly, yes. I think it is in Canary right now, maybe, to explore. Yes, yes. Excited to explore that kind of thing. Cool. That's the same question. A question on transfer learning: are you planning to publish more Layers models instead of frozen graph models for transfer learning? Good question. Right now, a lot of our models are frozen, as you correctly stated. We don't have any plans to change that any time soon, but there are plenty of open source alternatives that you can use and do your transfer learning on as you see fit, things like YOLO for object detection, for example. It's very easy to convert to TensorFlow.js and run in the browser to recognise custom objects. So I would say that this year and 2023 have seen huge growth in the number of open models now available to the world, and we can leverage those in the browser too. It's not just the ones I spoke about today.
Model Compatibility and Web AI
YOLO for object detection can be converted to TensorFlow.js and run in the browser. Web AI works in any browser that supports WebAssembly or WebGPU, and it can even run on mobile phones in Chromium-based browsers.
Things like YOLO for object detection, for example: it's very easy to convert to TensorFlow.js and run in the browser to recognize custom objects. So I would say that this year and 2023 have seen huge growth in the number of open models now available to the world, and we can leverage those in the browser, too. It's not just the ones I spoke about today. Obviously I got my reference points from Google, but there are plenty of others out there, too, that you can also leverage.
Web AI Compatibility and Learning Resources
Web AI works in any browser that supports WebAssembly or WebGPU. The future of WebNN is being discussed, and community feedback is welcome. Minimum system requirements depend on the model being run, but basic tasks like object detection and segmentation can run smoothly on older devices. For beginners, there are books like 'Deep Learning with JavaScript' and 'Learning TensorFlow.js', as well as online courses.
Ah, this is an interesting one. Is Web AI, which you've been talking about here, Chrome-specific, or is it going to work in any browser? It will work in any browser, as long as it supports WebAssembly or WebGPU, which is what most of these models are accelerated by. Depending on the type of model, it might use one or the other, but as long as those two things are supported, it should work in those browsers. So I could even run these things on my mobile phone, even in, you know, Chromium-based browsers and things like that. So, yeah. Very cool.
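One way to act on the WebAssembly-or-WebGPU answer above is to feature-detect at runtime and pick a backend. The sketch below is a hypothetical helper, not part of any particular library: it checks for `navigator.gpu` (the WebGPU entry point) and the global `WebAssembly` object, prefers WebGPU, and falls back to WebAssembly unless the model is assumed to require a GPU, as a large language model typically would.

```typescript
// Capability flags a page might detect at runtime.
interface Capabilities {
  webgpu: boolean;
  wasm: boolean;
}

type Backend = "webgpu" | "wasm" | "unsupported";

// Detect support from a global-like object so the logic can also run outside
// a browser. WebGPU is exposed as navigator.gpu; WebAssembly as a global object.
function detectCapabilities(g: { navigator?: object; WebAssembly?: unknown }): Capabilities {
  const nav = g.navigator as { gpu?: unknown } | undefined;
  return {
    webgpu: nav !== undefined && nav.gpu != null,
    wasm: typeof g.WebAssembly === "object" && g.WebAssembly !== null,
  };
}

// Prefer GPU acceleration; fall back to WebAssembly on the CPU unless the
// model is assumed to need a GPU (hypothetical flag, e.g. for an LLM).
function pickBackend(caps: Capabilities, modelNeedsGpu = false): Backend {
  if (caps.webgpu) return "webgpu";
  if (!modelNeedsGpu && caps.wasm) return "wasm";
  return "unsupported";
}
```

In a page you would call `pickBackend(detectCapabilities(globalThis))`. The presence of `navigator.gpu` does not guarantee a usable GPU: `navigator.gpu.requestAdapter()` can still resolve to `null`, so a production check would also await an adapter before committing to WebGPU.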
And you said, yeah, we've done WebGPU and WebAssembly. I've heard of WebNN, the Web Neural Network API. Is that coming as well? We're also contributing to that web standard, of course, so that's an ongoing discussion. I think right now a lot of the community is deploying their models to WebGPU because it's available now and has been published as a standard, whereas WebNN is still being baked, I guess. So, again, this is a great time to start shaping those standards, so we can get your input on what you'd like to see in these APIs in the future to make your lives easier as developers, and we welcome feedback on that. It's open for comments right now on the GitHub issues for it. Awesome. Yeah, if you're interested in this, look into the future of WebNN as well as the existing things we have. All right.
Oh, what are the minimum system requirements for running in a browser? It really depends on what model you're running. If you're doing something like our segmentation models or pose estimation models, they can run on a laptop from ten years ago just fine. If you're trying to run a large language model in the browser, then you do need a certain level of graphics card with enough video RAM to run that model. These larger models take more resources, so you need a more modern machine for those, but some of the bread-and-butter stuff like object detection, body pose, and segmentation can run at a buttery smooth 60 frames per second even on older devices in a lot of cases. 150 frames per second? 150 on that one. The 150 was on an NVIDIA 1070, which is five generations old, just for context, yeah. Nice.
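A rough way to reason about the enough-video-RAM point is to estimate the memory needed just to hold a model's weights, and to turn frame rates into per-frame time budgets. The functions below are a back-of-the-envelope sketch with hypothetical names. They ignore activations, caches, and runtime overhead, and the `headroom` factor is an arbitrary assumption. Also note that WebGPU does not report total video memory directly, although an adapter's `limits.maxBufferSize` is one related signal.

```typescript
// Bytes needed to store the weights alone (assumption: weights dominate memory).
function weightBytes(parameterCount: number, bitsPerWeight: number): number {
  return (parameterCount * bitsPerWeight) / 8;
}

// Hypothetical check against a memory budget, with a safety multiplier for
// everything the estimate ignores (activations, caches, runtime overhead).
function fitsInBudget(
  parameterCount: number,
  bitsPerWeight: number,
  budgetBytes: number,
  headroom = 1.2,
): boolean {
  return weightBytes(parameterCount, bitsPerWeight) * headroom <= budgetBytes;
}

// Milliseconds available per frame at a target frame rate.
function frameBudgetMs(fps: number): number {
  return 1000 / fps;
}
```

For example, a hypothetical 2-billion-parameter model needs about 1e9 bytes (roughly 1 GB) for its weights at 4 bits per weight, but about 4e9 bytes at 16 bits, which is one reason compact, quantized models suit the browser. Likewise, 60 frames per second leaves about 16.7 ms per frame for inference and rendering, and 150 frames per second about 6.7 ms.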
Do you have any recommendations for complete beginners learning this? Where would you start? Well, obviously, my course, but there are also plenty of books out there. There's Deep Learning with JavaScript, which was written by some of the folks on the TensorFlow.js team, I believe. There's also Learning TensorFlow.js by O'Reilly, which was written by one of our GDEs, and that's a really easy-to-understand book if you just want to get started with TensorFlow.js. There are now also many online courses from other people that teach other ecosystems, like Transformers.js and ONNX Runtime Web, and all these other varieties of Web AI. So pick one and have a go, basically. Yeah, yeah.
TensorFlow.js Course and App Categories
The TensorFlow.js course covers a broad introduction to AI before diving into TensorFlow.js specifics. TensorFlow.js can be applied to various industries and offers models for tasks like data analysis, vision, text understanding, audio generation, and speech recognition. It prioritizes privacy and is valuable in verticals such as legal and healthcare.
Yeah, yeah. Loads of resources. But your course is a great start as well. I try to cover a bit of everything at the beginning and be agnostic just to teach you what AI is and then I go to the TensorFlow.js stuff. So the beginning parts are useful for everyone no matter which route they take later and then later in my course, it gets more TensorFlow.js specific but you can then choose to skip it or go wherever you want. Nice.
All right. I think we covered that one as well. All right. And we covered that one a bit. Oh my goodness, sorry. Is that it? Yeah. We sort of... Yeah. Large things: we might be able to cache them across sites. That's exciting. Yes. Oh yeah.
So you said the TensorFlow examples were not generative AI. That's correct. So how would you categorize the TensorFlow stuff in terms of the types of apps it's useful for? I mean, it could apply to every industry out there, I think. There are so many models out there, from filling in missing columns in spreadsheet data to vision models, text understanding, audio generation, or speech recognition. Everything is available for you to run in the browser these days, so it really could apply to anything. And it comes with that local-first, privacy-first kind of thing as well. Yes. It's been a bit of a running theme today. And for certain verticals, like legal or healthcare, that's really important.
Improving Accessibility with Web AI
Web AI can improve accessibility, for example by using models to automatically generate captions for images that lack alt text. This already benefits pages whose authors neglected to provide alt text. Considering accessibility alongside new features is crucial.
That's why there were so many healthcare examples as well.
Yeah, exactly. Do you think that web AI can improve accessibility?
Yes. Someone suggests maybe eye tracking or similar technologies. That's a great question. A good example would be that browsers are starting to use Web AI models to automatically generate captions for images that don't have alt text. So it's already having positive benefits where people just haven't bothered to do certain things: it fills them in for you.
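The automatic alt-text behaviour described above can be sketched at page level as a loop over images that have no `alt` attribute. In this hypothetical TypeScript sketch, `Captioner` stands in for whatever on-device image-to-text model a page might use (no specific model or API is implied), and `ImageLike` mirrors the `src` and `alt` attributes of an `<img>` element so the logic can run outside a browser.

```typescript
// Minimal view of an <img>: alt is null when the attribute is missing entirely.
interface ImageLike {
  src: string;
  alt: string | null;
}

// Hypothetical on-device captioning function (image source in, caption out).
type Captioner = (src: string) => Promise<string>;

// Fill in alt text only where it is missing; returns how many images were updated.
async function fillMissingAlt(images: ImageLike[], caption: Captioner): Promise<number> {
  let filled = 0;
  for (const img of images) {
    // alt="" conventionally marks a decorative image, so leave it alone.
    if (img.alt !== null) continue;
    img.alt = await caption(img.src);
    filled += 1;
  }
  return filled;
}
```

In a page, candidate elements could be found with `document.querySelectorAll("img:not([alt])")`. Note that `getAttribute("alt")` returns `null` when the attribute is missing, whereas the `alt` property returns an empty string, which is why the sketch distinguishes `null` from `""`.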
Yeah, amazing. I love that. It's a nice question to end on. We've run out of time here, but it's a great question, because thinking of users via accessibility, as well as via the cool new features we can build, is really important.
I totally agree. Yeah. So thank you so much, Jason. That was absolutely brilliant. And yeah, give it up for Jason. Thank you. Thank you. Thank you.