Practical Web AI: Built-In, Browser-Based, Brilliant


AI progress is running away on the server-side, where the GPUs live, but the web platform is never that far behind. In this talk we're going to explore the incredible applications you can build in the browser today and in the near future.

You'll see how to build an offline-capable, speech to speech language translation app, a babelfish, using nothing but the web platform. Through this app we'll explore the different ways we can run AI models in the browser today and what may be built into the browser in the future. From low-level standards to high-level experiments, you'll learn how to work with AI on the web.

This talk was presented at JSNation 2025.

FAQ

Who is Phil Nash?
Phil Nash is a Developer Relations Engineer at DataStax.

What is LangFlow?
LangFlow is a drag-and-drop builder for creating generative AI flows. It is an open-source application written in Python.

What is the Web Speech API used for?
The Web Speech API is used for speech-to-text functionality inside the browser.

Does the Web Speech API work offline?
No, the Web Speech API currently sends data to Google's Cloud Speech Recognition API, which requires an internet connection.

What is the Prompt API?
The Prompt API is an experimental feature in Chrome that allows developers to use language models, like Gemini Nano, for tasks such as writing and translating text directly in the browser.

Which specialized APIs are mentioned?
Specialized APIs mentioned include the translator, language detector, summarizer, writer, rewriter, and proofreader.

What does the Translator API do?
The Translator API allows for language translation directly in the browser without needing network connectivity. It is part of Chrome's experimental APIs.

Can AI models be used offline in the browser?
Yes, certain AI models can be used offline in the browser, such as the Translator API and the multimodal prompt API, for tasks like translation and transcription.

What is the multimodal prompt API?
The multimodal prompt API is a new experimental feature that allows audio transcription and other tasks directly in the browser without internet connectivity.

What is DataStax building?
DataStax is building a generative AI platform for building applications, which includes a vector database called Astra DB and a drag-and-drop builder for generative AI flows called LangFlow.

Phil Nash
30 min
12 Jun, 2025

Video Summary and Transcription
The Talk delves into the integration of generative AI tools in web development, emphasizing the AI revolution's impact. It explores creating a browser-based translation application without backend servers, emphasizing speech-to-text and translation APIs. The discussion highlights browser-based speech capabilities, different voices, and the challenges of translation within the browser. The exploration of the prompt API, Gemini Nano, and specialized APIs showcases experimental features and language model capabilities. The advancements in browser-based AI, privacy-focused AI usage on Chrome, and the utilization of Langflow for server-side generative AI experimentation are also discussed. Progressive enhancement, mobile integration, real-time translation, privacy concerns, and model integration into browsers are key topics.

1. Discussion on AI and Web Development

Short description:

The speaker, Phil Nash, talks about code and an application integrated with generative AI tools from DataStax, emphasizing the importance of the AI revolution and its impact on web development.

It is wonderful to now be talking to you a bit about some code and some stuff, rather than just our other wonderful speakers. If you saw Thomas Steiner's talk earlier today, I'm going to be talking a little bit about some of the same stuff there, but I've built it all into one other big application. It's not going to be live-coded, because he chose an interesting path there, a dangerous path. Mine is just demos that might break anyway, so let's see how that works.

As we said, my name's Phil Nash. I'm a Developer Relations Engineer at DataStax. DataStax is building a generative AI platform for you to build applications on. We've got a vector database called Astra DB. We have got a GUI builder, a drag-and-drop builder for generative AI flows called LangFlow, and I recommend you check out both of them. If you want to find me anywhere online, I'm Phil Nash all over the internet.

There's a bit of a warning at the start of this, that while I said practical in the title, much of this is running in browsers, but browsers that people aren't using. So everything in here is Chrome Canary today, and while some of the applications, some of the APIs that we'll see are coming to actual Chrome very soon, it's certainly not a cross-browser thing, and it's a very new and emerging thing. And that's because the AI revolution is happening. I know there's been a bunch of mentions of AI in other talks that weren't even about AI today, but the AI revolution is happening, and it's making a lot of interesting things happen. I'm excited about it, because I'm excited about what we can build with it, as well as some of the stuff making it easier for us to build.

2. Exploration of a Browser-Based Babelfish

Short description:

Phil Nash discusses the intriguing possibilities of web development, focusing on creating a browser-based translation application without backend servers for personal use during a holiday in Spain.

What we can build, I think, is really interesting. And what we can build on the web is even more so. What we can give to people inside of their browsers in a privacy-focused, in a secure kind of context is even more incredible, and that's what we're going to see today.

So I did have a different title for this talk, but it made less sense when you actually put it on a schedule. I wanted to call this the built-in browser-based Babelfish, because I like alliteration, but also because I wanted to build some sort of Babel fish. That's taken from The Hitchhiker's Guide to the Galaxy: the idea of a fish that can live inside the ear canal of a person and translate what other people are saying directly into speech that they hear in their head.

I had to get some representations of what this might look like, and so I asked Gemini, and this was weird. I particularly liked the border on the ear here. That's super strange. AI stuff. Don't take pictures of it. It's weird. This was OpenAI's attempt, which is actually quite good, because you get the little appendage down the bottom there where it hooks into the brain there. That's weird as well, right? It's a Babblefish. It's going to translate stuff, and it's going to do so because I am going on holiday to Spain this summer, which I believe is what that says. I learnt Spanish at school, which means I know no Spanish anymore.

3. Building Web-Based Translation Application

Short description:

The speaker explores building a translation application using only the web platform and browser, emphasizing the need for speech-to-text capability and translation APIs, and demonstrates, using LangFlow, how effective language models are at seamless translation.

I'm very excited about that. I live in Australia. It's about as far away from talking Spanish to people as you can possibly be, but I am lucky enough to be going to Spain on holiday, so I was like, can I build something that's going to allow me to translate my way through both English to Spanish and Spanish back to English, so I know what I'm actually talking about whilst I'm away? That's the idea. We'll see how we get on.

The question becomes, can you build a translation application using just the web platform, just the web browser itself, no backend servers, nothing else? Yes. What do you need for this? You need to be able to do speech-to-text. You need to be able to turn what I'm saying into text that we can then translate somehow and turn back into speech. Now, ideally, actually, this would all just be one big speech-to-speech model, but those don't really exist yet, so this is kind of the pipeline of stuff that we are going to go through in order to achieve this. Traditionally, the browser has had a bunch of APIs that are really useful for this. We have the web speech API which does speech-to-text inside the browser.

You would need some sort of translation API, something normally on your server side, to do something about that. As an aside, of course, large language models, being that they are all about language, are really good at translation. I just wanted to give a quick demo of that. This is LangFlow. This is the generative AI builder that I work on. It's an open-source application. It's Python, so you can just pip install it. This is almost the simplest of flows you can ever do. We have a prompt coming in saying you're a language translation expert. It's got a couple of examples, like if the input's English, output Spanish. If the input's Spanish, output English.

Then you can just talk to it and say whatever you want. Hello, I am having a great time at the conference. Conference. That's just going to go through Gemini in this case. It says, hola, me lo estoy pasando muy bien a la conferencia. Excuse my accent. I still have to talk. Point being, Gemini and other models are really good at translating without even having to try. That's cool.

4. Exploring Browser-Based Speech Capabilities

Short description:

The speaker delves into the capabilities of the web speech API for text-to-speech and speech-to-text functions, highlighting the ease of speech recognition and browser-based speech synthesis.

You might pass it off to LangFlow in the background in order to do this translation. What does this web speech API give us to start with? Let's talk about that text-to-speech, speech-to-text side of things. Speech recognition, super easy. If you've not seen this before, you just get yourself a speech recognition object. You listen for results, and then you start it. You do have to tell it what language you need to listen to. But it just kind of works. I have this running in my browser. My slides are running inside of canary as well. This doesn't need canary. This works in regular Chrome.

Hello, everyone. This should just start showing up on screen. And what's quite nice about this is you get interim results. And those interim results come out in italics. When they go to not italic, it means it's the finalized result. That's quite good, right? I'm going to stop this now before it transcribes my entire talk. That's kind of cool. That exists in the browser today. Now, it doesn't work in Firefox because they don't have a service to do the speech recognition. But maybe one time in the future they will.
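For reference, a minimal sketch of the recognition setup described above (Chrome still exposes the constructor behind a webkit prefix; the language code and handler are illustrative):

```js
// Minimal sketch of browser speech recognition, as described above.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';          // tell it which language to listen for
recognition.interimResults = true;   // also receive the in-progress (italicised) results

recognition.addEventListener('result', (event) => {
  const result = event.results[event.results.length - 1];
  const text = result[0].transcript;
  console.log(result.isFinal ? `final: ${text}` : `interim: ${text}`);
});

recognition.start();
```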

On the other side of things, making the browser talk, also super easy. You do have to use such ridiculous APIs as the speech synthesis utterance, which is very hard to say. I've been in a room saying that over and over again in order to make sure I could do this for you today. Speech synthesis utterance. You give the utterance a language. You can actually give it a choice of voices, which you'll see in a minute. And then you just tell it to speak the utterance. Now, the weird thing about this is it actually uses the operating system underneath. And so I'm using a Mac and we're getting a bunch of Mac voices available to us.
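And a sketch of the synthesis side (voice names come from the operating system, so the Spanish voice lookup here is illustrative, and getVoices() can be empty until the voiceschanged event has fired):

```js
// Minimal sketch of speech synthesis: pick a voice for the language, then speak.
const utterance = new SpeechSynthesisUtterance('Hola, me lo estoy pasando muy bien.');
utterance.lang = 'es-ES';

// Voices are provided by the operating system, so the list differs per device.
const voices = speechSynthesis.getVoices();
utterance.voice = voices.find((voice) => voice.lang.startsWith('es')) || null;

speechSynthesis.speak(utterance);
```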

5. Exploring Voices and Translation Challenges

Short description:

The speaker explores different voices in the context of browser-based speech capabilities and discusses the challenges of translation within the browser, highlighting the evolving role of AI on the web.

You will see different voices if you're using Windows or something else. But here is an example of this. I'm just going to use the headphone so I can hear what it's saying as well. It's so good to spend time with you all at this conference. Right, a nice Australian voice there for you. Now, I want to tell you something. We get some really weird stuff in here because there are so many voices. But my absolute favorite weird one is Bubbles.

Now, I want you to help me out a little bit here because I'm going to play this again in the Bubbles voice. And I would like you to all say when it's finished, that's so weird. Mainly because I want the rest of the conference to be like, what is going on up there? But also, you'll agree with me, it's really weird. And so good to spend time with you all at this conference. That's so weird. Why does that exist? I don't know. I'm going to move on.

Because translation is the thing that we are missing inside the browser. That's what we don't have right now. And so that's where you would traditionally go off to a server to do this translation and come back. Now, that's going to take time because you're going over the network and it is sharing whatever somebody is saying. If you're trying to translate into another thing, off to a third party server somewhere probably. So that brings us to AI on the web. Like I said, if you saw Thomas' talk earlier, you now realize that Chrome can do some of this stuff. It starts with the prompt API, the first API they experimented with. The prompt API is still not slated to come into a browser anytime soon but is making its way into extensions, which is useful for building and testing.

6. Exploring Prompt API and Gemini Nano

Short description:

The speaker discusses the experimental nature of the prompt API, its integration into extensions, and the ease of building with language models. They demonstrate prompt usage for poems and highlight the capabilities and limitations of Gemini Nano, a small but powerful language model.

And it is the experimental one still. The prompt API is still not slated to come into a browser any time soon. It is making its way into extensions. Which is kind of useful. You can still build stuff with it and get users using it, at least in an extension. It is super easy to set up. You just call LanguageModel.create and wait for that to happen.

And then, once you've got the model, you just send it a prompt. It's that easy. In fact, we can do that right now. I set it up in my slides: write me a four-line poem about cats. Fluffy fur and emerald eyes. A hunter's heart. Independent spirit. A furry friend who knows what to do. We lost the rhyme at the end, didn't we?
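As a rough sketch of that flow, assuming the experimental LanguageModel global that Chrome Canary exposes behind its built-in AI flags (the API shape may still change):

```js
// Sketch only: the Prompt API is experimental.
const session = await LanguageModel.create();

const poem = await session.prompt('Write me a four line poem about cats.');
console.log(poem);

const haiku = await session.prompt('Write a haiku about JavaScript and Amsterdam.');
console.log(haiku);
```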

What we have here is Gemini Nano. I want to do one more. I was like, write a haiku about JavaScript and Amsterdam. It will try. Coffee shop. Vibrant show. Tech hub. Probably not the right number of syllables. It is not good at doing haikus. That is because we have Gemini Nano. It is a small large language model. If you are at all used to using the frontier models from Google's Gemini or OpenAI, you will be expecting a lot more out of it. When you go to Gemini Nano, it is a step back of six or twelve months.

7. Technical Requirements of Prompt API

Short description:

The speaker discusses the technical requirements and limitations of the prompt API, including storage space and VRAM needs for the models, emphasizing the availability status and challenges for different devices.

Everything you might have learned from prompt engineering applies again. It is harder, but it is going to be built in. It does have these other requirements. It only works on a Windows machine or Mac or Linux. It is not on mobile or Chrome OS. Apparently, it needs about 22 gigabytes of storage space to be available.

The model is only about two gigabytes in size, but it requires this amount of storage space. More than four gig of VRAM. And I had to check my notes. The Google documentation still says four, so my slides still say four. The point is, that is not going to be everybody's device. Definitely not.

And if it is the first time this gets run, you have to go and get yourself a two-gigabyte model, which is horrifying in a way. There are a couple of extra APIs around it. You can call the availability function on the language model. That will tell you whether it is unavailable, downloadable, downloading, or available. Unavailable is like, this does not exist for you. You cannot do this on your device. Downloadable means, sure, but you have to wait for two gigs' worth to arrive. Downloading means it is on the way. And available means you are ready. You can go.
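Something like this, assuming the same experimental LanguageModel global:

```js
// Sketch: check whether the model can be used before creating a session.
const availability = await LanguageModel.availability();
// One of: 'unavailable', 'downloadable', 'downloading', 'available'

if (availability === 'unavailable') {
  console.log('This device cannot run the model.');
} else if (availability === 'available') {
  const session = await LanguageModel.create(); // ready to prompt straight away
} else {
  // 'downloadable' or 'downloading': creating a session involves a ~2 GB download
  console.log('The model still needs to be downloaded.');
}
```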

8. Exploring Specialized APIs of Prompt API

Short description:

The speaker talks about the capabilities of the prompt API, its controversy due to Gemini Nano, and the development of specialized APIs like translator, language detector, summarizer, writer, rewriter, and proofreader for targeted focus tasks.

It is cool that, in that create method, you can pass a function to monitor the download and make a user aware that there is a download going on before they can use this particular feature. Otherwise, very cool.
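A sketch of that monitor callback, again assuming the API shape from Chrome's early-access documentation (recent Canary builds report progress as a 0–1 fraction):

```js
// Sketch: surface download progress to the user while the model is fetched.
const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener('downloadprogress', (event) => {
      const percent = Math.round(event.loaded * 100);
      console.log(`Model download: ${percent}%`);
    });
  },
});
```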

But then, because this prompt API is a bit controversial — it is using Gemini Nano, and if another browser were to implement it, which Edge has been working on with their Phi-4 models, models can behave differently to the same prompts — it is controversial as to how this is going to work. That caused the team to look at what people were experimenting with using the prompt API and then build smaller models that are able to do targeted, focused things, solving common problems that people were trying to achieve.

This gave us specialized APIs like the translator and language detector, which we are going to use. The summarizer, which Thomas had some fun with earlier. And then there are other, newer ones, like the writer and rewriter, which are for editing, and the proofreader, again also for editing content. The translator, language detector and summarizer are coming in Chrome 138 and are in the beta right now.

9. Browser-Based Language Tools

Short description:

The writer and rewriter are in trial, while the proofreader was recently announced. The translator is user-friendly for quick translations. The language detector ensures correct voice selection for speech and detects multiple languages with confidence levels.

The writer and rewriter have been in origin trial, and the proofreader was announced at Google I.O. a few weeks ago. I don't know if it exists yet, and I haven't tried it. The translator is nice and easy, where you create a translator with a source language and target language to translate content. It's quick and useful for communication in different languages. The browser-based translation is efficient and doesn't require network connections.
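A sketch of that translator usage, assuming the Translator global from Chrome's built-in AI APIs:

```js
// Sketch: create a translator for a language pair, then translate some text.
const translator = await Translator.create({
  sourceLanguage: 'en',
  targetLanguage: 'es',
});

const translated = await translator.translate('I would like a beer, please.');
console.log(translated);
```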

The language detector is essential when converting text to speech, ensuring the appropriate voice is selected. It provides a confidence level percentage for language detection, helping in choosing the right voice for speech. It can detect multiple languages and indicate confidence levels for each. Even with slight errors, the detector can identify potential languages, such as Spanish or Portuguese.

Language detection avoids mismatches like using an Australian accent to read Spanish text. The detector's create and detect functions facilitate language identification. It offers confidence levels for language recognition, even for ambiguous cases. This feature enhances text-to-speech processes by ensuring accurate language selection for speech output.

10. Browser-Based Babelfish Speech Interaction

Short description:

The language detector ensures the right voice for text-to-speech. It offers confidence levels for language recognition. It can detect multiple languages. A browser-based Babelfish uses the Web Speech API for translation and speech. Selecting voices like Monica enhances the experience.

Excellent. This is just in the browser. This is not hitting any networks or doing anything. And useful to us is the language detector because when we're going to turn it into speech, back text to speech, we need to pick the right voice for that. We don't want to pick an Australian accent to read out things in Spanish. That's a bad thing.

The language detector has the same create function, and it has a detect function. And it will show you a confidence level. It gives you a percentage of how confident it is that it's this language. If I were to change that to "beer is on my mind", apparently, it's going to think that's probably Spanish. One time I put in something slightly wrong. Maybe this is Spanish, maybe this is Portuguese. So that's interesting to note, that it doesn't just do one language.
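A sketch of the detector, assuming the LanguageDetector global; detect() returns candidate languages with confidence scores (the sample sentence here is just an illustration):

```js
// Sketch: detect the most likely language (and how confident the model is).
const detector = await LanguageDetector.create();

const results = await detector.detect('La cerveza está en mi mente.');
for (const { detectedLanguage, confidence } of results) {
  console.log(`${detectedLanguage}: ${(confidence * 100).toFixed(1)}%`);
}
```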

It will pick out a bunch if it's unsure. And so I built a full-on application to play with this. And that lives over here. This is my built-in browser-based Babelfish using the Web Speech API to do speech to text. It's using the translator API to translate from English to Spanish or the other way around. Then it will speak out in a voice. I'm just going to pick a slightly better voice than Eddie. I'm going to pick Monica. I'm going to listen to make sure this is working again. And talk to it. I'm having a great time talking to you all here at JS Nation. Nice, right?
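Glued together, the Babelfish pipeline looks roughly like this — a hypothetical sketch combining the APIs above, with the voice selection and the English/Spanish pair simplified:

```js
// Hypothetical glue for the Babelfish demo: detect, translate, then speak.
async function babelfish(text) {
  const detector = await LanguageDetector.create();
  const [best] = await detector.detect(text);

  const sourceLanguage = best.detectedLanguage.startsWith('es') ? 'es' : 'en';
  const targetLanguage = sourceLanguage === 'es' ? 'en' : 'es';

  const translator = await Translator.create({ sourceLanguage, targetLanguage });
  const translated = await translator.translate(text);

  const utterance = new SpeechSynthesisUtterance(translated);
  utterance.lang = targetLanguage === 'es' ? 'es-ES' : 'en-AU';
  utterance.voice = speechSynthesis
    .getVoices()
    .find((voice) => voice.lang.startsWith(targetLanguage)) || null;
  speechSynthesis.speak(utterance);
}
```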

11. Exploring Multimodal Prompt API

Short description:

The speaker discusses translating between English and Spanish using browser tools. Exploring the challenges of implementing web speech API for offline use. Introduction of the multimodal prompt API for audio transcription.

I'm having a great time talking to you all here at JS Nation. Nice, right? And now I would like a beer. Okay. It's not listening to me now. But I should do the other way around, right? I was hoping to get a nice easy sentence there. And that's coming out in Karen, I guess. Cool.

So we can translate English to Spanish, we can translate Spanish to English, and we can do so using our inbuilt browser stuff. But I did say kind of in the thing in my abstract that I want this all to be built into the browser and I want it to work offline. And the secret behind the web speech API and why Firefox doesn't implement it is because it is sending off to Google's Cloud Speech Recognition API in the back end.

Now, we get to use that for free, which is cool. But also it means that if somebody is getting something translated and they're using your application to do so, that stuff is going to get sent to Google. And so can we do this offline? Can we translate from speech, from audio into text without anything else? Again, very, very new. This is announced at Google I.O. again a couple of weeks ago, is the multimodal prompt API. Tom showed it earlier with image detection and I can show you today with audio work.

12. Advancements in Browser-Based AI

Short description:

Tom demonstrates the new and cool audio transcription feature in Chrome Canary. Offline functionality and user-generated content processing are highlighted as key benefits of browser-based AI for translations.

Tom showed it earlier with image detection; here it is demonstrated with audio. It is behind a flag in Chrome Canary, marking it as very new yet incredibly cool. Using the same language model with audio specified as an expected input, a transcription can be obtained by prompting the model to transcribe an audio buffer. Currently limited to roughly 30-second audio clips, the demo proves offline functionality: with the network turned off, the built-in model still transcribes, whereas the cloud-backed Web Speech API cannot.

Utilizing media devices for microphone access, the process involves detecting audio chunks via the media recorder API, converting them into an audio buffer with the Web Audio API, then processing with the prompt API for transcription. The demonstrated translation capability from English to Spanish showcases the potential for built-in browser-based AI applications. Offline functionality offers numerous use cases, particularly in user-specific scenarios where browser-based AI provides cost-effective solutions compared to external APIs.
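A sketch of that pipeline, assuming the experimental multimodal Prompt API shape described in Chrome's early-access documentation (flag-gated in Canary; treat the exact message format as an assumption):

```js
// Hypothetical sketch: record a short clip, decode it, and ask the model to transcribe it.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream);
const chunks = [];

recorder.addEventListener('dataavailable', (event) => chunks.push(event.data));

recorder.addEventListener('stop', async () => {
  // Turn the recorded chunks into an AudioBuffer via the Web Audio API.
  const arrayBuffer = await new Blob(chunks).arrayBuffer();
  const audioBuffer = await new AudioContext().decodeAudioData(arrayBuffer);

  const session = await LanguageModel.create({
    expectedInputs: [{ type: 'audio' }],   // opt in to audio input
  });
  const transcript = await session.prompt([{
    role: 'user',
    content: [
      { type: 'text', value: 'Transcribe this audio clip.' },
      { type: 'audio', value: audioBuffer },  // message shape per early-access docs; may change
    ],
  }]);
  console.log(transcript);
});

recorder.start();
setTimeout(() => recorder.stop(), 5000); // keep clips short; ~30 seconds is the current limit
```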

The ability to handle user-generated content, such as translations, summarizations, and language detections, directly in the browser signifies a significant advancement. This approach eliminates the need for costly external AI API usage, enabling platforms like Twitter, lacking translation API access, to leverage browser-based solutions for user interactions and content processing. The demonstrated potential of browser-based AI for translation showcases a shift towards more efficient and cost-effective user-specific content processing within the browser environment.

13. Empowering Browser-Based AI

Short description:

Using Web Audio API and prompt API for audio buffer processing and translation. Offline browser-based AI enables diverse user-specific applications.

We then turn that into an audio buffer using the Web Audio API that the prompt API can then deal with, and then we send it to it, and then after that, we tend to bring it back and speak out in hopefully Spanish on the other side.

What am I going to say? I've run out of things in my mind. Let's all go have a party after this conference. That seems good. That seems good. But also, I can say the same thing back the other way, right? Vamos todos a hacer una fiesta después de esta conferencia.

And it's going to detect that that was English. It's saying stuff, isn't it? Was it saying stuff? I don't know. All right, good. It detected using the language detector API that I then spoke Spanish to it, so it was able to translate that from Spanish to English and use the correct voice on the outside too. And so, this is the kind of thing we can build with a bit of built-in browser-based AI. It's all offline, and that is amazing.

14. Enhancing User Experience with In-Browser AI

Short description:

AI in-browser enables cost-efficient user-specific tasks like translations, summarizations. Personalized content summarization and translation improve user experiences and save costs. Offering alt text suggestions for images and on-device transcriptions enhance accessibility and privacy.

It's all offline, and that is amazing. And that's amazing for a bunch of use cases. If we have AI available to us in the browser like this, then a lot of user-generated and user-specific stuff can be done directly in the browser. Because if you were trying to do a translation or a summarization based on user preferences for everyone, that costs a lot of money with AI APIs and things like that. If you can do it in the browser, you're in a much better place.

If you're translating user-generated content, I always remember on Twitter, if you found a tweet in another language, there might be a little translate this link underneath. Now I like using Blue Sky, but that doesn't have that, presumably because Blue Sky can't just afford to have a translation API for everyone all the time. But they could build it with this. It could be personalized summarization of user reviews. You might see summarization of user product reviews on Amazon these days.

But if you said, if you were getting reviews for a pair of headphones or something, and you cared most about the bass, you could say, I care about the bass, summarize these reviews and what they say particularly about the bass in these headphones. And that's like, that is not something you could do at the scale, like personalized to everybody unless it's running on their device. I love the idea of suggesting alt text for image uploads. Every time you build an image uploader anywhere, just have one of these to be like, hey, did you want to put that alt text in? Because that is important for everyone. And then transcribing and translating videos, that's very expensive normally. And it's incredible what we can do with it now.

QnA

Utilizing Privacy-Focused AI on Chrome

Short description:

Privacy-focused AI usage on Chrome Canary. Experiment, provide feedback, and enhance applications. Access Chrome's built-in APIs for demos and code sharing.

And most of this is also down to privacy. Like none of this left my device after I turned to using the built-in AI stuff. And so that I think is really cool. I don't think I have to sing the praises of privacy too much to understand that it is a positive thing for our users.

This is all brand new. This is in Chrome Canary. But my goodness, it is time to experiment with it. It's time to try it out. Get yourself on the early access program with Chrome, with the Chrome team. Send them your feedback. Build stuff with it. And see what you could do to improve your businesses, your applications over time with this stuff.

I can't wait to see what you build with it. And I really want to. There are some links to stuff. So that's the documentation for Chrome's built-in APIs. That's the best place to start to learn about how to actually go ahead and build all of these. They have a bunch of demos on GitHub, which is really cool. I will be sharing the code for my demo at some point. But currently it is in a horridly cobbled-together state, because things weren't working for quite a long time. So once I clean that code up, I can share it with you.

Utilizing Langflow for AI Enhancement

Short description:

Langflow for server-side generative AI experimentation. User input validation and content moderation are valuable use cases for local browser-based AIs.

And then if you do ever need to progressively enhance, go back to the server at some point, I do recommend the open source project Langflow as a way to experiment with generative AI on the server as well.

So thank you so much. It's been an absolute pleasure talking to you again. My name is Phil Nash. I'm a developer relations engineer at DataStax. Thank you.

Do you think that user input validation or content moderation will be a valid use case for local browser-based AIs? If yes, in what time frame? I do think that's a great use case for it. Absolutely. A particular thing that comes to mind is the ability to validate bad input.

If you have a comments page or a social media site or whatever, being able to detect whether this is toxic content is something that being able to do that in the browser saves you money. Doing that on the server is an option. But I think for right now, obviously, this is all a bit of a progressive enhancement if you do want to use it.

Progressive Enhancement and Mobile Integration

Short description:

Enhancing user experience through progressive enhancement with smaller focused models. Expectations for mobile phone integration and challenges in varying device capabilities. Potential for dedicated APIs and expansion of smaller models into browsers.

You almost basically can't at this point, but soon. But you can progressively enhance it by both using it where it's available in a browser and then going back to a server if you need a server for this as well. Given that we are running with small and focused models, that should still be relatively cost effective on the server as well. But yeah, I do think that's a great use case, that input validation is a great use case, particularly for stuff that you couldn't otherwise validate with a rule or a regular expression. Just don't use regular expressions. No, but it's really, really nice.

I mean, even if you create this kind of use case and share it with Google, probably they will create a dedicated API for that. That's what they did with the translator and language detectors. And it's so much easier to get those smaller models into the browser because there's controversy over the prompt API.

Well, I mean, there's definitely some mobile phones that it's not going to come to, and those are probably all the mobile phones that Alex Russell was talking about earlier. So again, this is a progressive enhancement at best. Like, there is a possibility in the future that we all have incredibly powerful devices, even in the cheaper or middle end of the market. And that's definitely not the case right now. So once again, this is experimental. The smaller models, sorry, I should point out, the prompt API is the big one. It's the two-gigabyte model. The smaller, focused models will run on less difficult hardware. And so we should expect to see more of those crop up in more places. I believe the language translation packs, like English to Spanish, are about 80 megabytes right now. So it's not two gigabytes. It doesn't take all of that. And running them is much easier, too. So we should see it come to most devices and, certainly, new ones. But you've got to keep the progressive enhancement kind of stuff in mind as well. Yeah, it makes sense.

Firebase AI and Real-time Translation

Short description:

Discussing the Firebase AI toolkit as an API fallback, progressive enhancement, the challenges of detecting multiple simultaneous voices, and the feasibility of real-time translation.

Yeah, it makes sense. Actually, Thomas, in his talk earlier, was saying that you can use Firebase AI. That is an API you can call in to. If you don't have the way to do it locally, you can call the API. Yeah, so I keep talking about it as this progressive enhancement that you build yourself. Yeah, absolutely. Google, or the Firebase team, put together a frontend-dedicated Firebase AI toolkit that uses in-browser AI if possible and will throw off to a server automatically for you if you can't. So, yeah, it's a very cool tool. I'd like to see, I expect there will be more of those kinds of things around as well. Because that one, I think, specifically calls off to the Firebase hosting of models and stuff like that, and you might want to use something else. But, yeah. Really cool. Really cool.

And what happens if two, three people speak simultaneously? Will it be able to detect? Oh, I practice this thing in a quiet room on my own. We can try later. I mean, this is probably something, yeah, let's gather around and talk to it all at the same time at some point. I don't think it will do very well. But like for the most part, nothing does very well picking out multiple voices if they're speaking simultaneously. So, I don't think it has different speaker detection and the kind of focused things that other more in-depth speech to text models and products have. But yeah, let's have a go with it. Let's see what it does. Actually, I can relate to that question. Because in Spain, when we talk, we talk at the same time. We can talk about something, but we are talking all at the same time. So, this is really interesting. So, do you see these APIs being utilized for real-time translation of the stream content? Is that feasible at this point? I can't see why not. I like that. Like, if you're getting... Yeah, you should be able to, like, take a stream, translate it into that text, translate it in real-time if you need to.

Privacy Concerns and Feasibility

Short description:

Discussing feasibility of real-time translation, privacy concerns regarding AI training, browser-based model usage, and trust in offline capability for training.

Yeah, I'd love to see that as a use case. Is that feasible at this point? Again, the "at this point" question is always like, oh, yeah, sure, if your users have Canary and a flag turned on. But soon enough, soon enough. Nice.

So, about privacy, can we be sure that the content we translate is not used for AI training, and it's safe to share names and other personal details? I hope not... Right. So, what is happening here is the browser is downloading a model to run inside the browser itself. It is not that... You know, I turned the network off there, and it worked, which means it is not calling home. It is not doing stuff like that. That is what the web speech API was doing, at least in Chrome. It was calling home and getting stuff from cloud speech. And whilst useful, privacy is a concern. With all of these models, nothing leaves the device.

And, you know, I guess that as much as you trust Google not to pipe your audio off to themselves when somebody turns on media devices, you do have to trust them on that. You can try it out. You can open the network tab to see if connections are being made. You can see what is happening in the background. I think we can trust that these things are offline capable and will not be used to train stuff. That's nice.

Model Integration and Language Packs

Short description:

Discussing model size, integration into browsers, local model usage, language pack availability, and translation coverage through browser models.

Yeah. Do you foresee those 22 gigabytes of downloads making their way into the browser, or will it make sense being bundled into the... So it is an important thing. It only requires there to be 22 gigabytes of space free. I think the model itself is only two gigabytes in size. That is one thing. It doesn't have to go and download 22, which is a good thing. Will it be built into the browser in the first place? Maybe, but also maybe not. I think building it into the browser in the first place means you have to detect what kind of device it is being downloaded on, so you don't accidentally take up two gigabytes of somebody's space without being able to actually run it. But once you download it the first time for one application, it's just part of the browser. So it is there for every other application that wants to use it afterwards. It's not your version of the model, it's the browser's version of the model, so it will be there afterwards.

One other thing I want to mention, actually, I was just talking to Thomas before coming up to give this talk, and apparently, even though I demonstrated that the Web Speech API, which calls off to a server right now, you know, doesn't work without a network connection, there is a flag which you can turn on to use a local model, and if you have the local translation model, then it's entirely possible that it will work locally in the future. So this is kind of, it's been added to the spec, and so building these models into browsers to do this kind of stuff is actually going to help us in those old APIs, as well as these new ones. Oh, really cool, really cool.

And yes, the last question: what about languages like Japanese? That's a great question. So there are a bunch of language packs available. I used English to Spanish, mainly because I genuinely am going on holiday to Spain. English to Japanese is actually one of the packs that I was able to download earlier. Now, there are a load available, but I think, given whatever version of Canary I had at the time, it was denying me English to French. They exist and they're coming, but it was not letting me do so, but English to Japanese absolutely is there. I just didn't play around with the Japanese stuff because I wouldn't know if I was getting it correct or not. But yes, there are a lot of these language packs going around, and I think, even better for the translation stuff, if there's not a direct pack from one language to another, I believe it's being built such that you could pass it through a middle model, where you go one language to English, then English to the other language, and get all language coverage without having to necessarily cover everything. Of course, you might find the quality varies at that point. No, it makes sense. And at the end, if you want these languages, just ask for them. Yes. Oh, absolutely. Yeah. Back to the Chrome team. Nice, really nice. Thank you, Phil. We'll continue now.
