Building a Voice-Enabled AI Assistant With JavaScript


In this talk, we'll build our own Jarvis using Web APIs and LangChain. There will be live coding.

This talk was presented at JSNation 2023.

FAQ

The project uses the Web Speech API for speech recognition, the OpenAI GPT-3.5 Turbo model for processing text, and the Speech Synthesis API for converting text to speech.

Tauri is a tool that allows you to create native desktop applications using web technologies like HTML and JavaScript, with Rust as the backend. It can be used to turn the AI assistant into a native desktop app.

Tejas Kumar is the founder of a developer relations consultancy that helps developer-oriented companies build and maintain strong relationships with developers through strategic discussions, mentorship, hiring, and hands-on execution.

Although the AI assistant uses non-standard APIs requiring prefixes, it could potentially be used in production with custom grammars and further development.

The consultancy operates on the philosophy of 'DevRel, not DevSell,' emphasizing building genuine relationships with developers rather than trying to sell them products.

The consultancy works on projects that involve building tools and technology for fun and learning, such as creating a voice-activated AI assistant using web APIs and JavaScript.

The main goal of the AI assistant project is to have fun while learning about JavaScript and AI, rather than building a product to sell.

You can support Tejas Kumar's DevRel work by following him and engaging with his content.

The project uses web APIs, JavaScript, Vite for the dev server, and Visual Studio Code for coding.

Tejas Kumar's consultancy helps developer-oriented companies build great relationships with developers through high-level strategic discussions, mentorship, hiring, and hands-on execution such as writing documentation and giving talks.

The consultancy prefers to use Chrome because the Speech Recognition API works reliably in Chrome, although it can be made to work in other browsers with different implementations.

Tejas Kumar
21 min
05 Jun, 2023

Video Summary and Transcription
This Talk discusses building a voice-activated AI assistant using web APIs and JavaScript. It covers using the Web Speech API for speech recognition and the Speech Synthesis API for text to speech. The speaker demonstrates how to communicate with the OpenAI API and handle the response. The Talk also explores enabling speech recognition and addressing the user. The speaker concludes by mentioning the possibility of creating a product out of the project and using Tauri for native desktop-like experiences.

1. Introduction to DevRel and AI

Short description:

Hi, I'm Tejas Kumar, and I run a small but effective developer relations consultancy. We help other developer-oriented companies have great relationships with developers through strategic discussions, mentorship, and hands-on execution. Today, we're going to build a voice-activated AI assistant using web APIs and JavaScript. The purpose is to have fun while learning and celebrating JavaScript and AI.

Hi, I'm Tejas Kumar, and I run a small but effective developer relations consultancy. What that means is we help other developer-oriented companies have great relationships with developers. And we do this through high-level strategic discussions, and mentorship, and hiring. Or we do it through low-level, hands-on execution, like we literally sometimes write the docs, do the talks, etc.

In that spirit, it's important for us to kind of, you know, stay in the loop, and be relevant and be relatable to developers, to have great DevRel, great developer relationships. And sometimes to do that, you just have to build stuff. You see, a lot of conferences these days are a bunch of DevRel people trying to sell you stuff, and we don't like that. It's DevRel, not DevSell.

And in that spirit, we're not going to sell you anything here, we're just going to hack together. The purpose is to have some fun, to learn a bit, and so on. What we're gonna do in our time together is we're going to build a voice-activated AI assistant, like Jarvis from Iron Man, using only web APIs, just JavaScript. We'll use Vite for a dev server, but that's it, this works. We're gonna be using some non-standard APIs that do require prefixes and stuff, but if you really wanted to, you could use it in production. You could supply your own grammars and so on. The point today, though, is not that, it's to have fun while learning a bit and also vibing a little bit. All in the spirit of celebrating JavaScript and AI.

2. Building the AI Assistant Plan

Short description:

We're going to use the Web Speech API for speech to text and the Speech Synthesis API for text to speech. We'll give the text to OpenAI's GPT-3.5 Turbo model and then speak the response. It's a straightforward process using browser APIs that have been around for a while.

So with that, let's get into it by drawing a plan in tldraw. We're gonna go to tldraw, and what do we want to do? Well, we want to first have speech to text. This is using the Web Speech API. From there, we want to take this text and give it to OpenAI, the GPT-3.5 Turbo model. From there, we want to speak. So text to speech, with the text coming from OpenAI. This is the plan. We want to do this with browser APIs. We want to reopen the microphone after GPT talks and have it come back here. This is what we want to do.

Let's draw some lines. So it's really just speech to text, an AJAX request, and text to speech. Not necessarily hard. There are some functions here. This one is called SpeechRecognition, which we're going to use. That's actually a thing introduced in 2013. It's been around for a while. And this is the Speech Synthesis API. So both of these exist in JavaScript in your browser runtime. They're just ready to use. What we're going to do is use them to fulfill this diagram.
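
The diagram can be sketched in code roughly as follows. This is a minimal sketch under stated assumptions, not the code written in the talk: the function names (`listenOnce`, `askOpenAI`, `speak`, `runTurn`, `runAssistant`) are hypothetical, the request goes through plain `fetch` rather than a library, and the API key is assumed to be supplied by the caller. Only the browser APIs (`SpeechRecognition` with its `webkit` prefix, `SpeechSynthesisUtterance`, `speechSynthesis`) and the OpenAI chat completions endpoint are real.

```javascript
// Sketch of the plan: speech to text -> OpenAI -> text to speech -> listen again.
// All function names are hypothetical; they are not taken from the talk.

// Speech to text: resolves with the first transcript the browser recognizes.
function listenOnce(globalObj = globalThis) {
  // Chrome exposes the constructor under a webkit prefix.
  const Recognition =
    globalObj.SpeechRecognition ?? globalObj.webkitSpeechRecognition;
  return new Promise((resolve, reject) => {
    if (!Recognition) {
      reject(new Error("SpeechRecognition is not available"));
      return;
    }
    const recognition = new Recognition();
    recognition.onresult = (event) => resolve(event.results[0][0].transcript);
    recognition.onerror = (event) => reject(new Error(event.error));
    // If recognition ends without a result, fail instead of waiting forever.
    recognition.onend = () => reject(new Error("No speech recognized"));
    recognition.start();
  });
}

// The AJAX request: sends the text to OpenAI's chat completions endpoint.
async function askOpenAI(text, apiKey, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: text }],
    }),
  });
  if (!response.ok) {
    throw new Error(`OpenAI request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

// Text to speech: resolves once the browser has finished speaking.
function speak(text, globalObj = globalThis) {
  return new Promise((resolve, reject) => {
    const utterance = new globalObj.SpeechSynthesisUtterance(text);
    utterance.onend = () => resolve();
    utterance.onerror = (event) => reject(new Error(event.error));
    globalObj.speechSynthesis.speak(utterance);
  });
}

// One trip around the diagram. The steps are injected so they can be swapped or faked.
async function runTurn({ listen, ask, say }) {
  const heard = await listen();
  const reply = await ask(heard);
  await say(reply);
  return { heard, reply };
}

// Reopens the microphone after each spoken reply, for a given number of turns.
async function runAssistant(steps, turns = Infinity) {
  const history = [];
  for (let i = 0; i < turns; i++) {
    history.push(await runTurn(steps));
  }
  return history;
}

// In a browser this could be wired up as (OPENAI_API_KEY is a placeholder):
// runAssistant({
//   listen: listenOnce,
//   ask: (text) => askOpenAI(text, OPENAI_API_KEY),
//   say: speak,
// });
```

Passing the three steps in as functions keeps the loop itself independent of the browser, so each step can be replaced or tested on its own. Calling OpenAI directly from the page exposes the API key to anyone who opens the dev tools, which is acceptable for a local hack like this one but not for a deployed app.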
