No Dependencies, No Problem: Streaming AI Over the Phone


What if you could build a phone agent that listens, thinks, and speaks – without touching a single speech-to-text API or wrangling cloud infrastructure? What if all it took was a WebSocket and some JavaScript you already know?
In this talk, you’ll see how to wire up a minimal AI voice loop using modern tools like Bun, with no dependencies and no boilerplate in the way. It’s a quick, practical demo that puts the focus back on business logic – and shows just how little it takes to get started.

This talk was presented at JSNation 2025.

FAQ

What does Marius do at Twilio?
Marius works on the developer relations team at Twilio, where he talks to developers building on Twilio's APIs.

What problem do developers usually run into when building phone-based AI agents?
Latency: there is a noticeable delay between the caller's spoken input and the AI's response, which disrupts the conversational experience.

How does Twilio help with speech recognition and text-to-speech?
Twilio offers automatic speech recognition and text-to-speech using best-in-class providers, allowing developers to offload those responsibilities to Twilio's infrastructure.

What does the developer's WebSocket server do?
It receives the transcribed text, processes it with a language model running nearby with low latency, and streams text back as the response.

How can developers minimize latency?
By letting Twilio's infrastructure handle automatic speech recognition and text-to-speech, and hosting a WebSocket server that processes the text with low-latency models.

Which runtime hosts the web server in the example?
Bun is used to host a simple web server.

How does the assistant go from a canned reply to real answers?
Initially it was hard-coded to respond with "that's a great question," but later it used GPT-4o mini to provide dynamic, context-aware answers.

Does the web search tool add latency?
Response time increases slightly when the AI performs a web search, but this can be mitigated with latency-reducing strategies.

What example queries were used in the demo?
"What's the capital of France?" and "Who won the UEFA Nations League last weekend?"

Marius Obert
6 min
12 Jun, 2025

Video Summary and Transcription
Marius from Twilio demonstrates building AI agents for phone calls, addressing latency by offloading speech recognition and text-to-speech to Twilio's infrastructure and providers like ElevenLabs and Google Cloud. The walkthrough covers WebSocket message handling, a static placeholder response, and text-to-speech with ElevenLabs. AI integration uses the GPT-4o mini model with conversation history stored server-side. A live demo shows an AI voice assistant responding almost instantly.

1. Building AI Agents for Phone Calls

Short description:

Marius from Twilio discusses building AI agents for phone calls, addressing latency by leaving automatic speech recognition and text-to-speech to Twilio's infrastructure and providers like ElevenLabs and Google Cloud. Host a WebSocket server, process the text with LLMs, and achieve low-latency communication. He demonstrates building an agent in three minutes, using Bun for the web server and WebSocket handling.

Hi, everyone. I'm Marius. I work on the developer relations team at Twilio, and that means I speak to a lot of developers who use our APIs, such as the text messaging API or the voice API. One thing that a lot of developers recently wanted to build is an AI agent that can make or receive a phone call. Let me show you how this story often goes from a developer's perspective. You have all these great models that you want to combine: one for automatic speech recognition, one for interrupt detection, and a text-to-speech model. You combine them all together, and, in theory, it works nicely, but then they quickly run into latency.

The latency means you say something, you wait, nothing happens, you say something again, and only then does the model start to talk, and that kills the entire experience. So they need to find a way to work around this. Something we provide at Twilio is that you can shift a lot of that responsibility onto our infrastructure, such as automatic speech recognition and text-to-speech. We work with best-in-class providers, such as ElevenLabs and Google Cloud, to deliver these services, and you just need to focus on the configuration.

What you actually need to do in the end is host a WebSocket server that receives text, and then you can process it with your own LLMs. You can pass it to an LLM that runs close to your machine with low latency, and you just stream text back. And, actually, you can build an agent in three minutes. Let's do that together. I use Bun to host a simple web server, just for the fun of using a new stack every now and then. I expose it on port 5050, and then I have this fetch function whose only job is to upgrade HTTP to WebSocket; I also attach a data object so I can recognize the same stream again.
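As a rough sketch of that setup, here is what a dependency-free Bun server along those lines could look like. It uses Bun's built-in Bun.serve and WebSocket upgrade; the 426 response and the log line are my own additions, and the callback object gets filled in in the next section.

```js
// server.js -- run with `bun run server.js`; no packages required.

// The WebSocket callbacks (open, close, message) are sketched in the next section.
const websocketHandlers = {};

Bun.serve({
  port: 5050,
  fetch(req, server) {
    // The only thing this fetch function does: upgrade HTTP to WebSocket and
    // attach a data object so the same stream can be recognized again later.
    // A phone number would make sense as the id; a timestamp is enough when
    // you are the only caller.
    if (server.upgrade(req, { data: { id: Date.now() } })) {
      return; // Bun sends the upgrade response itself
    }
    return new Response("Expected a WebSocket upgrade", { status: 426 });
  },
  websocket: websocketHandlers,
});

console.log("WebSocket server listening on port 5050");
```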

2. WebSocket Configuration and AI Integration

Short description:

Configuring WebSocket callbacks for message handling, logging prompts, and returning a static response. Integrating ElevenLabs for text-to-speech. Adding AI with the GPT-4o mini model, with conversation history stored and retrieved server-side.

It would make sense to use the phone number here, but I just use a timestamp, because I'm the only one calling it anyway. Then, in the WebSocket configuration, I have a callback for when the socket is open and one for when it's closed; let's format it a bit. The interesting part happens when a message comes in. I parse the JSON payload, and when the message is of type prompt, which it mostly will be, I log it to the console. For now, let's have a hard-coded answer that says "that's a great question." I log that as well and just stream it back. I also log the other message types so you see them, but we don't have to worry about them for now.
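A sketch of those callbacks, filling in the empty websocketHandlers stub from the previous snippet. It assumes the incoming prompt messages carry the transcript in a voicePrompt field and that replies are sent back as type "text" with token and last fields; those names follow my reading of Twilio's ConversationRelay message schema, so verify them against the docs.

```js
// The handler object plugged into Bun.serve({ websocket: ... }) above.
const websocketHandlers = {
  open(ws) {
    console.log("call connected", ws.data.id);
  },
  close(ws) {
    console.log("call ended", ws.data.id);
  },
  message(ws, raw) {
    const msg = JSON.parse(String(raw));

    if (msg.type === "prompt") {
      // Twilio already did speech-to-text for us; log what the caller said.
      console.log("caller:", msg.voicePrompt); // field name is an assumption

      // For now: a hard-coded answer, streamed straight back as text.
      const answer = "That's a great question.";
      console.log("assistant:", answer);
      ws.send(JSON.stringify({ type: "text", token: answer, last: true }));
      return;
    }

    // Log the other message types so we can see them, but ignore them for now.
    console.log("ignored message of type", msg.type);
  },
};
```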

Okay, let's give it a run. I start my server and expose the port to the internet. If I go over to my configuration, you see that whenever a call comes into a phone number, it connects to my WebSocket server. This is where I say I use ElevenLabs for text-to-speech, this is the particular voice ID, and this is the opening sentence. Let's call it, and let's hope it works. Is the audio on? Let me check. "That's a great question." Hey, what's the capital of France? "That's a great question." You see, I always get the same response back. Why? Because I don't do anything here; I just return a static response. But you saw how low the latency was. If I look at the logs, you see the text-to-speech and speech-to-text happened instantly. Now, let's actually involve some AI. If I drill down into that, it's auto-imported.
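In the demo, that configuration lives on the Twilio side; one way to express roughly the same thing is to answer the phone number's incoming-call webhook with TwiML that hands the call to ConversationRelay. Treat this as a hedged sketch: the attribute names (url, ttsProvider, voice, welcomeGreeting) reflect my reading of Twilio's ConversationRelay docs, and the WebSocket URL and voice ID are placeholders.

```js
// A hypothetical incoming-call webhook that returns TwiML. ConversationRelay
// then handles speech-to-text and text-to-speech (here via ElevenLabs) and
// forwards the transcribed text to our WebSocket server.
Bun.serve({
  port: 3000,
  fetch() {
    const twiml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <ConversationRelay
      url="wss://your-tunnel.example.com/"
      ttsProvider="ElevenLabs"
      voice="your-voice-id"
      welcomeGreeting="Hi! Ask me anything." />
  </Connect>
</Response>`;
    return new Response(twiml, { headers: { "Content-Type": "text/xml" } });
  },
});
```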

You see, what do I do here? Well, I have an if statement to check whether I already know this conversation ID. If I don't know it, I use the GPT-4o mini model, I add a system prompt and the user's prompt, I add a web search tool, and I make sure the history is stored server-side at OpenAI. When the response comes back, I save it. So when I ask another question, the if statement triggers, and I can refer to the previous conversation; I don't have to carry around that messages array all the time, I just add the most recent prompt. Let's try that again. I restart the server.
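A sketch of that AI step, using the official openai package (the one dependency this adds) and OpenAI's Responses API. The gpt-4o-mini model name, the web_search_preview tool type, and the store / previous_response_id fields match my understanding of that API but are worth double-checking; the system prompt text and the function name are made up for illustration.

```js
import OpenAI from "openai"; // add with `bun add openai`

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const previousResponseByCall = new Map(); // conversation id -> last response id

export async function generateAnswer(conversationId, prompt) {
  const previousId = previousResponseByCall.get(conversationId);

  const response = await openai.responses.create({
    model: "gpt-4o-mini",
    // Only the newest prompt is sent; the history lives server-side at OpenAI.
    input: prompt,
    instructions: "You are a friendly voice assistant on a phone call. Keep answers short.",
    tools: [{ type: "web_search_preview" }], // lets the model search the web
    store: true, // ask OpenAI to keep the conversation history
    ...(previousId ? { previous_response_id: previousId } : {}),
  });

  // Remember this response so the next turn can refer back to it.
  previousResponseByCall.set(conversationId, response.id);
  return response.output_text;
}
```

In the message callback sketched earlier, the hard-coded answer would then become something like `const answer = await generateAnswer(ws.data.id, msg.voicePrompt);`, with the callback marked async.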
