Open Source Voice AI: How We Built ChatGPT's Voice Mode Infrastructure

Jesse Hall

codeSTACKr


Ever wondered what it takes to power millions of voice conversations at ChatGPT's scale?

When OpenAI needed infrastructure for ChatGPT's Advanced Voice Mode, they turned to LiveKit's open source infrastructure. Not a proprietary black box. Not a closed platform. Open source software that anyone can use, modify, and deploy.

In this talk, I'll take you behind the scenes of building production voice AI infrastructure that handles millions of conversations:

Why Open Source for Production AI – The technical and business reasons behind the choice
Architecture Decisions – How we built for scale, reliability, and low latency
Scaling to Millions of Calls – The challenges you don't anticipate until you hit them
Lessons Learned – What we'd do differently knowing what we know now
What's Possible Now – How you can use the same infrastructure for your projects

This isn't a sales pitch, it's a technical deep dive with real production metrics, architectural diagrams, and honest discussions about trade-offs. You'll see the actual stack, understand the scaling challenges, and learn from our mistakes.

Whether you're building your first voice agent or scaling to production, you'll walk away with insights from one of the largest voice AI deployments in the world. Because the infrastructure powering ChatGPT's voice mode is open source, and it's available to everyone.

This talk was presented at JSNation 2026.

artificial intelligence

Jesse Hall

20 min

15 Jun, 2026


Video Summary and Transcription

Voice interaction with AI is evolving rapidly, and response delay is what makes or breaks a natural conversation. Voice AI models keep getting faster, which makes the latency budget the central design concern. Design decisions aim to minimize latency at every stage, and network distance and backend location take a significant share of that budget. Winning teams treat the latency budget like a profit and loss statement. Detecting when the user has finished speaking is crucial: VAD thresholds and semantic turn detection decide whether the agent cuts users off. Interruption handling and efficient function calling keep users engaged, as do responsive function calls and workflows owned by code. Automated evaluations use an LLM judge, and observability prioritizes per-stage latency and audio replay. Voice AI development is iterative: start narrow, then expand with an evaluation framework.


1. Voice Interaction Challenges in AI Development

Short description:

Voice interaction with AI evolving rapidly. Challenges in creating natural conversation experiences. Time delay crucial for user engagement and experience with voice AI agents.

Voice, that's what this talk is about. For 40 years, we've talked to computers with keyboards and mice. For the past two years, we've talked to AI by typing into chat windows, text inputs, and CLIs. Well, that's changing fast. The AI agents that we're going to be using in a year, they don't live in chat windows. They live on our phones, in our cars, our headphones, our doorbells, in robots. And we're going to talk to them the way that humans have always talked, out loud. And I'm not talking about transcriptions, but real conversations.

Voice is the most natural interface that we have, and it's becoming the natural interface for AI, too. But there is a problem. There are challenges. Building one of these so that it actually feels like a conversation is much harder than it looks. Try this. Imagine having a phone conversation with someone where every reply takes three seconds. You both start talking at the same time. You apologize. You wait. You try again. And within a minute, you really want to hang up. And that is what most voice AI agents feel like the first time a real user takes them out of demo mode.

And so we naturally reply to each other in about 500 milliseconds. And that's the conversational contract your software is competing with. Around one second or less is the bar where it feels like a real conversation. But most production voice apps today land somewhere between two and five seconds. Getting under one second on real infrastructure with the models that we have today is genuinely hard. The gap between where we are and where it needs to be is the entire engineering problem of voice AI. So this is the gap. 500 milliseconds is what humans do. One second is where it starts to feel like a real conversation. And two to five seconds is where most voice apps that we use today actually land.

2. Closing Gap in Voice AI Development

Short description:

Models evolving rapidly, latency budget importance, LiveKit's journey from prototype to widespread use.

The good news is that this gap is closing fast. Models are getting smaller, faster, cheaper every quarter, every week, every day. The one second bar that's aspirational today will be table stakes a year from now. So hold one question in your head for the rest of this talk. Where is my latency budget going and how do I protect it?

I work at a company called LiveKit. We make open source infrastructure for real-time voice, video, and data. We've watched up close what breaks when a voice agent goes from demo to production. In December of 2022, right after ChatGPT launched, our CEO built a weekend prototype. He wanted to talk to it instead of type. He duct taped together browser speech to text, an HTTP call to ChatGPT running in a headless browser and a browser text to speech to play it back. I think duct taped is really the nicest way to put it. The roundtrip latency for that was about eight seconds. It was terrible, but it worked.

Now a few months later, the team built it on WebRTC and open sourced the entire thing. We wrote a blog post about it, but guess what? The internet didn't care. We were not at the top of Hacker News. But a few months after that, a developer at OpenAI quietly created a LiveKit Cloud account, picked up the open source SDK, and started prototyping a voice interface for ChatGPT on top of it. We didn't know what they were doing, but they found us because the code was sitting on GitHub. By September of 2023, ChatGPT voice mode shipped on that stack. Two and a half years later, here are a few companies that have used this same stack. WebRTC, Rock, SAP, about a quarter of 911 emergency dispatch centers in the US, and over 100,000 developers building voice products of their own. Every line of that stack is open source, Apache 2.0 on GitHub. You can npm install it today. This is the entire reason that I get to give this talk. The rest of it is not about us. It's about what you actually need to know to ship voice reliably in your application today.

3. Optimizing Latency in Voice Agent Process

Short description:

Every voice agent has a similar structure. Design decisions aim to minimize latency per stage. Network and backend location impact latency budget significantly.

Here's the diagram that I want you to carry around in your head. Every voice agent, no matter what framework, no matter what model, has roughly the same shape. The user speaks, their mic captures audio, it gets encoded, and it travels across a network to your back end. A speech to text model, STT, transcribes it. The text goes into a large language model. The LLM streams tokens back. A text to speech model, TTS, turns those tokens into audio. The audio travels back across the network, and the user's device decodes it and plays it. Eight stages, and each one costs latency.

Your job, every time you make a design decision, is to push the sum of those costs as close to one second as you can. This is the realistic per stage range for a clean conversational turn today. STT is rarely the bottleneck, but it adds up. LLM time to first token is usually the biggest chunk and the most variable. TTS time to first audio is the one most people forget to measure. Add it all up and even a clean turn already eats most of that second. This is before function calls, before retries, before any of the real world stuff that takes you to two, three, five seconds. This is the floor.
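As a rough sketch of that budgeting exercise, the snippet below sums per-stage costs for one turn and reports what is left of a one-second target. The stage names follow the eight stages described above. Every millisecond figure is a made-up placeholder, not a number from the talk, so substitute your own measurements.

```typescript
// Hypothetical per-stage latencies for one conversational turn, in milliseconds.
// These values are illustrative placeholders, not measurements from the talk.
type Stage =
  | "capture"
  | "encode"
  | "uplink"
  | "stt"
  | "llmFirstToken"
  | "ttsFirstAudio"
  | "downlink"
  | "playback";

const turn: Record<Stage, number> = {
  capture: 20,
  encode: 20,
  uplink: 60,
  stt: 150,
  llmFirstToken: 400,
  ttsFirstAudio: 150,
  downlink: 60,
  playback: 40,
};

const BUDGET_MS = 1000; // the "feels like a real conversation" bar

// Sum of all stage costs for the turn.
function totalLatency(stages: Record<Stage, number>): number {
  return Object.values(stages).reduce((sum, ms) => sum + ms, 0);
}

// The stage that currently eats the largest share of the budget.
function biggestStage(stages: Record<Stage, number>): Stage {
  const entries = Object.entries(stages) as [Stage, number][];
  return entries.reduce((max, cur) => (cur[1] > max[1] ? cur : max))[0];
}

const spent = totalLatency(turn);
console.log(`spent ${spent} ms, ${BUDGET_MS - spent} ms left`);
console.log(`largest stage: ${biggestStage(turn)}`);
```

With these placeholder numbers the clean turn already spends 900 ms, which leaves almost nothing for function calls or retries.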

If your user is in Sydney, and your backend is in Virginia, that's a 120 millisecond floor before anyone even processes anything. You haven't even run a model yet, and a meaningful chunk of your budget is gone. That's why edge routing is not a nice-to-have. 100 milliseconds saved in the network is exactly as valuable as 100 milliseconds saved in inference.

4. Addressing Latency Challenges in Voice AI

Short description:

Latency numbers: TTFT for text, TTFA for audio. Budget against TTFA. Aim to reduce latency in stages. Winning teams treat budget like P&L statement. User completion detection crucial.

Spend your engineering time on the stages where you can actually claw some of that time back. Published latency numbers from model providers are usually time to first token, TTFT. That's text. Voice agents need time to first audio, TTFA. These two are not the same. TTFA includes the time for the LLM to produce enough tokens for the TTS to start synthesizing plus the TTS time to start streaming audio back.

If you're measuring TTFT and budgeting against TTFA, you're going to ship something that feels a half a second slower than your dashboard says it actually is. Always measure TTFA end-to-end, instrument every stage, and look at the P50 and P99 separately. Most will not hit one second today. You'll probably land closer to two or three seconds. Your job is to spend the next quarter clawing that back, hundreds of milliseconds at a time in whichever stage is cheapest.
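To make the difference concrete, here is a minimal sketch that derives both numbers from the timestamps of a single turn. The event names and the sample values are assumptions for illustration. The point is only that TTFT starts at the LLM request, while TTFA starts when the user stops speaking and ends when audio actually plays.

```typescript
// Timestamps for one turn, in milliseconds since the session started.
// Field names are hypothetical; use whatever your instrumentation emits.
interface TurnEvents {
  userStoppedSpeaking: number;
  llmRequestSent: number; // after end-of-turn detection and STT
  llmFirstToken: number;
  firstAudioPlayed: number; // first agent audio heard on the user's device
}

// What a model provider's dashboard typically reports.
function ttft(e: TurnEvents): number {
  return e.llmFirstToken - e.llmRequestSent;
}

// What the user actually experiences.
function ttfa(e: TurnEvents): number {
  return e.firstAudioPlayed - e.userStoppedSpeaking;
}

// Illustrative values only.
const example: TurnEvents = {
  userStoppedSpeaking: 0,
  llmRequestSent: 250,
  llmFirstToken: 600,
  firstAudioPlayed: 1150,
};

console.log(`TTFT ${ttft(example)} ms, TTFA ${ttfa(example)} ms`);
```

In this made-up turn the dashboard would show 350 ms while the user waits 1150 ms, which is why the budget has to be tracked against TTFA.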

The teams that win at voice AI are the ones treating that budget like a P&L statement. Once you have a pipeline that returns audio fast enough, you hit a harder problem. When has the user actually finished talking? That sounds trivial, but it's not. The naive approach is Voice Activity Detection, or VAD. VAD listens for silence. The moment it sees a silence gap, it sends what it has to the LLM. But humans pause in the middle of sentences all the time.

5. Understanding VAD Thresholds and Turn Detection

Short description:

VAD threshold impact, Semantic Turn Detection importance. Avoid frequent interruptions for user satisfaction. Prioritize true negatives for turn detection.

If your VAD threshold is 200 milliseconds, your agent will cut you off constantly. If it's two seconds, then your agent will feel sluggish. There's no good static answer. Whatever answer you pick is going to be wrong in one direction.

The canonical example of something like this is the user giving their phone number. They pause between the area code and the next three digits. They pause again before the last four digits. VAD sees three separate turns. Your agent responds three times. The user hates that. The same pattern breaks addresses, credit card numbers, order numbers, anything structured.
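The phone number failure can be reproduced with a few lines of naive endpointing. This is a simplified sketch, not any particular VAD implementation: the `Segment` shape, the timings, and the digits are invented for illustration.

```typescript
// A stretch of detected speech, as a VAD might report it.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Naive endpointing: any silence gap of at least thresholdMs ends the turn.
function splitTurns(segments: Segment[], thresholdMs: number): string[] {
  const turns: string[] = [];
  let current: string[] = [];
  let lastEnd: number | null = null;
  for (const seg of segments) {
    if (lastEnd !== null && seg.startMs - lastEnd >= thresholdMs) {
      turns.push(current.join(" "));
      current = [];
    }
    current.push(seg.text);
    lastEnd = seg.endMs;
  }
  if (current.length > 0) turns.push(current.join(" "));
  return turns;
}

// A user reading out a phone number with natural pauses of 600 and 700 ms.
const phoneNumber: Segment[] = [
  { startMs: 0, endMs: 900, text: "555" },
  { startMs: 1500, endMs: 2400, text: "010" },
  { startMs: 3100, endMs: 4300, text: "0199" },
];

console.log(splitTurns(phoneNumber, 200)); // three separate turns
console.log(splitTurns(phoneNumber, 2000)); // one turn, but a 2 s wait after the last digit
```

Neither threshold is right: the short one fragments the number, and the long one adds two seconds of silence to every turn.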

The fix is Semantic Turn Detection, a small model that listens to the content of what the user is saying and decides, based on language, whether it sounds like they finished. This is one of the few places where you should not roll your own. We trained and open sourced ours. It runs on a CPU in under 50 milliseconds. It works with 14 languages. It catches these phone number use cases. It also catches a trailing "uh" or "and" and things like that. Turn detection is a tradeoff between true positive rate and true negative rate. You want to bias towards true negative. Users can tolerate a slight delay, but they will not tolerate being cut off.
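One way to picture the bias toward true negatives: let the model's end-of-turn probability choose how long to wait in silence before responding. This is a hypothetical sketch, not the API of the open source turn detector. The config names, the threshold, and the delays are all assumptions, and the probability is passed in as a plain number rather than produced by a real model.

```typescript
// Hypothetical endpointing settings.
interface EndpointingConfig {
  confidentThreshold: number; // probability at or above which the turn counts as finished
  minDelayMs: number; // wait this long when the model is confident
  maxDelayMs: number; // otherwise give the user this long to continue
}

// Pick the silence to wait for, given the model's end-of-turn probability.
function silenceToWaitMs(endOfTurnProbability: number, cfg: EndpointingConfig): number {
  return endOfTurnProbability >= cfg.confidentThreshold ? cfg.minDelayMs : cfg.maxDelayMs;
}

// A high threshold biases toward "not finished yet": slower sometimes, but fewer cut-offs.
const cfg: EndpointingConfig = { confidentThreshold: 0.85, minDelayMs: 300, maxDelayMs: 2500 };

console.log(silenceToWaitMs(0.95, cfg)); // e.g. "That's everything, thanks."
console.log(silenceToWaitMs(0.2, cfg)); // e.g. "My number is 555"
```

Raising `confidentThreshold` trades a little extra delay on finished turns for fewer interruptions of unfinished ones, which is the direction the talk recommends.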

6. Interruption Handling and Function Calling

Short description:

Importance of interruption handling and function calling for voice agents. Ensure prompt response to interruptions and efficient function calling to maintain user engagement. Key rules for function design to optimize LLM inference calls and latency. Prioritize concise data for LLM output to ensure smooth agent interactions.

Now the mirror image of turn detection is Interruption Handling. When an agent is talking and the user makes a sound, what do you do? If the user is interrupting, we need to stop within 100 milliseconds. Any longer and it feels like you're talking over the agent. If the user just says, oh, while you're explaining something, they're not interrupting. They're back channeling. They're telling you to keep going. If your agent stops every time someone breathes, coughs, sneezes into the microphone, the conversation falls apart.

The pattern here is a small model that classifies whether incoming audio is a real interruption or just acknowledgment. So, we want to stop fast when it is and keep going when it isn't. So far, we've been talking about agents that talk. That's a demo. The thing that turns a demo into a product is function calling: the agent can look up an order, schedule an appointment, file a ticket, transfer a call. But every function call costs you roughly two LLM inference calls: one to decide which tool to call and one to summarize the result back to the user. That two to five second range that we just talked about, a tool calling turn pushes you to the high end of that. You're looking at four or five seconds when the agent has to actually do something. That's where users start wondering if your agent broke.

Function calling isn't just an LLM problem. It's a latency design problem. So, four rules that will pay you back here. The first one, functions should have short, clear names and as few parameters as possible. LLMs pick better when there is less to pick from. Second, the back end calls behind your function should come back in under 500 milliseconds. If your CRM is taking two seconds, the agent is sitting there in silence. The third thing, the data your function returns should be something the LLM can say out loud. JSON works, but it tempts the model to leak structure into the speech. So, return concise, speakable summaries. And fourth, the math is unforgiving. 95% accuracy sounds great. But with an agent taking 300 conversations a day, that's 15 failures a day.
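Two of these rules are easy to sketch. The first function turns a record into a sentence the agent can read aloud instead of handing raw JSON to the model. The second does the failure arithmetic. The `Order` shape and the wording are invented for illustration.

```typescript
// Hypothetical order record returned by a back end.
interface Order {
  id: string;
  status: "processing" | "shipped" | "delivered";
  etaDays?: number;
  items: string[];
}

// Return a concise sentence the LLM can say out loud, not a JSON blob.
function speakableOrderSummary(order: Order): string {
  const count = order.items.length;
  const itemsPart = `${count} item${count === 1 ? "" : "s"}`;
  switch (order.status) {
    case "processing":
      return `Your order with ${itemsPart} is still being processed.`;
    case "shipped":
      return order.etaDays !== undefined
        ? `Your order with ${itemsPart} has shipped and should arrive in about ${order.etaDays} day${order.etaDays === 1 ? "" : "s"}.`
        : `Your order with ${itemsPart} has shipped.`;
    case "delivered":
      return `Your order with ${itemsPart} has been delivered.`;
  }
}

// Expected failed conversations per day for a given accuracy.
function expectedFailuresPerDay(conversationsPerDay: number, accuracy: number): number {
  return Math.round(conversationsPerDay * (1 - accuracy));
}

console.log(speakableOrderSummary({ id: "A1", status: "shipped", etaDays: 2, items: ["mug", "lamp"] }));
console.log(expectedFailuresPerDay(300, 0.95)); // 15
```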

7. Enhancing Function Call Responsiveness

Short description:

Test critical function calls for timely responses. Acknowledge delays to prevent user perception of failure. Implement streaming updates for long function call durations to enhance agent responsiveness.

So, test the function calls that matter most. Especially the ones where the failures can cost you money. Now, when a function takes more than about one second, you have to say something. In voice, silence sounds like failure. So, the user is going to think that it's broken.

The pattern here is to acknowledge before you act. So, let me pull that up for you. One second. That buys you another second or two of perceived patience. For anything longer than five seconds, add streaming progress updates. So, this single pattern is the difference between an agent that feels responsive and one that feels broken, even though the actual latency is identical.
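A minimal sketch of this pattern, assuming a `say` callback that hands text to your TTS. It is a timer-based variant: it stays quiet when the call returns quickly, acknowledges once the call has run for about a second, and repeats a progress line on an interval. The function name, the phrases, and the default timings are illustrative choices, not part of any framework.

```typescript
// Hands a sentence to the TTS; supplied by the caller.
type Say = (text: string) => void;

// Run a slow function call while filling the silence.
async function withAcknowledgement<T>(
  work: () => Promise<T>,
  say: Say,
  ackAfterMs = 1000,
  progressEveryMs = 5000,
): Promise<T> {
  // Speak only if the work is still pending when the timer fires.
  const ackTimer = setTimeout(() => say("One second, let me pull that up for you."), ackAfterMs);
  const progressTimer = setInterval(() => say("Still working on it."), progressEveryMs);
  try {
    return await work();
  } finally {
    // Stop talking about the lookup as soon as it finishes or fails.
    clearTimeout(ackTimer);
    clearInterval(progressTimer);
  }
}

// Usage with a simulated 1.5 s lookup.
const slowLookup = () =>
  new Promise<string>((resolve) => setTimeout(() => resolve("order found"), 1500));

withAcknowledgement(slowLookup, (text) => console.log(`agent: ${text}`)).then((result) =>
  console.log(result),
);
```

If the call is known to be slow, speaking the acknowledgment before starting it, as the talk phrases it, is simpler and avoids the initial second of silence.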

Now, the next problem, you write a system prompt that says, first, ask the user for their email. Then ask for their address. Then confirm the order. Then offer them an upsell. That works for a couple days, maybe, but then a user gives you their email and their address in the same sentence. Or a user asks a question in the middle, or someone interrupts and wants to start over. Suddenly, your prompt-shaped state machine is in seven different states all at once, and your agent is hallucinating order numbers. So, LLMs, they're bad at running rigid business logic from prose.

8. Optimizing Voice Agent Workflow

Short description:

Split responsibilities between code and LLM for efficient workflow. Choose a framework to handle tasks like email collection and warm transfers. Ensure voice agents are prepared for immediate user interaction to optimize responsiveness.

They're excellent at handling language. They're bad at being workflow engines. So, split responsibilities. Your code owns the workflow, the state, transitions, validation, what's allowed when. The LLM owns language understanding inside each step. You can ask the user for an email in any phrasing they want, but when the email comes back invalid, the code says ask again, not the prompt. This is one of the few places where the framework you pick actually matters.
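Here is a small sketch of that split, using the email, address, and confirm steps from the earlier example. The LLM's only job is to fill the `Extracted` object from whatever the user said. Validation and the choice of next step live in code. All type names and the simple email pattern are assumptions for illustration.

```typescript
type Step = "askEmail" | "askAddress" | "confirmOrder" | "done";

// Validated data the workflow has accepted so far.
interface Collected {
  email?: string;
  address?: string;
  confirmed?: boolean;
}

// What the LLM pulled out of the user's last utterance (hypothetical shape).
type Extracted = Partial<{ email: string; address: string; confirmed: boolean }>;

// Deliberately simple pattern; real validation is up to your application.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// The code, not the prompt, decides what is still missing.
function nextStep(state: Collected): Step {
  if (!state.email) return "askEmail";
  if (!state.address) return "askAddress";
  if (!state.confirmed) return "confirmOrder";
  return "done";
}

// Accept only valid fields, in any order and any number per utterance.
function applyTurn(state: Collected, extracted: Extracted): { state: Collected; step: Step } {
  const next: Collected = { ...state };
  if (extracted.email !== undefined && EMAIL_RE.test(extracted.email)) {
    next.email = extracted.email;
  }
  if (extracted.address !== undefined && extracted.address.trim() !== "") {
    next.address = extracted.address.trim();
  }
  // A confirmation only counts once both fields are present.
  if (extracted.confirmed === true && next.email && next.address) {
    next.confirmed = true;
  }
  return { state: next, step: nextStep(next) };
}

// Email and address in one sentence: the workflow skips straight to confirmation.
console.log(applyTurn({}, { email: "sam@example.com", address: "1 Main St" }).step);
// Invalid email: the code says ask again.
console.log(applyTurn({}, { email: "not-an-email" }).step);
```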

Tasks, structured conversations, whatever your tool calls them. These are the pieces you'd otherwise rebuild, like collecting emails and addresses, capturing DTMF tones, the touch tones on a keypad for a phone tree, or doing a warm transfer to a human. If you build those by hand, you'll spend a month to get them right. So pick a framework that ships them and does them reliably. Production voice agents are their own animal. The big one to internalize is that voice AI is not serverless.

In a normal web app, a one-second cold start on a first request, it's fine. The user waits, the page loads, they go on. In a voice agent, the first request is the user saying hello. So, if you cold start your container, you're already 400 milliseconds in the hole before anyone has done anything. So the pattern is always warm, idle pools, pre-spin enough agent processes to absorb your normal load, plus headroom for bursts. Treat capacity like seats in a call center, not like lambdas.

9. Production Realities and Call Handling

Short description:

Understand voice agent cost dynamics and when to transfer calls effectively. Test voice agents like a customer service team, categorizing failures for precise evaluation.

Here are two more production realities. The first one is cost. The LLM is going to be 50 to 70% of your per minute bill. And then cost grows superlinearly because the LLM rereads the entire history every turn. So a 30 turn call processes well over 100,000 tokens, even though the user only spoke maybe 2,000 words. Token caching is the only thing that really makes this affordable. So make it a hard requirement on your model provider.
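The arithmetic behind that claim can be sketched as follows. The system prompt size and the tokens added per turn are assumptions chosen for illustration, not figures from the talk. The shape of the growth is what matters.

```typescript
// Total input tokens the LLM reads over a call when every turn resends the full history.
// Approximation: each turn adds tokensPerTurn tokens (user speech plus agent reply).
function inputTokensForCall(
  turns: number,
  systemPromptTokens: number,
  tokensPerTurn: number,
): number {
  let total = 0;
  for (let k = 1; k <= turns; k++) {
    // On turn k the model reads the system prompt plus roughly k turns of conversation.
    total += systemPromptTokens + k * tokensPerTurn;
  }
  return total;
}

// Assumed values: a 2,000 token system prompt and 150 tokens added per turn.
console.log(inputTokensForCall(30, 2000, 150)); // 129750
console.log(inputTokensForCall(60, 2000, 150)); // 394500
```

Doubling the call length roughly triples the tokens processed under these assumptions, which is why caching the repeated prefix matters so much.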

The second thing is to know when not to be the one handling the call. Here are some signals: repeated frustration, out of scope requests, emotional escalation, time limits. These are signals to hand off the call. A warm transfer, where a human picks up the conversation with the context already summarized, is a completely different experience from a cold transfer that makes the user start over. So build the handoff before you launch. You're going to need it from day one.

Testing voice agents is more like running QA for a customer service team than testing a software function. The failure modes are very different. Did the agent interrupt the user? Did it sound robotic when it should have sounded warm? Did it acknowledge before a slow lookup? Did it lose context after 15 turns? These are not assertions that you can write in a unit test. So we need to bucket these failures into three layers. First is perception, did the audio and STT get the words right? And then reasoning, did the LLM make the right call? And then integration, did the function call work? And did it come back fast enough? So when a session goes bad, walk through those three layers in order. Don't reach for the prompt first, reach for the transcript.
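The "walk the layers in order" habit can be written down as a tiny triage helper. This is only a sketch: the `TurnReview` fields are hypothetical answers a reviewer (human or automated) records for one turn, and the 500 ms default reuses the back end target mentioned earlier in the talk.

```typescript
type Layer = "perception" | "reasoning" | "integration";

// Reviewer's answers for one turn of a bad session (hypothetical shape).
interface TurnReview {
  transcriptMatchesAudio: boolean; // did audio capture and STT get the words right?
  llmChoseRightAction: boolean; // did the LLM make the right call?
  functionSucceeded: boolean; // did the function call work?
  functionLatencyMs: number; // and did it come back fast enough?
}

// Return the first layer that failed, checking them in order, or null if none did.
function firstFailingLayer(review: TurnReview, maxFunctionLatencyMs = 500): Layer | null {
  if (!review.transcriptMatchesAudio) return "perception";
  if (!review.llmChoseRightAction) return "reasoning";
  if (!review.functionSucceeded || review.functionLatencyMs > maxFunctionLatencyMs) {
    return "integration";
  }
  return null;
}

// A wrong tool choice caused by a bad transcript is a perception bug, not a prompt bug.
console.log(
  firstFailingLayer({
    transcriptMatchesAudio: false,
    llmChoseRightAction: false,
    functionSucceeded: true,
    functionLatencyMs: 120,
  }),
);
```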

10. Automated Evaluations and Observability

Short description:

Automated evaluations with LLM judge, calibrate against human sessions. Build data assets, prioritize observability for latency and audio replay. Develop a minimal voice agent for iterative enhancement.

And for automated evaluations, the pattern that has emerged is LLM as judge, and it's a good one. You write a conversation template, maybe a customer wants to reschedule an appointment. The eval runs the agent through that template using a simulated user. Another LLM scores the resulting transcript against criteria, like did the agent confirm the new time? Did it handle the cancellation policy? The catch is that the judge can be wrong, too. So you'll want to calibrate it against human labeled sessions first, and then trust it. And then run these on every PR.
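Calibration can start as simply as comparing the judge's verdicts with human labels on the same sessions. The sketch below computes the agreement rate and counts the more dangerous disagreement, where the judge passes a session a human failed. The data shape and the sample labels are invented for illustration.

```typescript
// One session scored by both a human reviewer and the LLM judge. true means "pass".
interface LabeledSession {
  id: string;
  human: boolean;
  judge: boolean;
}

// Fraction of sessions where the judge agrees with the human label.
function agreementRate(sessions: LabeledSession[]): number {
  if (sessions.length === 0) return 0;
  const agree = sessions.filter((s) => s.human === s.judge).length;
  return agree / sessions.length;
}

// Sessions the judge passed but a human failed: these hide regressions.
function falsePasses(sessions: LabeledSession[]): string[] {
  return sessions.filter((s) => s.judge && !s.human).map((s) => s.id);
}

// Illustrative labels only.
const calibrationSet: LabeledSession[] = [
  { id: "s1", human: true, judge: true },
  { id: "s2", human: false, judge: false },
  { id: "s3", human: false, judge: true },
  { id: "s4", human: true, judge: true },
];

console.log(agreementRate(calibrationSet)); // 0.75
console.log(falsePasses(calibrationSet)); // ["s3"]
```

What agreement level counts as trustworthy is a judgment call for your team; the sketch only makes the number visible.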

Build data assets from real production sessions, anonymize them, and then fold the interesting ones back into your regression suite. Now, when something does break in production, observability is your only friend. There's two non-negotiables. First one, latency spans for every stage of the turn. We need to know where the latency came from. Was it the end of utterance, STT, the LLM time to first token, LLM completion, TTS time to first audio, total response. Look at all of them at P50, P90, and P99. A common failure mode is that P50 looks fine, and then P99 is two seconds.
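The P50 versus P99 trap is easy to see with a small percentile helper. This sketch uses the nearest-rank method, and the sample latencies are made up: nine ordinary turns and one two-second outlier.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative LLM time-to-first-token samples for ten turns, in milliseconds.
const llmFirstTokenMs = [300, 320, 310, 305, 330, 315, 325, 300, 310, 2000];

console.log(percentile(llmFirstTokenMs, 50)); // 310
console.log(percentile(llmFirstTokenMs, 90)); // 330
console.log(percentile(llmFirstTokenMs, 99)); // 2000
```

Running the same helper over the samples of each span (end of utterance, STT, LLM, TTS, total) shows which stage owns the tail.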

The second one: be able to replay the actual audio of any session. Not just the transcript, the actual audio. More than once, a session where the transcript looked fine but the audio was awful has cost teams a week of debugging. Whether you use ours or you roll your own, be sure that you can observe what was actually spoken and how it sounded. Now, a minimum viable voice agent is only about ten lines of TypeScript. This is a real working agent. You can speak to it, it speaks back, it handles interruptions, it uses WebRTC. It's hooked into the same pipeline that I just spent 15 minutes describing.

11. Voice AI Development Iterative Approach

Short description:

Start with a narrow focus, iterate, and expand. Develop an evaluation framework and prepare for production. Utilize an open-source framework for scalability and efficiency.

So don't try to build a do everything assistant on day one. Pick one narrow thing, schedule one type of appointment, answer FAQs for one product line. Ship the smallest useful version, learn from it, then expand. Hold yourself to a timeline. Maybe on the first week, you get the basic loop working. On the second week, you add the functionality for your use case.

The third week, you build your eval framework, and the fourth week, you harden for production. Skip any of these, and you're going to regret it. Now, everything that I just walked you through is in an open source framework that you can use today. You don't have to ask for permission. You don't have to be at ChatGPT scale to benefit from the engineering that got us to ChatGPT scale.

Whether you build on our stack or somebody else's, the constraints are the constraints. Latency budget, turn taking, function call costs, workflow in code, language in the model, cold starts that hit the first hello, caching, evals, observability. The models are going to keep getting faster. That one second bar is going to feel slow before long. The teams who win are the ones who design for it now on infrastructure that they actually own. Get this right and your users won't think about your agent at all. They'll just have a conversation. I hope this was helpful. Go build a voice AI agent today.
