And here in the parameters we see the number you use, where you want to call, and the description for the parts we just discussed. For the LLM we select the model and specify the prompt, that is, what it should do. For text-to-speech we select a voice; I use a specific Cartesia voice ID. And for speech-to-text, the Deepgram service with the Nova-2 model. And that's it. Maybe a few more parameters, but it's as simple as that. We can compare this to Pipecat. Pipecat is a do-it-yourself library, and normally you have to set up way more here. But let's check the example.
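To make the parameters concrete, here is a minimal sketch of such an assistant configuration as a Python dict. The field names and structure are assumptions modeled on the description above, not the documented schema of the hosted service; check the provider's API reference for the real shape.

```python
# Hedged sketch of a hosted voice-agent configuration; field names
# are assumptions based on the talk, not a documented schema.
assistant_config = {
    "firstMessage": "Hi, how can I help you today?",
    "model": {
        # LLM: pick the model and give it a prompt (what it should do)
        "provider": "openai",
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful phone assistant."}
        ],
    },
    "voice": {
        # Text-to-speech: a specific Cartesia voice ID (placeholder here)
        "provider": "cartesia",
        "voiceId": "YOUR_CARTESIA_VOICE_ID",
    },
    "transcriber": {
        # Speech-to-text: Deepgram with the Nova-2 model
        "provider": "deepgram",
        "model": "nova-2",
    },
}
```

The point of the comparison in the talk is that this is roughly all the setup a hosted service needs, while Pipecat asks you to wire the same pieces together yourself.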
I simplified the code a bit, but I wanted to highlight the main concepts here. You need your transport, and you need to set up your models, speech-to-text, text-to-speech, and the LLM, with those services. You have to register with those services separately, compared to Vapi. Then you provide your context to the LLM, and you create a Pipecat pipeline. Pipecat works with pipelines and tasks. In the pipeline we define the order: the transport input goes into speech-to-text, then we add the user's message to the context and feed it into the LLM. Then we have a response; the response we convert back to speech, send it back to the transport, and save it back to the context as the assistant's message.
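The pipeline-and-task pattern described above can be sketched with small stand-in stages. This is a conceptual illustration only: the class names below (`SpeechToText`, `LLM`, `TextToSpeech`, `Pipeline`) are stubs I invented to mirror the flow, not Pipecat's real classes, which live in the library's own modules and also include transport input/output and context aggregators.

```python
import asyncio

# Conceptual stand-ins for pipeline stages; NOT Pipecat's real API.
class SpeechToText:
    async def process(self, frame):
        # Turn incoming audio into a text transcript
        return {"text": f"transcript of {frame['audio']}"}

class LLM:
    async def process(self, frame):
        # Generate a reply from the transcribed user message
        return {"text": f"reply to: {frame['text']}"}

class TextToSpeech:
    async def process(self, frame):
        # Convert the reply text back into audio for the transport
        return {"audio": f"speech({frame['text']})"}

class Pipeline:
    """Runs frames through an ordered list of stages, like the talk describes."""
    def __init__(self, stages):
        self.stages = stages

    async def run(self, frame):
        for stage in self.stages:
            frame = await stage.process(frame)
        return frame

async def run_demo():
    # Mirrors the order from the talk:
    # transport input -> STT -> LLM -> TTS -> transport output
    pipeline = Pipeline([SpeechToText(), LLM(), TextToSpeech()])
    return await pipeline.run({"audio": "caller-audio"})

if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

In the real library, the pipeline is then wrapped in a task object which a runner executes, which is the pipeline/task split the talk refers to next.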
Once we have created our pipeline, we can create a task based on it and run that task. That's it, again: that's what you need to place a call to a specific number with your voice agent. It's quite simple. I prepared a whole repository of examples for you, where you can check the code for different tools and different models and play around yourself. The link is also provided. One more thing I wanted to share: some findings from when we started building a voice agent for a real production feature. First of all, as I already mentioned, everything was perfect when we started testing it with English. But our audience is Dutch, and here we faced some challenges. That's something you need to consider: which languages does your system need to support?
And don't assume that if it works for English, it will work fine for all the rest. You will have to adjust your speech-to-text and your text-to-speech, and sometimes a model simply isn't optimized for your specific language yet. So it may be a good idea to start with a limited set of languages, make sure your idea works, and then extend to more languages, searching for models suited to each one. The same goes for infrastructure. For infrastructure, the first thing I noticed, even though we used Vapi, a ready-made service, is that it doesn't have the concept of environments.