Stepan Suvorov, CTO at Roadsoft, discusses the rise of voice agents, driven by market growth, improved models for speech-to-text and voice generation, reduced latency, and lower cost. The key components of a voice agent are speech-to-text, LLM analysis, text-to-speech conversion, and orchestration, with latency a concern across all of them. The talk delves into essential aspects such as voice activity detection, interruption handling, and model selection criteria, and explores parameters including latency, quality, pricing, language support, and voice cloning options. It compares DIY and managed solutions in terms of flexibility and cost-effectiveness, and discusses how the cost of managed versus self-hosted solutions depends on usage volume and scale. It also addresses infrastructure challenges, testing approaches, WAPI limits, model selection, and future plans for voice agents.
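
As a rough illustration of the components listed above, one conversational turn can be sketched as a speech-to-text, LLM, and text-to-speech chain with per-stage latency measurement. This is a minimal sketch and not the speaker's implementation: the functions `speech_to_text`, `llm_reply`, `text_to_speech`, and `run_turn` are hypothetical placeholders that stand in for real model calls.

```python
import time
from typing import Dict, Tuple

# Hypothetical stand-ins for real STT, LLM, and TTS calls (placeholders only).
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text

def llm_reply(text: str) -> str:
    return f"You said: {text}"

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is audio

def run_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Orchestrate one turn and record per-stage latency in seconds."""
    stages = (("stt", speech_to_text), ("llm", llm_reply), ("tts", text_to_speech))
    timings: Dict[str, float] = {}
    value: object = audio
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)  # output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    timings["total"] = timings["stt"] + timings["llm"] + timings["tts"]
    return value, timings

if __name__ == "__main__":
    reply, timings = run_turn(b"hello")
    print(reply, timings)
```

Measuring each stage separately shows which component dominates the total response time. A real orchestrator would also need voice activity detection and interruption handling, which this sketch leaves out.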