Hello, my name is Mikhail Burtsev, and I'm the founder and leader of the DeepPavlov project at the Moscow Institute of Physics and Technology. Today I will tell you about the DeepPavlov Agent, which is an open-source framework for multi-skill conversational AI.
So let's start with a question: why is multi-skill so important? It is important because customer experience spans multiple domains, like surveys, promotions, campaigns, customer service, technical support, and many others. And usually, to address every single domain, you need a specific skill. This is why we need to build multi-skill digital assistants, and why we need multiple conversational skills in our system.
You can see this if you take a look at e-commerce assistants, which are modern, complex dialogue systems. For example, here is the case of AliMe Assist, the assistant at AliExpress. You can see that it's a hybrid system with many different skills: an assistant service with a slot-filling engine, a customer service with a knowledge-graph engine, and a chatting service with a chat engine. So it's a combination of business rules, scripted scenarios, and specific skills addressing different customer needs, as shown in the sketch below.
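To make the idea of routing between skills concrete, here is a minimal sketch in Python, loosely inspired by the hybrid architecture just described. All names here (Skill, route, the rule lambdas) are hypothetical illustrations for this talk, not the actual AliMe Assist implementation or any DeepPavlov API.

```python
# A minimal sketch of hybrid skill routing: business rules decide which
# skill handles an utterance, with a chat skill as the fallback.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    can_handle: Callable[[str], bool]  # business rule: does this skill apply?
    respond: Callable[[str], str]      # the skill's engine (slot filling, KG, chat, ...)


def route(utterance: str, skills: list[Skill], fallback: Skill) -> str:
    """Pick the first skill whose rule matches; fall back to chit-chat."""
    for skill in skills:
        if skill.can_handle(utterance):
            return skill.respond(utterance)
    return fallback.respond(utterance)


skills = [
    Skill("assistant_service", lambda u: "order" in u, lambda u: "[slot-filling answer]"),
    Skill("customer_service", lambda u: "refund" in u, lambda u: "[knowledge-graph answer]"),
]
chat = Skill("chatting_service", lambda u: True, lambda u: "[chat engine answer]")

print(route("I want a refund", skills, chat))  # -> [knowledge-graph answer]
```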
So what is the traditional way to build conversational systems right now? The dominant approach is the so-called modular dialogue system. How does it work? The user sends a prompt to the system, and this prompt is converted to textual form and fed into the natural language understanding (NLU) module, which performs basically three functions: domain detection, intent detection, and entity detection on the user's input. After this preprocessing, we have a formal description of the user input, also called a semantic frame, which contains the intent, here request_movie, and the entities, in this request genre=comedy and date=weekend. All this information then goes to the dialogue manager. The task of the dialogue manager is first to update the current dialogue state, integrating this new information into the previous history of the dialogue, and then, given the updated dialogue state, to select the action to perform on the side of the system. A minimal sketch of such a semantic frame follows.
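Here is a small Python sketch of the NLU output just described: a semantic frame holding the detected domain, intent, and entities. The field names, the understand function, and the hard-coded values are illustrative assumptions, not a specific framework's schema.

```python
# A minimal sketch of an NLU module's output: the semantic frame.
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    domain: str
    intent: str
    entities: dict[str, str] = field(default_factory=dict)


def understand(utterance: str) -> SemanticFrame:
    """Stand-in for a real NLU module (domain/intent classifiers + entity tagger)."""
    # A real system would run trained models here; we hard-code the
    # running example from the talk.
    return SemanticFrame(
        domain="movies",
        intent="request_movie",
        entities={"genre": "comedy", "date": "weekend"},
    )


frame = understand("Find me a comedy for the weekend")
print(frame.intent, frame.entities)  # request_movie {'genre': 'comedy', 'date': 'weekend'}
```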
The dialogue manager consists of the dialogue state and the policy, or script, which decides what action should be selected given the current dialogue state. In our example, the selected action is request_location. But this action is in an internal system representation, and we need to convert it into a natural language prompt. For that we have the last module of our system, natural language generation (NLG), which creates the surface form of our request to the user. So for the action request_location, the output in natural language is "Where are you?". This is basically how current systems are built. In the NLU part, we use a lot of neural networks and deep learning models; in the dialogue manager part, we have some neural networks plus a lot of rules and scripted dialogues; and for natural language generation, we mostly have either retrieval models with slot filling or templates. A minimal sketch of this dialogue manager and template NLG is given below.
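The following Python sketch ties the pieces together: a dialogue state that is updated with the new semantic frame, a scripted policy that selects an action, and template-based NLG that renders it. Names such as DialogueState and select_action, and the single hand-written rule, are assumptions for illustration only.

```python
# A minimal sketch of a dialogue manager (state + scripted policy)
# followed by template-based NLG.
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    intent: str | None = None
    entities: dict[str, str] = field(default_factory=dict)

    def update(self, frame: dict) -> None:
        """Integrate the new semantic frame into the dialogue history."""
        self.intent = frame["intent"]
        self.entities.update(frame["entities"])


def select_action(state: DialogueState) -> str:
    """Scripted policy: ask for any slot the movie request still misses."""
    if state.intent == "request_movie" and "location" not in state.entities:
        return "request_location"
    return "inform_result"


# Template-based NLG: map internal actions to surface forms.
TEMPLATES = {
    "request_location": "Where are you?",
    "inform_result": "Here is what I found for you.",
}

frame = {"intent": "request_movie", "entities": {"genre": "comedy", "date": "weekend"}}
state = DialogueState()
state.update(frame)
action = select_action(state)  # -> "request_location"
print(TEMPLATES[action])       # -> "Where are you?"
```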
Okay, so what is the AI assistant lifecycle? How do we build our digital assistants, our dialogue systems, with this modular technology? Usually we start with an MVP, a minimum viable product. For NLU, we have some features and some models pre-trained for the domain, and on the dialogue manager side, we have a few scripts. It's a very nice and clear architecture, and we understand how it works.