And that gave operations agents an opportunity to compare what the chatbot was saying with their own answers and give us feedback. And of course, we needed a high level of confidence at that stage gate before we went to customer contact. This is where the problem started to emerge. If you ask the chatbot a question that's not in its knowledge base, most of the time it will escalate to a human, but occasionally it will just make something up, which I'm sure comes as no surprise to anyone who has worked with LLMs. It was pretty good, but 90 to 95% accuracy is not good enough for something that's going to talk to customers thousands of times a week.

And this was where the real problem emerged: complexity. We kept papering over the cracks of one small LLM problem after another. It's lying to you; okay, how do we fix that? We put in a validation step after it generates its answer to check: is this factually correct? Maybe that will work pretty well, but ultimately we have to build the validation and test it before we know. So not only have you got the complexity of workaround stacked upon workaround, it's also very hard to estimate, because we just don't know how well each fix is going to work.
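To make that concrete, the answer-then-validate pattern looks roughly like this. This is an illustrative sketch, not Merlin's actual code: the `CompleteFn` wrapper, the prompts, and the escalation message are all assumptions.

```python
from typing import Callable

# Stand-in for a real LLM call: any prompt -> text function
# (e.g. a thin wrapper around whichever LLM API you use).
CompleteFn = Callable[[str], str]

ESCALATE = "Let me pass you to one of our agents who can help with that."

def validate_answer(complete: CompleteFn, question: str, answer: str, kb: str) -> bool:
    """Second pass: is the draft answer actually supported by the knowledge base?"""
    verdict = complete(
        "Knowledge base:\n" + kb + "\n\n"
        "Question: " + question + "\n"
        "Draft answer: " + answer + "\n\n"
        "Is the draft answer fully supported by the knowledge base above? "
        "Reply with exactly YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

def respond(complete: CompleteFn, question: str, kb: str) -> str:
    draft = complete(
        "Using ONLY the knowledge base below, answer the question. "
        "If the answer is not in the knowledge base, reply UNKNOWN.\n\n"
        "Knowledge base:\n" + kb + "\n\nQuestion: " + question
    )
    # Escalate instead of guessing: either the model admits it doesn't know,
    # or the validation pass can't confirm the answer.
    if draft.strip().upper() == "UNKNOWN" or not validate_answer(complete, question, draft, kb):
        return ESCALATE
    return draft
```

The catch is visible right in the sketch: the validator is itself an LLM call, so you can't know how reliable the workaround is until you've built and tested it too.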
To give a counter-example, we have built other AI projects within Capital on Tap. We have one called Blaze, which is essentially AI transcription and summarization of customer calls. It's a bit more involved than that, but it's pretty straightforward. And it was built by the operations engineering team, so it started off in the correct team, with a strong product focus, and with operational metrics tracked from the start. After a phone call, an operations agent does the wrap-up, and we've cut that time from three minutes to two. Across 600 calls a day, that minute per call is roughly ten hours of agent time saved daily.
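At a high level, the core of a pipeline like that is two steps, sketched below. This is not the actual Blaze implementation; the transcription and completion functions are stand-ins for whatever speech-to-text and LLM services are in use, and the assumption that agents edit a drafted summary rather than writing notes from scratch is mine.

```python
from typing import Callable

# Stand-ins for real services: a speech-to-text API and an LLM completion API.
TranscribeFn = Callable[[bytes], str]  # call audio -> transcript
CompleteFn = Callable[[str], str]      # prompt -> generated text

def draft_wrap_up(transcribe: TranscribeFn, complete: CompleteFn, call_audio: bytes) -> str:
    """Transcribe a customer call and draft the agent's wrap-up notes.

    The agent reviews and corrects the draft instead of writing notes
    from scratch, which is where the time saving would come from.
    """
    transcript = transcribe(call_audio)
    return complete(
        "Summarize this customer call as wrap-up notes for the agent. "
        "Cover the customer's issue, the resolution, and any follow-up actions.\n\n"
        + transcript
    )
```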
Then decision time came for Merlin, our chatbot: we had to decide what to do with it. We'd spent at least 200K in the salaries of those involved and in LLM costs, and we had to say to the business, look, this thing you thought was weeks away is actually going to be paused and is now months away. But the CEO understands the sunk cost fallacy, so we killed it and decided to look at third-party vendors.

Ultimately, we decided that a project this complex was not our core competency. We should stick to the fintech side of things, own the simpler AI projects within product teams, and go with third parties for something as complicated as this. We had two vendors that we liked, so we found other businesses that had integrated with them and went to chat with those deployed bots to see how they performed. One was much faster, and one of them lied to us about a balance amount, which was very interesting to see live in the real world. And as we were killing the main AI project, we decided to reset how we do AI at Capital on Tap: we no longer have a separate AI team. Instead, we have AI engineers who embed in an existing product team, help them deliver their project, and then step away.