Coffee Chat With Documentation, Are You Ready?

The introduction of ChatGPT, the Whisper API, and orchestration tools such as LangChain and Semantic Kernel has brought a lot of hype around AI and what we can build with it, such as a document assistant. But are we ready to scale our AI projects to meet further requirements and broader scenarios, such as handling multiple processes within a document question-and-answer flow, or offering industry-specific answers with the existing codebase? How can we, as developers, leverage these tools to offer a better documentation experience for developers as our users? Join my talk and let's find out.

This talk was presented at JSNation 2024.

FAQ

The speaker is Maya Shavin, a senior software engineer at Microsoft.

Maya Shavin works with the Microsoft Industrial AI team, which leverages AI technologies to build industry-specific AI-integrated solutions and applications.

The main topic of Maya Shavin's talk is AI, specifically document Q&A services using generative AI and large language models (LLMs).

Generative AI is a type of artificial intelligence that can generate text and media from various input data, such as text or images, which are called prompts.

Some examples of large language models (LLMs) are GPT, Gemini, Claude, and Llama.

In the context of LLMs, a token is a piece of text, such as a word or part of a word. Every word in a sentence is translated into one or more tokens, which the AI uses for processing. Tokens are important because they represent the cost of interacting with LLMs.

The three core capabilities of LLMs for document Q&A mentioned in the talk are completion (including chat as an extension of completion), retrieval (search), and embedding (creating vector representations of text).

The ingestion phase in a document Q&A service involves loading and parsing documents, splitting them into structural chunks, creating embeddings for these chunks, and indexing them in a database. This phase provides the grounding for the AI to process user queries accurately.

The querying phase in a document Q&A service involves creating embeddings from the user's input query, searching for matching chunks of text, computing the prompts, and using the AI to summarize and format the answer based on the retrieved chunks and user query.

Some services and tools mentioned for parsing and indexing documents in a document Q&A service include Azure Document Intelligence, text splitter from LangChain, text embedding from OpenAI, Azure AI Search, and Pinecone.
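
The splitting step of the first phase and the prompt-computing step of the querying phase can be sketched in a few lines of TypeScript. This is only a minimal illustration, not the speaker's implementation: the names `splitIntoChunks`, `buildPrompt`, `Chunk`, and `ChatMessage` are hypothetical, the splitter is a naive paragraph packer rather than any of the tools listed above, and embedding and indexing are left out entirely.

```typescript
// Hypothetical helpers for illustration only. A real service would
// delegate parsing, splitting, embedding, and indexing to dedicated tools.

/** A piece of a document with a running id. */
interface Chunk {
  id: number;
  text: string;
}

/** A chat-style message: system instructions or the user's question. */
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

/**
 * Splitting step: pack whole paragraphs into chunks of at most `maxChars`
 * characters. A paragraph longer than `maxChars` is kept whole as its own
 * chunk rather than being cut in the middle.
 */
function splitIntoChunks(doc: string, maxChars: number): Chunk[] {
  const paragraphs = doc
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let current = "";
  for (const para of paragraphs) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars || current === "") {
      current = candidate; // still fits, or first paragraph of a new chunk
    } else {
      chunks.push({ id: chunks.length, text: current });
      current = para; // start a new chunk with this paragraph
    }
  }
  if (current) chunks.push({ id: chunks.length, text: current });
  return chunks;
}

/**
 * Prompt step: combine the retrieved chunks and the user's question
 * into messages that ask the model to answer from the context only.
 */
function buildPrompt(question: string, retrieved: Chunk[]): ChatMessage[] {
  const context = retrieved
    .map((c) => `[chunk ${c.id}]\n${c.text}`)
    .join("\n\n");
  return [
    {
      role: "system",
      content:
        "Answer using only the context below. " +
        "If the context does not contain the answer, say you do not know.\n\n" +
        `Context:\n${context}`,
    },
    { role: "user", content: question },
  ];
}
```

Keeping paragraphs whole is one simple way to get "structural" chunks; which chunks reach `buildPrompt` would be decided by the search step, which is not shown here.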

Maya Shavin
34 min
13 Jun, 2024

Video Summary and Transcription

Maya Shavin, a senior software engineer at Microsoft, discusses generative AI and the core models for LLMs. The flow of a document Q&A service and the importance of prompts in enhancing it are explored. The ingestion and querying phases of document Q&A are explained, emphasizing the need for efficient storage, indexing, and computing relevant prompts. The talk also covers the use of embedding models, optimization strategies, and the challenges of testing and validating AI results. Creative uses of LLMs and the impact of AI on job security are mentioned.

1. Introduction to Generative AI and LLMs

Short description:

Hi, everyone. I'm Maya Shavin, a senior software engineer at Microsoft. Today's talk is about generative AI and the core models for LLMs. We'll discuss the flow of a document Q&A service and how to enhance it using prompts. An LLM is a large language model that can process human input and be trained on its own data. It works with tokens. A token is a piece of a word that the input has to be translated into for the model to understand it. To count tokens, we can use a token counter application.

Hi, everyone. Have you had your lunch? Are you awake or sleepy? Okay, because I don't have real coffee here, I hope that you already had your coffee. If not, I'm sorry, but this is going to be the most boring talk of your life. No, I really hope not. But anyway, first and foremost, my name is Maya Shavin. I'm a senior software engineer at Microsoft. I'm working in a team called Microsoft Industrial AI, where we leverage different AI technologies to build industry-specific AI-integrated solutions and applications.

Sorry, I lost my voice during the flight, so I don't know what happened. So if it's hard for you to understand me, I'm really sorry. And if you want to understand me better, please feel free to contact me after the talk, okay? Like the introduction said, I've been working with web, JavaScript, and TypeScript, but today's talk has nothing to do with TypeScript or JavaScript or anything. It's about AI. And first and foremost, how many people here are working with AI or generative AI? Okay, so we can skip this slide.

Now, anyway, for people who don't know about generative AI, or maybe know the term but never had a chance to experience it: generative AI is an AI that can generate text and media from a variety of input data, which we call prompts. That is basically text, but now we can also send it images to analyze, and it can also learn from its system data. Our talk will be based on that: we will talk about what the core models are for LLMs or generative AI to use. Our talk will also focus on how we're going to use the models to define the core flow of a very simple service, document Q&A, which you can find a hundred times when you Google for document Q&A using AI. But in this talk, we will learn a bit more about the flow behind it, what kind of service we can use for each component inside the flow, and finally how we can enhance and expand the service using prompts or any technique that we should pay attention to when we develop a new document Q&A as a generic service. Okay.

But first and foremost, LLMs. How many people here are working with an LLM, any model? What LLM do you use? GPT? GPT? Text embedding? DALL-E? Raise your hand. Come on, I believe that you already had coffee, right? Anyway, just a recap: an LLM is a large language model that is able to process human input. It is also capable of training on its own data, whether supervised or unsupervised, and it works with tokens. And the nice thing about an LLM is that it provides you a set of APIs as a black box, which helps developers build AI applications in a more straightforward and simpler way than before. Okay. So some of the LLM providers we can see here: OpenAI, Google, Microsoft, Meta, Anthropic, Hugging Face, nothing new here.

So we said LLMs work with tokens, right? So what exactly is a token? Well, to put it simply, a token is just a piece of a word, which means every single word in a sentence has to be translated into tokens. And to count the tokens, we have a calculator that we can use. It's called a token counter, which is right here. This is an application where you can go and write your text, and it will tell you how many tokens it will cost you to pass this string to the AI.

2. Core Capabilities for Document Q&A

Short description:

In this part, we'll discuss the core capabilities for document Q&A, including completion, chat, and retrieval. Completion API allows AI to complete user tasks, while chat is an extension of completion. Retrieval enables search, generating vector representations of text. Document Q&A is not complex, but it's crucial to implement correctly to avoid issues like the AI chatbot used by Air Canada. Document Q&A as a service is a simple text input and button where users ask questions and receive AI-generated answers.

Okay. This is just a token counter, and you can also see the approximate calculation of tokens based on the OpenAI website. And it's very important because tokens are money. Literally. With AI, we don't work with money, we work with tokens.
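
To make the "tokens are money" point concrete, here is a rough estimator in TypeScript. It is only a sketch based on OpenAI's published rule of thumb that one token is roughly four characters of English text; an actual token counter runs the model's tokenizer and will give different numbers. The function names and the price in the usage note are hypothetical.

```typescript
/**
 * Rough token estimate using the "about 4 characters per token" rule of
 * thumb for English text. This is an approximation, not a tokenizer.
 */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/**
 * Estimated cost of sending `text` to a model, given a price per
 * 1,000 tokens. The price is an input because it varies by model.
 */
function estimateCost(text: string, pricePerThousandTokens: number): number {
  return (estimateTokens(text) / 1000) * pricePerThousandTokens;
}
```

For example, `estimateCost(documentText, 0.5)` would approximate the cost of passing `documentText` as input at a made-up price of 0.5 currency units per 1,000 tokens.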

So when we talk about LLM core capabilities, we have several capabilities so far, six different ones, and it's improving. In this talk, we will only focus on the three core capabilities for document Q&A. Completion and chat: chat is actually an extension of completion, so usually when you use a completion API, you will see that the API for chat has /chat as an extension. It's not a separate model; it's using the same completion.
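
The relationship between completion and chat is visible in the request shapes. The sketch below builds both requests without sending them. It assumes the public OpenAI REST paths `/v1/completions` and `/v1/chat/completions`; other providers use different URLs, and the model name is left as a parameter. The helper and type names are hypothetical.

```typescript
// Assumed base URL of the OpenAI REST API; other providers differ.
const BASE_URL = "https://api.openai.com/v1";

/** Body of a plain text completion request: a single prompt string. */
interface CompletionBody {
  model: string;
  prompt: string;
}

/** Body of a chat request: the same idea, but as a list of messages. */
interface ChatBody {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

/** Build (but do not send) a text completion request. */
function completionRequest(
  model: string,
  prompt: string
): { url: string; body: CompletionBody } {
  return { url: `${BASE_URL}/completions`, body: { model, prompt } };
}

/** Build (but do not send) a chat request; note the extra /chat segment. */
function chatRequest(
  model: string,
  userMessage: string
): { url: string; body: ChatBody } {
  return {
    url: `${BASE_URL}/chat/completions`,
    body: { model, messages: [{ role: "user", content: userMessage }] },
  };
}
```

Either object could then be sent with `fetch` as a JSON `POST` with an `Authorization` header; that part is omitted so the sketch stays free of network calls and credentials.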

So what is the completion API? The completion API is the API that allows the AI to complete a task given by the user, and chatting is also a task given by the user. Some of the famous completion models are GPT, Gemini, Claude, and Llama; it's very hard to pronounce these kinds of words. Anyway, these are some of the famous completion models that we always use when we do chat or text completion and so on. The other one is retrieval. What is retrieval? Retrieval means search. Basically, this is a model that allows you to generate an embedding, a vector representation, of a certain text.

And one of the most popular models for this is text embedding. Text embedding Ada, if you've ever heard about that from OpenAI: we use it a lot to help us create a vector representation of a document so that the search algorithm can use it to find the matching chunks. So these are the three models that we're going to use a lot in document Q&A. Okay.

But before we move to document Q&A, like I said before, document Q&A is not something out of the box. It's not something really complex, but it's something that can easily go wrong. For example, Air Canada: well, their AI went wrong and they had to pay money for that. Now, there's an argument that the AI chatbot here was actually not an AI chatbot, that it was written with some dumb algorithm behind it and didn't really use ChatGPT or any GPT. But again, that's a different story. All I know is that the chatbot went wrong and now the airline has to pay for that because it gave misleading information. And that's just one part of the problem that document Q&A faces if you don't pay attention to what you implement or you don't understand what you implement. So let's take a look at what document Q&A as a service is. To put it simply, it's just a text input and a button, where the user types a question, sends it to the AI, and asks for an answer.
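
How a search algorithm can use those vectors to find matching chunks can be sketched with cosine similarity, a common choice for comparing embeddings. This is an assumption for illustration: the talk does not say which similarity measure is used, the vectors in a real service come from an embedding model and have many more dimensions, and the names `IndexedChunk`, `cosineSimilarity`, and `topMatches` are hypothetical.

```typescript
/** A chunk of text stored together with its embedding vector. */
interface IndexedChunk {
  text: string;
  vector: number[];
}

/** Cosine similarity: 1 for the same direction, 0 for orthogonal vectors. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Return the `k` chunks whose vectors are most similar to the query vector. */
function topMatches(
  query: number[],
  index: IndexedChunk[],
  k: number
): IndexedChunk[] {
  return [...index] // copy so the caller's array is not reordered
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector)
    )
    .slice(0, k);
}
```

A linear scan like this is fine for a handful of chunks; vector databases exist because it does not scale to large document sets.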
