Generative AI is exciting tech enthusiasts and businesses with its vast potential. In this session, we will introduce Retrieval Augmented Generation (RAG), a framework that provides context to Large Language Models (LLMs) without retraining them. We will guide you step-by-step in building your own RAG app, culminating in a fully functional chatbot.
This workshop was presented at React Summit 2024. Check out the latest edition of this React conference.
FAQ
The workshop is hosted by Dieter, a solution engineer at DataStax.
The workshop introduces technologies such as generative AI, retrieval-augmented generation (RAG), vector stores, and the Langflow no-code environment.
AstraDB is used as a vector store to implement retrieval-augmented generation (RAG) capabilities for the chatbot.
Retrieval-augmented generation (RAG) is a technique that uses a vector store to retrieve context from a large set of documents to provide more accurate and context-aware responses from a large language model (LLM).
Vector search works by vectorizing text, audio, or video content into multi-dimensional vectors that capture the semantics of the content. These vectors are then stored and searched to find the most similar context to a query vector.
Langflow is an open-source, no-code environment that allows users to implement generative AI applications without writing any code.
RAGStack is a curated list of dependencies provided by DataStax that packages various generative AI frameworks, ensuring compatibility and high-quality implementation.
The chatbot can be deployed on the Streamlit platform, making it accessible to others and allowing for easy sharing and collaboration.
The prerequisites for the workshop include having a GitHub account, and signing up for services like AstraDB, OpenAI, and Streamlit.
The purpose of this online workshop is to build a chatbot using generative AI in a hands-on and interactive manner.
Welcome to this online workshop where we will build our own chatbot with generative AI. DataStax provides technologies for implementing generative AI applications, including AstraDB as a vector store for retrieval augmented generation. The build-your-own-RAG-chatbot repository contains application iterations, each adding additional functionality toward a fully functional chatbot with streaming capabilities. Implementing the first application with Streamlit involves drawing a UI and integrating it with OpenAI chat models. Deploy the workshop chatbot application on Streamlit and use Langflow to implement generative AI applications without coding.
Welcome to this online workshop where we will build our own chatbot with generative AI. The agenda includes an introduction to generative AI, retrieval augmented generation, and a hands-on workshop to implement our own chatbot application. There is also an overview of Langflow, a no-code environment for generative AI applications.
Hey, welcome to this online workshop. Today we are going to build our own chatbot with generative AI. It is a hands-on workshop that we are going to do together, so feel free to open up your camera if you want, and let's try to make this as interactive as possible.
I'm Dieter. I'm a solution engineer. I work for DataStax, and so I work with the technologies we have at DataStax. I'm going to introduce a bit of them in a second.
The agenda for today looks like this. First, a few slides to introduce you to the big picture of generative AI. Then I would like to introduce retrieval augmented generation: what it is, what you can do with it, and what it is good for. And the main part of the whole thing, for sure, it is only a few slides, is a hands-on workshop. We are going to implement our own chatbot application, and hopefully at the end of this workshop every one of you is proud of having implemented your own functional chatbot that you can show around and experiment with. As you can see, there is an additional agenda point. Today it is about coding, about coding our chatbot application. But there are other ways to implement generative AI applications, and I would like to introduce to you today also Langflow, an open source project, a no-code environment that allows you to implement generative AI applications without a single line of code. But that is for the end. First, a few slides.
2. DataStax and Retrieval Augmented Generation
Short description:
DataStax provides technologies for implementing generative AI applications, including AstraDB as a vector store for retrieval augmented generation. Generative AI leverages large language models, but they may not have our private data, so we can use fine-tuning or retrieval augmented generation with a vector store. Vector search allows us to find context similar to a query vector to answer questions.
And then we do some work together, some coding. So a few words about DataStax. DataStax is a real-time AI company. We provide technologies that allow you and other developers to implement your generative AI applications. Data equals AI, right?
And so at the core of DataStax, we have data management technologies, like our database in the cloud called AstraDB. We are going to leverage AstraDB today as our vector store. I will explain what a vector store is, and if anything is unclear, please let me know.
So we use AstraDB as a vector store in order to implement our RAG capabilities, that is retrieval augmented generation, and this is what you want to use as soon as you implement a chatbot that works with your private data. We will also discuss some libraries and frameworks within our hands-on work. So let's go to the next slide and set the foundation.
So this is about artificial intelligence. A subdomain of artificial intelligence is generative AI, and generative AI got super famous over the last months because we all use it each day to generate content like text, audio, and video. Within that subdomain we leverage large language models. There is not much to say about large language models; I believe every one of us has already touched and worked with one. But there is one point I would like to stress a bit.
So the large language model was trained with vast amounts of data, data that is publicly available. But it was not trained with our private data, right? And this is why an LLM would hallucinate if we used it directly without providing some additional context about the situation we are in, about our products, about our services. That is why we implement this RAG chatbot today. As said, the LLM can be outdated; it was trained months back. It doesn't have our private data, and it will hallucinate if we ask something for which it doesn't really have the data to generate a response on point. And it might be insecure if we use it as a service. So there is no AI without data, and there are ways to provide the LLM with our context. The two ways I would like to explain are fine-tuning and retrieval augmented generation.
So one option is to fine-tune the large language model with our own context. There is some training involved, and it takes a while until the large language model is trained with our context. After that, the LLM is ready to answer questions based on our private context, but we would need to do that every time this private context is updated or changed. The other option is retrieval augmented generation, and this works with a vector store. We vectorize our context, manage it in a vector store, and retrieve out of the vector store. We can have millions of documents in there, and from all that context we retrieve exactly the context that is required to answer our question. How it works? You will learn it, and you will do it in practice in a second.
So what is vector search? As you can see here in that graph, we all learned this in school, mainly with two-dimensional and three-dimensional vector spaces. We also use that in the generative AI world and in the vector search world. As you can see in the graph, we have objects like a trouser and a skirt, and they are close to each other. In the vector search world, that means that both objects are more similar to each other than the trouser is to the t-shirt. So this is how vector search works, and therefore a vector store has algorithms implemented to find the vectors that are similar to the query vector. We are going to leverage that to find the right context that our chatbot needs to answer a question. It looks like this: on the left-hand side, there is a text chunk.
3. Vectorization and RAG Implementation
Short description:
Text chunks are vectorized using embedding models to generate semantic vectors. RAG implements retrieval augmented generation, combining vector queries with a large language model to provide accurate responses. AstraDB is a scalable vector store database based on open source technology. Implementing chatbots poses challenges due to dynamic frameworks and dependencies, which RAGStack aims to address.
And this text chunk gets vectorized with the help of a machine learning model, a so-called embedding model. The embedding model generates the vector, and as you can see, it is quite a lot of floating-point numbers. Typically these vectors have multiple hundred dimensions, so a bit different from the ones at school, right? Captured within that vector is the semantics of the text on the left-hand side. As said, we use embedding models to generate the vectors. Input for the embedding model is objects: text, audio, video. The result is a vector with multiple hundred dimensions.
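For reference, here is a minimal sketch of how such a text chunk could be vectorized in code, assuming the langchain-openai package is installed and an OPENAI_API_KEY is set in the environment; the model name is an assumption that matches the 1536 dimensions we will see later in AstraDB:

```python
# A minimal sketch of vectorizing a text chunk with an embedding model.
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

chunk = "A trouser and a skirt are both garments worn on the lower body."
vector = embedding_model.embed_query(chunk)

print(len(vector))  # 1536 dimensions for this model
print(vector[:5])   # the first few floating-point components
```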
We are going to implement RAG, retrieval augmented generation. This is a pattern that you want to use as soon as you implement a chatbot that answers questions based on your context, on your private data. The flow here is a simple one, and we will implement it in our code. There is a user query, a question, that goes into our chatbot. The first thing the chatbot does is leverage an embedding model to generate the vector for the question. Then we do a vector store query: we use that vector to find the vectors we have stored in AstraDB, our vector store, that are most similar to the query. With this we retrieve context, we re-rank our context, and provide the question together with the context to the large language model. The large language model provides us with a response, with an answer. And this answer will be on point because the LLM has our private data.
A vector store is a database; ours is the AstraDB vector store. The vector capability actually is a feature that we added, and it is capable of managing lots of vectors. It can do a hybrid search: that means the typical things you do with a database, so a query over some columns, but at the same time a similarity search to find the context that is most similar. You can do metadata filtering, and you can use the vector store also for chat history and for agent memory. The vector store that we are going to use today is AstraDB, and it is super scalable technology. It is all based on open source; everything we have at DataStax actually is based on open source. It is based on Cassandra, which is used at the majority of the large enterprises on this planet, for example at Apple. They manage their business based on this technology. So it is super scalable, super fast, with high throughput.
But there are challenges when you implement such a chatbot. It is really hard; maybe you have the same experience as I had over the last months. There is such a dynamic in this gen AI space, in all these frameworks that are available. They change each day: different APIs, different interfaces, different versions. As soon as you implement an application that is a bit more complex, sooner or later you struggle, because there are incompatibilities between the frameworks and different versions of these dependencies. This is what enterprises and many of our customers struggle with. For sure, nearly everything is open source, and enterprises also want to have support for what they use. So at DataStax we have implemented a curated list of dependencies. We call it RAGStack, and this is what we leverage in our code. RAGStack packages a number of frameworks. We test it each day, or whenever there is a new version or a change, with our pipelines for that, and ensure that the dependencies provided by RAGStack actually work together. And also Harrison Chase recognized this.
4. Implementing a Chatbot with RAGStack
Short description:
LangChain is a leading framework for generative AI applications, packaged within RAGStack. Developers have the choice between open source and closed stacks like OpenAI. The goal is to implement a working chatbot using the provided GitHub repository and explore using Langflow without coding.
I'm pretty sure you know LangChain, which is one of the leading frameworks for implementing generative AI applications; Harrison Chase is its CEO. He also recognized that this is something that is required, and it is super valuable for developers in order to implement code that has high quality. RAGStack packages a number of frameworks. Among these frameworks is LangChain, so everything that is available via LangChain, plus LlamaIndex, AstraDB dependencies, and Astra Streaming. And in the end, it is the decision of the developer which path to follow, whether everything should be open source based.
And actually, it is easy to implement all that. But as soon as things go into production, and frameworks typically have vulnerabilities, you want to be sure that a vulnerability is fixed and you are on the latest code base. And for sure, besides open source, there is the option to use closed stacks, like OpenAI and other services that are available. So it is the decision of the developer which path to follow. With that, that was nearly all of the slides I wanted to present to you today, and now it is about starting the coding of the workshop. Everything that is required is in this GitHub repository, so please navigate to that repository. You can also use the QR code to get there. Thanks for posting it in the chat. And we go from there.
So the goal is to implement a working chatbot today, but I will also demo to you how you can use open source Langflow, which is actually also packaged within RAGStack, to implement your chatbot without a single line of code. With this, I hope every one of you is already on that repository. Here is everything we need. Let me walk you through what we are going to do, and I will explain it as we go. We do everything step by step. Signal me if anything is too slow or too boring, or if we should touch on something. Please provide feedback over the course of implementing this chatbot.
5. Building a Streamlit Chatbot with RAGStack
Short description:
The build-your-own-RAG-chatbot repository contains application iterations, each adding additional functionality toward a fully functional chatbot with streaming capabilities. Streamlit is the framework used for implementing the UI, and we will be using the Streamlit platform to deploy our own application. Through the README, you will learn how to use RAGStack, a vector store, AstraDB for semantic similarity search, and LangChain, a framework that brings together the technologies needed for generative AI applications.
Okay, so this is the build-your-own-RAG-chatbot repository, and we need to do a number of things. But first, I would like to explain that you see here a number of application iterations. Each of these, app_1 to app_7, adds additional functionality in order to end up with a fully functional RAG chatbot that has streaming capabilities, that lets you upload your own context, your own PDF files, and use it at the end. This is a Streamlit application. Streamlit is a framework that you can use to implement nice UIs, and Streamlit also has a platform that we are going to use today. I will show you in a second.
So every one of you, if you want, can create an account today, not right now; we will do that later. And then you can deploy your own application on top of that Streamlit platform. This is the application we are going to build today, our Streamlit application. I already uploaded some context via the browse files button, so this is now hosted on the Streamlit platform and ready to take our questions. The goal is that each one of you reaches the same state today and every one of you has your own chatbot up and running.
First, I would like to walk you through the README, and please let me know if anything is unclear. We use RAGStack and we create a Streamlit application that pretty much looks like this screenshot here. What you will learn today is how to use RAGStack, how to use a vector store, and how to use AstraDB as a vector store for semantic similarity search. You will also learn how to use LangChain. LangChain is a framework that glues together a number of technologies that are required to implement generative AI applications.
6. Setting Up the Repository and AstraDB
Short description:
DataStax provides an open-source package that includes the generative AI frameworks you need to implement your application. Dependencies include pypdf. To start, use the template provided in the repository to create a new GitHub repository. After creating the repository, you will gain access to services like AstraDB, OpenAI, and Streamlit. First, sign up for AstraDB by following the provided link and choose between Google or GitHub credentials. Ensure you have an API endpoint and application token from AstraDB.
This is something provided by DataStax and it is open source. It packages all the generative AI frameworks that you need to implement your application, with versions that work together. The pypdf dependency is also required.
For a Streamlit application, we also need to define some keys and secrets, but this is something we are going to do together later. Okay, so the first thing you should do, and it is all documented in the README of that repository, is to use this template and create a new repository. That is the first thing.
So you need this repository in your own GitHub account. You will get a copy of it by clicking use this template and creating your repository. You will then end up having this repository within your account on GitHub. That is the first thing I would like to ask you to do, and let me know in case there are questions.
There were prerequisites. You need a GitHub account for sure in order to create a copy of this repository. Do not click on the Google Colab link here. That is a Jupyter Notebook that you can revisit after this workshop; it will help you to understand all the mechanics that you have learned today a bit better, but it is something you should have a look at afterwards. During the course, you will gain access to the following services: AstraDB, our vector database; OpenAI, from which we use the large language model and the embedding model; and Streamlit. You need access to the Streamlit platform as soon as you want to deploy your application there like I did.
Let's do things step by step and sign up for AstraDB. That is the first thing you need to do. Just click here on that link; this brings you to the Astra UI. Let me log out of it. You will see a screen like this. If you don't have an account, sign up for one. You can decide if you would like to use your Google credentials, your GitHub credentials, or if you would like to sign up with your first name, last name, and email address. Just for information, some of you might struggle with the GitHub approach: if there are security settings on your GitHub account that do not allow using your GitHub credentials to sign up for AstraDB, I would recommend the Google approach. So sign up for AstraDB; that is the first thing you need to do. What we need is an API endpoint and an application token from AstraDB. Let me see if I still have something open.
7. Setting Up AstraDB and the Database
Short description:
This Jupyter Notebook will help you understand the mechanics better. Sign up for AstraDB and create a vector database. Generate an application token and copy the API endpoint. Sign up for OpenAI and create an API key. Streamlit links are available.
No, but I'll show you again how it works. Open the link in a new tab, sign in with Google, and let's switch to my organization. For today, I've created a workshop database. So you need to create a vector database in AstraDB, and as soon as this database is created, here on the right-hand side you will see an API endpoint and an application token. You can generate an application token here and copy it. This is what we will need over the course of this workshop, so you can do that right now and copy it into a text file or wherever you want to store it. That's it for AstraDB. There's a question: is the provider of the database relevant? Well, this workshop is about using AstraDB. Okay, so that's the first thing; then OpenAI. If you do not have an account yet, please sign up, navigate to the API keys page, and create an API key. As for AstraDB, it is a database service available in any public cloud, so it is your decision where you want to create your vector database: it can be on AWS, Google, or Azure. You can use whatever you want, but I would recommend AWS; there is a European region, and I believe we are Europeans within this workshop, so from that perspective it makes sense. Your database initializes; this typically takes one to two minutes. Once the database has the status active, like my workshop database here, we are ready to generate the application token and copy the API endpoint. Yeah, sure, I can repeat: here in the prerequisites, you need accounts for these three services, one for DataStax AstraDB, then for OpenAI, and for Streamlit. Behind these links is everything you need to sign up for these services. Okay, then let's continue. Streamlit, as I said, the links are there.
8. Implementing the First Application with Streamlit
Short description:
Create a codespace on the main branch to provide a complete development and execution environment for the chatbot application. Use the Codespaces terminal window to install the necessary dependencies. Implement the first application by drawing a UI with Streamlit.
They should directly bring you to the page where you can register for the service. Okay, and so this is what we do not do: we do not try the concepts in a Colab notebook. That is a Jupyter notebook executed in Google's Colab environment; look at it after the workshop to learn more about the mechanics of what we have done here.
The next thing that we are going to do, so this is what you already did: you created your own repository based on this repository. Then we create a codespace on the main branch. So please do the following. Here, there is the code button. I already have my codespace up and running and active, but you will see "create codespace" here; do exactly what is within the documentation of that workshop and create your codespace based on that. This will give you the GitHub Codespaces environment, and the advantage for us here is that none of the code that we implement is executed locally on your machine, so there is no risk of incompatibilities with your machine or your setup. This provides us with a complete development environment and the execution environment for our chatbot application. As soon as you have created your codespace, you should end up seeing pretty much the same as I see right now: on the left-hand side, all the files and folders we have within our repository; on the right-hand side, just a test page that starts a browser with the Streamlit starter example; and at the bottom, a terminal. Right now there is a command line up and running: as soon as you start the codespace, streamlit hello and so on and so forth is executed automatically, which brings this browser window up. We will use that terminal window to start our application iterations, and to stop an application, just hit Control C. That is what I did, and with this I can execute all the applications we have within the repository. When you are blocked and cannot do anything in the environment, that is due to this pop-up window here. Now that it's gone, I can delete this line and insert my own command lines. Is Codespaces already up and running everywhere? Please let me know if you need a few seconds or a minute more, just ping on the Zoom chat. Okay, looks good. So with this, we have the repository, we have the services registered, we have Codespaces up and running, and let's go from there. Now we are ready to implement our first application. The codespace provides us with a Python-enabled environment; it is Python 3.11 right now, but that is not really that relevant. As said here, you need to do the pip install for RAGStack, for Streamlit, for pypdf. The way you do that is what I showed before; I show it again to be sure you can follow. It is in requirements.txt, and this is the command line that you need to execute in the Codespaces terminal window right here: pip install -r requirements.txt. With this, you have all the dependencies for this workshop. Okay, so let's go back to the instructions. Now that we have the basics, we are ready to implement our first application. The first application is just about drawing a nice UI with the help of Streamlit: we draw a title and we draw some text with markdown. That is all there is to the first iteration.
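A minimal sketch of what this first iteration can look like (the exact wording in app_1.py may differ; the title text here is hypothetical):

```python
# A minimal sketch of the first iteration: draw a title and some markdown.
import streamlit as st

st.title("Build your own RAG chatbot")
st.markdown("""This Streamlit app is the starting point of the workshop.
It only draws a **title** and some markdown text; no chatbot logic yet.""")
```

You run it with streamlit run app_1.py, as described next.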
9. Running and Testing the Streamlit Application
Short description:
Run the Streamlit application by executing the Python code. Verify that the setup is working by drawing a UI with a title and markdown. Stop and start the application to ensure the connection between the browser and the backend Python code is established.
And in order to run our application, just copy streamlit run app_1.py with the icon here on the right-hand side and execute it in the terminal. And there it is, the result of the Python code up and running. That is our Streamlit application, a Python application, and we used some of its features to draw a nice title and some markdown. Let's wait a second until everyone has the pip install done. Meanwhile, here on the left-hand side, you can navigate through all the files within the repository.
So that is our first application, a simple one, but it is just to draw the UI and to see that the whole setup is working for you. The simple browser should come up automatically as soon as you start your application. With Control C, you stop your application, and with the up arrow, you can get the command line back again. As you can see, now that I stopped my application, there is no connection between what we have in the browser and our backend application, our Python code. We close the pop-up and start the application again, and this brings up the browser automatically.
10. Adding Input Fields and OpenAI Integration
Short description:
Add input fields to the application to ask and display questions. Use the Streamlit session state component to store chatbot interactions. Integrate the application with OpenAI chat models using prompt templates and the ChatOpenAI component.
So let's go to application iteration two. This pip install you do not need to do anymore, because we installed all of the dependencies via requirements.txt. Now let's add some input fields to the plain text we have been drawing. We need an input field in order to ask questions, and we also need output for the answer. We achieve that still with Streamlit only: we leverage the chat input function, which allows us to ask a question, and we draw the question and we also draw an answer.
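A sketch of this second iteration (the assumed shape of app_2.py):

```python
# A sketch of iteration two: a chat input field that echoes the question back.
import streamlit as st

st.title("Build your own RAG chatbot")

# chat_input draws the input field and returns the text once submitted
if question := st.chat_input("What's up?"):
    with st.chat_message("human"):
        st.markdown(question)
    with st.chat_message("assistant"):
        # No LLM connected yet; just print out what the question was
        st.markdown(f"You asked: {question}")
```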
So right now there is no connection to any LLM; it is still just UI, but we will extend the application with each of these steps. All we need to do is copy the Streamlit command line again, stop what we have up and running, and start application two. Now you can see we have an input field, "What's up?" is the placeholder, and we can ask a question. But the only thing the application does is print out what the question was, and it doesn't have any history yet. We come to that later.
So let's stop that application again and go back to our instructions. In order to remember the chatbot interactions, we use the Streamlit session state component. A Streamlit application works like this: it is a Python script, and it is executed from the top to the bottom on every interaction, and no state is stored by default between these executions. In order to store state, we leverage components like the session state component to store the questions and the answers that we got so far. We will have multiple questions, and it is for our benefit to have the whole history of that. So let's copy streamlit run app_3.py. We can also have a look at the code in GitHub. This is what we already spoke about: the input field for the question, and then there is the session state component, and we append our messages to the session state in order to have them accessible across the Python script executions. So let's execute our app three. Now I add a test one, a test two, a test three. Okay, so we have the functionality to store the state implemented.
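A sketch of the session-state mechanics (assumed shape of app_3.py):

```python
# A sketch of keeping chat history across script re-runs with session state.
import streamlit as st

# Initialize the message history once; it survives every re-run of the script
if "messages" not in st.session_state:
    st.session_state.messages = []

# Re-draw the full history at the top of every run
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if question := st.chat_input("What's up?"):
    # Append the new question so it is still there on the next run
    st.session_state.messages.append({"role": "human", "content": question})
    with st.chat_message("human"):
        st.markdown(question)
```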
Let's go to the instructions and integrate our application now with OpenAI, with the chat models available from OpenAI. What we use here is st.cache_data, also a Streamlit component, to store the prompt template that we leverage in our chatbot application. The prompt consists of a template, and in a prompt you define what the large language model should do: what its role should be and how it should behave. In this case it is: you are a helpful AI assistant tasked to answer the user's question; you are friendly and you answer extensively with multiple sentences, and you prefer to use bullet points. Then within the prompt there is a placeholder, and this is where the question goes in. The whole prompt is what we send over to the LLM: we send over the question, we send over instructions, and as you will see later, we also send over context from our private data.
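A sketch of that cached prompt template (the exact wording in the repo may differ):

```python
# A sketch of caching the prompt template with st.cache_data.
import streamlit as st
from langchain.prompts import ChatPromptTemplate

@st.cache_data()
def load_prompt():
    template = """You're a helpful AI assistant tasked to answer the user's questions.
You're friendly and you answer extensively with multiple sentences.
You prefer to use bullet points to summarize.

QUESTION:
{question}

YOUR ANSWER:"""
    # The {question} placeholder is filled in by the chain later on
    return ChatPromptTemplate.from_messages([("system", template)])

prompt = load_prompt()
```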
Okay, so with this we have our chat prompt template. Another thing we need to do is load the chat model, and for that we use the ChatOpenAI component from LangChain, the framework. This component requires us to provide some properties. The first important property is the model that we want to use from OpenAI. You might know that OpenAI offers a number of models; we use GPT-3.5 Turbo. There is also GPT-4 and other large language models that we could use, but for this workshop we use this one. Then there is a temperature, and you might wonder: hey, what is this temperature about? You give the temperature as a number between zero and one. Zero means that you do not give the LLM any freedom to come up with a response: the response should be really on point, and the LLM shouldn't hallucinate too much or generate a random response. The higher the number, so if it is one, the more you allow the LLM to provide a more random response, and it might be the case that the response you get from the LLM is wrong or not to the point. So the lower the number, the more accurate the response will be. And then we enable streaming, due to the fact that the LLM is a machine learning model, and what an LLM does is predict tokens. We send over a prompt, and with this prompt the LLM generates a response, but it predicts the response token by token. In order to see the response as it is generated, we want to stream it into our UI; that is why we set streaming to true, and we will leverage that later when we implement the streaming functionality in our chatbot.
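A sketch of loading the chat model (assuming the langchain-openai package and an OPENAI_API_KEY entry in secrets.toml; the key name is an assumption, check the repo's example file):

```python
# A sketch of loading the chat model, cached as a resource across re-runs.
import streamlit as st
from langchain_openai import ChatOpenAI

@st.cache_resource()
def load_chat_model():
    return ChatOpenAI(
        openai_api_key=st.secrets["OPENAI_API_KEY"],  # assumed secret name
        model="gpt-3.5-turbo",  # the model used in this workshop
        temperature=0.3,        # a low temperature keeps answers on point
        streaming=True,         # emit the response token by token
    )

chat_model = load_chat_model()
```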
11. Defining the Chain and Executing Application Four
Short description:
Define a chain to connect input, prompt, and chat model. Provide the OpenAI API key in secrets.toml. Execute application four to connect to the large language model and ask questions.
Okay, and then the last thing that we need to do here is define a so-called chain. This is the LangChain Expression Language, and you read it like this: the input goes into the prompt. The input is the question right now, and it goes into the placeholder that you have seen in the prompt; the prompt goes into the chat model, and the chat model will provide us with an answer. That is how you read the chain here. The response comes back in this response variable, and out of that response we take the answer and draw it in our chatbot application.
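A sketch of that chain in LCEL, reusing the prompt and chat_model objects from the sketches above:

```python
# A sketch of the LCEL chain for iteration four: question -> prompt -> model.
import streamlit as st
from langchain.schema.runnable import RunnableMap

# The map feeds the question into the prompt's {question} placeholder
inputs = RunnableMap({"question": lambda x: x["question"]})
chain = inputs | prompt | chat_model

response = chain.invoke({"question": "How does vector search work?"})
st.markdown(response.content)  # draw the answer in the chatbot UI
```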
So before we can execute application four, you need to provide the OpenAI API key in the Streamlit secrets.toml file. Let me show you how to do that. Within the repo, you find a secrets.toml example. What you can do is copy that file and create a secrets.toml file, like I did with my secrets, with my OpenAI API key and my Astra credentials. Or you can just rename it: if you right-click on that file, you can simply rename it to secrets.toml. These are the approaches you have, and for application four, all you need to specify right now is the OpenAI API key. I have done that in my secrets.toml, and with this I can execute application four. What we have now is a full-blown chat application that is connected to the large language model, and we can now ask questions. How is it going?
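For orientation, the secrets file could look like this (the key names are assumptions; check the secrets.toml example in the repo for the exact names):

```python
# The .streamlit/secrets.toml file might contain entries such as:
#
#   OPENAI_API_KEY = "sk-..."
#   ASTRA_API_ENDPOINT = "https://<db-id>-<region>.apps.astra.datastax.com"
#   ASTRA_TOKEN = "AstraCS:..."
#
# Streamlit then exposes these values to the application via st.secrets:
import streamlit as st

openai_api_key = st.secrets["OPENAI_API_KEY"]
```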
12. Using the API Key and AstraDB for Vectorized Context
Short description:
Provide an API key if needed. Use AstraDB to store and retrieve vectorized context for the large language model. AstraDB allows for vector search and similarity scoring.
So what I could do, if it's only a few folks experiencing the same issue, is provide you with an API key that works. I will do that, Eugene, in a second, and this API key is active until tomorrow for you; then I will disable it, because it uses our DataStax credit card. And there was a question about a vector database still being pending. Oleg said that his vector DB is still pending and he would probably need the key as well. You do not need a key for that; it should go into the active state. Once the database is active, you can copy the endpoint and generate a token, and we are going to use it in a later step in the workshop. So, does it work?
So what I demonstrated is the integration with OpenAI. Any successes so far? Please signal if it works. Okay, perfect. Then let's continue with the next step; that was application four. Right now all the responses are based on the data that the large language model was trained on, but as I said, the LLM was not trained with our private data. In order to bring our private data into the picture, we are going to leverage AstraDB, our vector store: we vectorize our context, store it there, retrieve it out of AstraDB, and provide it within the prompt as context to the LLM. So yeah, maybe quickly to AstraDB.
So this is what you should see right now in your AstraDB UI. You created the database, and we spoke about the endpoint and token. These are metrics about throughput and other things, and then there is a tab called Data Explorer. There you can see a namespace; we work with default_keyspace, the name of the default namespace, and this is what we use for now. You can create your own namespaces, but that is not necessary for this workshop. Within the namespace, we have collections, and in these collections we store our data, our context. As you can see here, my context is stored in a collection called langflow. There is also information about the vector dimensions: 1536 dimensions is quite a lot, and as I said before, captured within these dimensions is the semantics, the context of the text that was vectorized. When we go down, there is the collection data, the documents: the vectors are stored side by side with the content. This is the text that we will send over to the OpenAI embedding model in the next workshop iteration, and that gets vectorized; the result of that vectorization is this vector, and we store both side by side. This allows us to do a vector search. I can do a search right now within the AstraDB UI to do a similarity search, and as you can see, there is a similarity score of one. There are different algorithms available for a similarity search: cosine, dot product, Euclidean, and others. We use the default in this workshop, cosine. A cosine similarity score of one means 100% similarity, and for sure the reason is that we used the vector of one of the documents in our database as the query: if you compare it with itself, it is completely similar. That is what we get as a response here. And here on the left-hand side, we can see the similarity to all the other text chunks that we are going to store in AstraDB; we haven't done that yet.
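As a small aside, cosine similarity itself is easy to compute; this toy sketch shows why a document compared with itself scores exactly one:

```python
# A toy sketch of cosine similarity, the default metric in this workshop.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([0.12, -0.58, 0.33])  # toy 3-dimensional stand-ins for
w = np.array([0.10, -0.55, 0.37])  # the real 1536-dimensional vectors

print(cosine_similarity(v, v))  # 1.0: identical vectors, 100% similarity
print(cosine_similarity(v, w))  # close to 1.0: very similar vectors
```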
13. Combining AstraDB with Retriever and Streaming
Short description:
Combine AstraDB vector store with additional context using a retriever. Execute application five to ask questions and retrieve similar context chunks from AstraDB. Upload PDF files to integrate private data into the chatbot. Implement streaming of LLM response for a better user experience.
Okay, so let's go to the next section. Now we combine the AstraDB vector store with some additional context, and this additional context goes into the prompt. The code that we need for that is a retriever. The retriever is used to retrieve data out of the AstraDB vector store, and it is also a LangChain component. We use AstraDB for that, and you need to define an embedding model that you want to use. We use OpenAI embedding models in this workshop, but you could also decide to go for a Hugging Face embedding model, or an embedding model provided by Azure, AWS, or other vendors. For now, we use OpenAI. Then you define a collection name; per default, go with my_store. Then you can see the placeholders, Astra API endpoint and Astra token: this is what you provide in a second in the Streamlit secrets.toml file.
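A sketch of that retriever setup (assuming the langchain-astradb package; older RAGStack versions expose a similar AstraDB class from langchain_community, and the secret names are assumptions):

```python
# A sketch of the AstraDB vector store and retriever, cached across re-runs.
import streamlit as st
from langchain_astradb import AstraDBVectorStore
from langchain_openai import OpenAIEmbeddings

@st.cache_resource()
def load_retriever():
    vector_store = AstraDBVectorStore(
        embedding=OpenAIEmbeddings(openai_api_key=st.secrets["OPENAI_API_KEY"]),
        collection_name="my_store",
        api_endpoint=st.secrets["ASTRA_API_ENDPOINT"],
        token=st.secrets["ASTRA_TOKEN"],
    )
    # Return the top 5 most similar documents for every query
    return vector_store.as_retriever(search_kwargs={"k": 5})

retriever = load_retriever()
```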
Then here is the retriever for the chat model. What we do here is define that we would like to get back the top five most similar documents out of the collection. And then, in addition, where before the inputs were only about the question, we now add the context to the inputs. These are the placeholders you have seen in the template, and this information will be injected into it: one is the context, the similar documents out of AstraDB, and the other one is the question. So please add your endpoint and token to your secrets.toml file, and with this we are ready to execute application five. It needs to go here; I've done that in my secrets.toml, which is why I can directly execute application five. It doesn't look different from what we have seen before, but now we can ask questions, and the question gets vectorized. This question vector is used as a search parameter to find the most similar context chunks within AstraDB, and this is then provided within the prompt to the LLM. Right now we do not have a capability to upload any context of our own, so we do not have a big benefit yet, but what we did is integrate AstraDB so that it ingests the context into the prompt. Eugene asked: what data do I need to load into AstraDB? We will add a new feature to our chatbot application that allows you to upload a PDF file, and it can be any PDF file. This is only for demonstration purposes, for learning purposes, but the idea of such a RAG chatbot is to have a chatbot for your private data. Maybe within your enterprise, within your company, you have products and services, and you have descriptions for all of that, and you would like to have a customer-facing chatbot that is capable of answering questions about the services and products that you offer. Then what you do is upload these PDF documents into the chatbot. With this, they get split into chunks, and each of these text chunks goes through the embedding model in order to get vectorized, and the vector and the text chunk get stored side by side in AstraDB, in the collection, as demonstrated a few minutes ago.
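A sketch of the extended chain, assuming the prompt template now also contains a {context} placeholder next to {question}:

```python
# A sketch of iteration five: retrieved context flows into the prompt.
from langchain.schema.runnable import RunnableMap

inputs = RunnableMap({
    # Vectorize the question, fetch the most similar chunks, join their text
    "context": lambda x: "\n\n".join(
        doc.page_content
        for doc in retriever.get_relevant_documents(x["question"])
    ),
    "question": lambda x: x["question"],
})
chain = inputs | prompt | chat_model

response = chain.invoke({"question": "What does the uploaded document say?"})
```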
Okay, so let's continue with making this a streaming application. As I explained before, we would like to have the LLM response streamed into our user interface, because the LLM does a prediction of the next token, and sometimes this can take quite a long time. It is not a nice user experience if the user asks a question and has to wait five seconds to get the complete response; we would like the response streamed into our UI as soon as the first token is predicted. Therefore, we add a streaming callback handler to our application code, and we also need to add that stream handler to our chain in order to integrate it with the LLM.
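A sketch of such a streaming callback handler (the assumed shape of the one used in the workshop):

```python
# A sketch of a streaming callback handler: every newly predicted token
# is appended to a Streamlit placeholder so the answer grows live in the UI.
import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler

class StreamHandler(BaseCallbackHandler):
    def __init__(self, container, initial_text=""):
        self.container = container  # e.g. an st.empty() placeholder
        self.text = initial_text

    def on_llm_new_token(self, token: str, **kwargs):
        self.text += token
        self.container.markdown(self.text + "▌")

# Hook the handler into the chain invocation:
# response = chain.invoke(
#     {"question": question},
#     config={"callbacks": [StreamHandler(st.empty())]},
# )
```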
14. Adding PDF File Upload Functionality
Short description:
Skip application six and go directly to adding the PDF file upload functionality. Use the Streamlit sidebar to browse and upload PDF files for vectorization. Split the document into chunks using the recursive character text splitter and store the context in AstraDB. Execute application seven to test the chatbot's ability to answer questions about uploaded PDFs. Reach out to DataStax for assistance with your projects and bring your application to the Streamlit platform.
So this is the application six iteration. And you know what? I would like to skip that and not execute it. Let's go directly to the final thing and add the functionality to upload a PDF file. Oleg has a question: did I provide you with an API key? Yeah, okay, Oleg, I will send you an API key in a second. Okay, there you go. So let's continue with the final step. As I said, our chatbot should allow us to upload a PDF file. What we need to do is define a Streamlit sidebar, and within the sidebar we add our upload button, which provides us with a window where we can reference our PDF file and upload that file in order to get it vectorized.
So here is a bit of code to figure out what the path to the PDF is. There is a PyPDFLoader that we use; we provide it with the path to the file. Another interesting thing that we leverage is a LangChain component, the recursive character text splitter. This component takes our document and splits it into chunks with a size of 1500 characters. You can also see another property: the overlap. We overlap our text chunks in order not to lose any information and to have complete sentences within our context. The next thing is that we split our document with the text splitter and then we store it: that is a reference to the AstraDB vector store, and we simply execute its method to add documents, providing the pages from the text splitter. That is our context that gets stored in AstraDB. With this we will have context, and if you then upload one of your product or service descriptions, your chatbot is capable of answering questions about it. And there is no need to do a pip install of pypdf, because all of you, I believe, did the pip install via requirements.txt.
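A sketch of this upload-and-vectorize step (assumed shape of app_7.py; vector_store is the AstraDBVectorStore from the earlier sketch, and the overlap value is assumed):

```python
# A sketch of iteration seven: upload a PDF, split it, store it in AstraDB.
import os
import tempfile
import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

with st.sidebar:
    uploaded_file = st.file_uploader("Upload a PDF", type="pdf")

if uploaded_file is not None:
    # Write the upload to a temporary file so PyPDFLoader gets a path
    temp_dir = tempfile.TemporaryDirectory()
    temp_path = os.path.join(temp_dir.name, uploaded_file.name)
    with open(temp_path, "wb") as f:
        f.write(uploaded_file.getvalue())

    docs = PyPDFLoader(temp_path).load()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1500,    # chunks of 1500 characters
        chunk_overlap=100,  # overlap so no sentence gets cut off
    )
    pages = splitter.split_documents(docs)

    # Embed every chunk and store vector and text side by side in AstraDB
    vector_store.add_documents(pages)
```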
So let's execute application seven, but not here, that is the wrong place; it is here. Now you should have the sidebar with the browse files button, and with this I select, in my case, an essay by an author; you can select any PDF that you have available. I load it and save it to AstraDB. Once you have saved it to AstraDB, you can ask questions and test whether the LLM really gets the context from the PDF. This PDF is an essay, and my question is: what were the two main things the author worked on before college? The LLM does not know which author this is all about, and it wasn't trained with that essay PDF document. But now that we have vectorized it, we inject the context into the prompt and pass it over to the LLM, and we get the two main things, writing and programming, with some more information back from the LLM. Please let me know if that worked for you, any successes with that? Hey, cool, very cool. So it worked for quite a number of folks. This is a really simple chatbot application, and there are many ways to make it even more valuable and more production-ready, for sure. If you are interested, if you have a project, I would be glad, we at DataStax would be glad, if you simply reach out to me, to us. We would love to give you a hand with whatever you want to do. So there is a last step within the workshop, and that is to bring your application up and running on the Streamlit platform. I hope each of you got a Streamlit account; it is not only a framework, it is also a platform.
15. Deploying the Chatbot Application
Short description:
Create an application in Streamlit to deploy the workshop chatbot, making it publicly accessible. Customize the URL, copy the secrets from the secrets.toml file, and hit Save and Deploy to deploy the application. Use pipelines to update the vector database with new documents. Monitor the documents folder for changes and vectorize new documents to update the information in AstraDB. Use the Astra API and the Python client to manage collections, delete outdated information, and interact with the database.
All you need to do, and it is also well documented, is create an application. Right now I have two applications up and running; this application is the workshop chatbot that I already deployed to Streamlit. With this, your chatbot is also publicly available, which allows you to share it with colleagues, maybe colleagues that work from a home office or in other locations, to give them an idea of what is possible with retrieval augmented generation. In my case, I already have an application. What you provide here is the repository, then the main branch, then the main file path to your application; in our case, that is app_7.py. And then here is a URL, and you can customize it to a URL that is available and matches whatever you want to name it.
And then there are some advanced settings, and what you need to do there is copy everything from the secrets.toml file, all the lines, all the secrets, and paste it in here. Then you hit Save, and then you hit Deploy, and this will deploy the application on Streamlit. It will take a while, but afterwards it is accessible like my application. Now, please provide feedback if that also works for you; it would be good to know. And please also let me know if there are any questions; I'm happy to answer any of them. Once you signal me that it works, I would like to demo to you, within a few minutes, an even simpler way to implement such a chatbot application.
I do have a question. Let's say that in my company I have a list of files that I want to make available to my colleagues, and I have this application hosted for them to ask questions against these files. As the files change, how easy is it, and what does the process look like, to keep feeding the vector database with more files? Do you have certain APIs or certain pipelines where we can streamline the information that we feed to it?
Yeah, thanks a lot for the question. The way you do it is indeed with pipelines. This is nothing that we offer right now in our platform, so it is something you need to implement on your own. As soon as there is a document change, a new version of a document or a new document, typically you monitor your documents folder, and if there is something new or updated, you take the new document, vectorize it, and update the information in AstraDB. And I imagine it is also possible in AstraDB to delete all information that is no longer relevant, given that I just uploaded a new PDF and the old one has the wrong information.
Absolutely. There are always different ways to interact with the technology, and what we have leveraged today is the Astra API. For different programming languages there is a client that implements the API; today we used the Python client. Via this Python client, a number of functionalities are available to manage your collection, to delete your collection, to delete outdated information within your collection. So this is doable via the API in this case. Okay, I can see a question about the endpoint and token. Shibam, let me quickly circle back to that point. Here is Astra: as soon as you have your database up and running and it is active, here on the right-hand side there is an API endpoint. This is what you copy, and this is what is required in the Streamlit secrets.
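To make the answer concrete, here is a minimal sketch of such an update pipeline; it is not part of the workshop repo, and the folder-watching logic is just one possible approach:

```python
# A hypothetical folder-watching pipeline: re-vectorize new or changed PDFs.
import hashlib
import pathlib
import time

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

seen: dict[str, str] = {}  # file path -> content hash of the last ingest

def ingest(pdf_path: pathlib.Path, vector_store) -> None:
    docs = PyPDFLoader(str(pdf_path)).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
    vector_store.add_documents(splitter.split_documents(docs))

def watch(folder: str, vector_store, interval: int = 60) -> None:
    while True:
        for pdf in pathlib.Path(folder).glob("*.pdf"):
            digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
            if seen.get(str(pdf)) != digest:  # new file or new version
                # A real pipeline would first delete the outdated chunks
                # for this file, e.g. via the Astra Data API
                ingest(pdf, vector_store)
                seen[str(pdf)] = digest
        time.sleep(interval)
```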
16. Implementing Langflow and Defining Flows
Short description:
Generate a token and paste it into the secrets file. Langflow is an open source project that allows you to implement generative AI applications without coding. Use drag-and-drop components in a canvas to define the flow. Connect components like embedding models and AstraDB search to perform similarity searches. Retrieve context and use a prompt with predefined templates for generating responses.
And then you generate a token simply by clicking generate token, hit the copy icon, and paste it into the secrets file. That is all you need to do. Okay, with this, I believe the next thing is also super interesting to you: it is about Langflow, which is an open source project. Langflow has nearly 20,000 stars on GitHub, so it is really a famous project with a super interactive community. It is a no-code way to implement any kind of generative AI application. If you are interested in digging deeper into this topic, it allows you to simply prototype, to try out different models, different retrievers, different text splitters, different technologies from AWS, from Azure, from Google, just by exchanging components in a canvas. The docs page is docs.langflow.org.
So what I'm going to do is demo it quickly to you. I have Langflow up and running, and some additional info for you: RAGStack, the curated list of dependencies we used today, also provides Langflow to you. There is a RAGStack dependency called ragstack-ai-langflow. You haven't seen that in the slides today, just for your information; you can Google for it and you will find it. It sets up everything for you so that you can just execute langflow run, and what you get is the Langflow UI that I have here. So that is the UI of Langflow, and you work like this. There are already quite a number of templates that you can reuse, so you do not need to reinvent the wheel on your own; you can leverage what your colleagues and other people on this planet have created. But for this demo, I start with a blank flow. We implement flows in Langflow, and as you can see on the left-hand side, what you drag and drop into the canvas are components. For a chat application, I drag and drop a chat input field into my canvas. Like in our application code, but with a difference: I do not need to understand Python, I do not need to understand how these components actually work. I just drop them into the canvas and connect them with each other to define my flow. We need an embedding model like in our chatbot application; also here, let's use the OpenAI embedding model. We also do a vector search over what we have as context within AstraDB. These are the components that we need, and in order to define a flow, you simply connect these components with each other, like I do it here: the embeddings go in here. This is already all you need to do a similarity search based on the cosine algorithm, based on a question that I can type in here. Let's use the question that I demoed before: what were the things the author worked on before college? This question is then vectorized by the OpenAI embedding model, and with that vector we have a search parameter that is used for the vector search over all the context we have in AstraDB. The AstraDB search component will come back with the context. And for sure, there are options to define more parameters. As you can see here, there are a few buttons: if you would like to dig into the code, you can do so, and you can even extend components if they are not as you would like to have them. There are advanced options too; you can add more of these properties to specify in more detail how this AstraDB search component should behave.
So now that we have done a vector search, we retrieve context, and as we have learned before, this goes into a prompt. The next thing we do is take the prompt component from the left-hand side and drop it in here. The prompt, as before, comes with a template, and I have predefined a template here that I just copy and paste into the UI. As you can see, it looks a bit like what we have in our chatbot Python application: there is a placeholder for the context, which is where everything we got from AstraDB goes in, then there is a placeholder for the question, and as before, there is also an instruction for what the LLM should do.
17. Connecting and Integrating Flows
Short description:
Connect the context and prompt with the question and large language model. Share and modify flows. Provide API keys and AstraDB credentials. Execute and validate the flow. Integrate the flow into existing applications. Workshop conclusion and invitation to React Summit.
So with this, we save it. Now we can connect the context with the right endpoint here of the prompt, and also the chat input, the question; we connect it here with the question. We have these two placeholders, and within this flow they get injected with the data. So that is the prompt. Now that we have the prompt with all the information that is required, the next thing is to pass it over to the LLM. As you can see here, there is quite a big list of different technologies, different large language models; you can select from all of them in order to compare them with each other and figure out which one is the best for your use case. Also here, I go with OpenAI: drag and drop it here, connect the prompt with the input for the large language model. The last component that is required is the chat output. It's here; we connect the response from the LLM with the chat output. That is all we need. And the cool thing is, you can share these flows with others. As you can see, there is a store available: you can register for that store, and other people on this planet have already created their flows; you can take these flows and modify them to your needs, and for sure you can also learn from them. As before, where we had the secrets file, also here we need to provide the OpenAI API key for the embedding model, and the API key here as well. Then we need to provide the AstraDB credentials: there is a token required that goes in here, and then there is the endpoint that goes in this field. The collection is already created; in my case, it is called langflow. And that is actually all we need to do in order to have a chat application. The next thing we do is execute the flow in order to validate that everything is correct, and as soon as we have a check everywhere, we can be sure that the flow is correct. Now we can use the UI here to ask our questions and get our responses. As I said before, this is something you can share with your colleagues, and you can also integrate it into your existing applications via an API. You can integrate the whole flow into your application, and then there is no need to work with all these LangChain components, with all these classes, and stick them together. It is much less error-prone, and you can experiment and get things even faster into production. So with this, we are close to finishing our workshop today. I hope you found it valuable. Was it valuable to you? Can you give me a signal? If it was valuable, many thanks for the feedback. That's great. Can you please share your flow? Do you mean LangChain or Langflow? By the way, Langflow uses LangChain under the hood, so all the components that we dealt with today in the code are used behind these components. Hey, thanks a lot for this feedback. If there are any questions, please feel free to reach out anytime. Always happy to discuss technical products, challenges, and use cases, and to help in case you are interested. React Summit is next week, and DataStax has a booth there. It would be great to see you; if you are on site, please visit us at the booth. With this, many thanks for the interaction. It was fun. Hope to see you soon. Bye.