Video Summary and Transcription
Today's workshop introduced LLMs and their potential to free up time for software engineers. It covered setting up LLMs to chat with proprietary data, using retrieval augmented generation for small chat applications, and building a speaker recommendation application with this approach. The workshop also addressed concerns about sending data to OpenAI and walked through splitting and storing data in a vector database. It concluded with the deployment of an end-to-end application using Genesio and an invitation to provide feedback and stay in touch.
1. Introduction to LLMs and Generative AI
Today we are going to talk about LLMs and how they can free up time for software engineers to focus on product development and robust architectures.
Well, hello, everyone. So I'm Andrea. I'm working at Genesio and I'm so happy to be here and I'm hyped to talk about LLMs and generative AI because it's a pretty interesting subject right now.
Today we are going to talk about LLMs and I am pretty sure that you are also hyped about the topic because you are here at the workshop where you can actually learn how to integrate LLMs, OpenAI, and AI models in general into your own applications.
So I am actually pretty passionate about this subject, because OpenAI, and LLMs in general, are giving us the gift of time. I'm going to be honest: I have a few dozen conversations with ChatGPT open right now, and it has saved me a lot of time with debugging and writing code. Now I can focus more on learning, on designing architectures, on the things that ChatGPT cannot do but I can, and I have more time to do them. So I'm pretty hyped about this topic, because I want to see LLMs and AI models more embedded into our work as software engineers, so we can free our time to think about products, about robust architectures, and so on.
2. Setting up LLMs to Chat with Proprietary Data
Today I will show you how to set up an LLM, such as OpenAI, to chat with your own data. One challenge is that if you have proprietary data, the model won't know how to respond. To overcome this, we can provide the proprietary data and context to the model. There are different approaches, including fine-tuning the model with proprietary data, but this is expensive and requires expertise. Another approach is to include all the proprietary data in the prompt itself.
So with that being said and with that in mind, what I want to show you today is how can you actually set up an LLM, such as OpenAI, to chat with your own data.
So, okay. Up until now, this is how we communicate with LLMs such as OpenAI's models or Llama or any other model: we have a user who has a question, and the model responds to us. But the caveat here is that if we have some proprietary data, the model, unfortunately, is not trained on that data and won't know how to answer our question.
I saw a very clear example of this the other day. For example, if you want to ask about some policies from your company, say the vacation days you have, you cannot ask a model. You actually have to go to your company's internal guidelines and policies, or to HR, and ask a person, and spend time on all of this back and forth. What we can do instead is give the proprietary data and a context to the model, so it can help us answer these kinds of questions.
So there are a few approaches that we can take. First of all, when I say proprietary data, some of you probably think immediately of privacy concerns. There are two things to note here. If you are concerned about the privacy of your data, you can use a model that you host yourself. For example, you can take the Llama 3 models, which are open source, host them on any cloud provider, and then be sure that your data never leaves that environment and architecture, so you have total control and total privacy. For this workshop I use OpenAI just for convenience, because it's already public, it's already there, and I don't have to spend time setting it up. But keep in mind: if you want total privacy, you can host your own model. Also, not all proprietary data is sensitive data. We might have public documentation for an open source project, and things like that, which we can just feed to a third-party model, so you don't have to worry about privacy all the time.
Getting back to the presentation and the approaches themselves: the first thing that comes to mind is that we can fine-tune the model with the proprietary data. Although this would be the best option, because the model would then natively know about the data, it is very expensive and requires machine learning expertise. Fine-tuning is an art, and you have to know how to do it in order to do it right. You also need a lot of proprietary data; otherwise, if you don't have much data about the subject, you won't really make a difference in the model itself, because the model is huge and already knows a lot. If I just add a few sentences about a certain topic, they will get lost in all the data that is already there. So for this kind of application, fine-tuning might not really be a solution. Alternatively, we can take the naive brute-force approach and put all the proprietary data into the prompt itself. Before actually asking a question, we tell the model: here is all the data from my company, all the guidelines and all the policies.
3. Retrieval Augmented Generation for Small Chats
To overcome the problem of long proprietary data, we can store it in specialized storage and retrieve only the relevant paragraphs for our questions. This approach, known as retrieval augmented generation, is ideal for small applications that require new information without fine-tuning the model.
Just parse this data and then answer my question: how many vacation days can I take in this company? The problem with this approach is that sometimes the proprietary data is so long that it exceeds the maximum length of a prompt. Each model has a maximum prompt length, and we cannot exceed that. The prompt also gets more and more expensive, in direct proportion to its length.
And even if the proprietary data is short enough to be contained in a prompt, we are still going to pay a lot of money; we are going to pay for data that is not relevant to our question. For example, if I want to know about vacation days, I might not be interested in any other part of my company's policies or guidelines, the parts that talk about other kinds of perks or benefits. I can leave those aside and give the model just the relevant data.
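To make the naive approach concrete, here is a minimal sketch using the official `openai` Node.js package; the file path, question, and model name are illustrative, not the workshop's actual code:

```typescript
import fs from "node:fs";
import OpenAI from "openai";

// Hypothetical policy document standing in for any proprietary data.
const policies = fs.readFileSync("data/company-policies.txt", "utf-8");

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function askWithFullContext(question: string): Promise<string> {
  // Naive brute force: stuff the ENTIRE document into the prompt.
  // This fails once the document exceeds the model's maximum prompt
  // length, and we pay for every token, relevant or not.
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: `Answer using these company policies:\n${policies}` },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content ?? "";
}

askWithFullContext("How many vacation days can I take?").then(console.log);
```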
So what we actually want to do is the following. We have the proprietary data, and we want to store it in some kind of smart, specialized storage. Then, based on our question, going back to the vacation days example, we want to take just the paragraphs about vacation days, the paragraphs that are logically related to the question. We give just this relevant data in the prompt, together with our question, and feed it to the model. In the AI/ML ecosystem, this whole architecture is called retrieval augmented generation. It's actually the best approach for small applications, small chats that need to know new information like documentation, policies, or guidelines, and don't really need the power of fine-tuning the model.
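The same question answered the RAG way might look roughly like this sketch; `vectorStore` is assumed to be a LangChain-style vector store already loaded with the proprietary documents, and the model call is elided since it is covered later in the workshop:

```typescript
import type { VectorStore } from "@langchain/core/vectorstores";

async function buildRagPrompt(vectorStore: VectorStore, question: string): Promise<string> {
  // 1. Retrieve only the chunks semantically related to the question.
  const relevantDocs = await vectorStore.similaritySearch(question, 3);
  const context = relevantDocs.map((doc) => doc.pageContent).join("\n---\n");

  // 2. Put ONLY the relevant data, plus the question, into the prompt.
  //    This stays short and cheap no matter how big the full dataset is.
  return `Answer based on this context:\n${context}\n\nQuestion: ${question}`;
}
```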
4. Building a Speaker Recommendation Application
We will create a speaker recommendation application using retrieval augmented generation. By scraping data from the C3 festival website, we will retrieve the top three speakers based on your interests. The application will consist of a backend service communicating with an OpenAI model and a UI for inputting interests and retrieving recommendations. The stack will be Node.js, React, and TypeScript.
So this is what we are going to see today in the workshop: how to create such a retrieval augmented generation application. Let's see exactly what we are going to build, because I wanted to give you a use case that is relevant to this workshop and to our everyday lives; we want to build applications that are relevant to us, not just toy examples.
So we are going to create a speaker recommendation application. What I did is I went to the C3 festival website and scraped all the speakers and talks that are happening these days. Based on this information and on your interests, we will retrieve the top three speakers that you must see, because you are interested in or passionate about the talks they are giving. We are going to do this full stack: we will have a backend service that talks back and forth with an OpenAI model, and we will also have a UI, a client frontend interface, where we can put in our interests and our name, hit send, and so on.
The technical stack is going to be Node.js and React. I'm focusing on TypeScript, because it's my go-to language when I'm scripting such things, and I'm going to use OpenAI just for convenience. You can swap in any other model that you want to host, or try one from Hugging Face or any other service that provides models. I will make a pause here, because I see that I have two questions.
5. Addressing Concerns and Discussing Retrievers
To address concerns about running data through OpenAI, companies can use other models or host their own. We will discuss the architecture of retrieval augmented generation and how it retrieves context for user questions. A question about different retriever versions is also addressed.
So the first question is: what about companies that don't want their data to run through OpenAI? For this, you can use any other model. As I said, you can get another third party that is already hosting AI models and LLMs, or you can host one yourself. There are a lot of tutorials right now on how to host, either locally or on cloud providers such as AWS, Google, and so on, where you can just get the GPUs and host the model itself. So you don't have to use a third-party company to ask questions.
And the second question is: there are two types of retriever, one and two; which one is better, and which one will we discuss today? I'm not really sure what you mean by versions one and two. Maybe if you give me a few examples in the chat, I can tell you exactly what we are going to talk about today. But as I said, we are going to discuss this kind of architecture, where we have an embedding storage (we will see in a moment what that is), we retrieve the context that is valuable for our question, and we concatenate the relevant data and the user question and give it to the model. So I don't really know if this is version one or version two of a RAG system.
6. Building the Speaker Recommendation Application
We will build a front end where we can input our name and interests. The architecture passes the input from a React user interface through a storage layer holding C3 Festival information before creating the prompt for OpenAI.
Okay, great. Thank you for the question. It is great to see that you are also participating in the workshop.
So as a sneak peek, this is what we are going to actually build. This is the final UI, the final front end here. We can input our name and our interests, and when we click send, we will get back the speaker recommendation and the talks we should see at the C3 Festival.
Let's see a bit of the architecture before diving into the code itself. We will input interests through a React user interface. Then we will pass it through the storage that holds all the information about C3 Festival speakers and talks. Finally, we will create the prompt and give it to OpenAI.
7. Demo and Deployment with Genesio
We will pass the input through storage, create a prompt for OpenAI, and then proceed to the demo. You can code along by accessing the repository and setting up the environment in Gitpod. We will utilize the serverless platform Genesio for deployment, ensuring a fast and bug-free coding experience. By logging in with our Genesio account, we can open a local server for development and testing.
Now let's actually see the demo, the workshop part of this talk. I hope you can code along with me; please let me know by raising your hand if you are at a laptop and want to code alongside me. If so, just go to this repository; I set it up myself ahead of time. I'm going to copy-paste the repository link so you can take the code.
What I already did here is open a Gitpod, which is essentially VS Code as an in-browser editor. We can just hit Continue here and go through GitHub, and we will have the whole environment already set up for our workshop. So you can just set that up; in a moment it installs everything that we require here. We will have the backend and the frontend side, and we will see in a moment how we are going to code this thing.
What we are going to use to deploy this is a serverless platform called Genesio. Genesio allows us to easily deploy and test our application without worrying about maintaining servers or involving DevOps for deployment and scalability. With Genesio, we can code quickly and bug-free. To use Genesio, we'll need to log in with our Genesio account. Gitpod will ask for permission to open new browser tabs, and once logged in, we can open a local server to start development and testing. This workflow is convenient as Genesio Local automatically refreshes the server each time we make a change, allowing us to test the application in real-time.
8. Using Genesio for Development and Testing
We will log in with Genesio to code and test our application in real time. The data scraped from the C3 Fest website is too large to be used as a prompt for ChatGPT. Instead, we will store the information in a vector database. I will walk you through the code and explain the process. First, I need to create an OpenAI API key; this key will be used by Genesio Local.
And this is very convenient, because you will see that we can code really fast and bug-free with Genesio. To do this, we are going to log in with Genesio. Gitpod will ask us if it can open new browser tabs, and we can see that right now we are logged in with a Genesio account; I'm logged in with my company email. Now what I can do is open a local server in order to start developing, and while I'm developing, I'm also going to test the application. This is a very convenient workflow, because each time I make a change, Genesio Local refreshes the state and the server, and I can test it in real time.
So, great. Now we have this local server started up. I will just minimize this, and now we can start looking through the code and understanding the template and how we are going to build this application.

First of all, I want to show you the data that I scraped from the C3 Fest website. Here I have a long file that contains the titles of the workshops and talks, their descriptions, the speakers, and the speaker descriptions. As you can see, this is a pretty huge file, so it would be practically impossible to give this whole thing as context in the prompt and just tell ChatGPT to figure it out and give me a speaker and talk recommendation. The prompt would be too long, and ChatGPT would just throw an error. So what I want to do instead is store all this information in a vector database.

Let's see how I did this; I'm going to walk you through the code. I already wrote the code, because it's much simpler to see the code first and then see it in action. First of all, I need an OpenAI API key, so I'm going to create it right now: a .env file here with an OPENAI_API_KEY entry. Let me copy-paste the name so I don't mess it up, and I'm going to generate a new API key. I'm going to briefly reveal it, but to be honest, this is not really a problem, because I have some hard limits on it, so I'm not really afraid of a denial-of-wallet situation, and after this workshop I'm just going to delete it to make sure everything is okay. So now we have the OPENAI_API_KEY set, and Genesio Local has already picked up the change.
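For anyone coding along, the .env file is a single line, and the key is read from the process environment; the value below is a placeholder:

```typescript
// .env (placeholder value, never commit a real key):
//   OPENAI_API_KEY=sk-...
// Genesio Local (or the dotenv package) loads it into process.env:
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
```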
9. Splitting and Storing Data in the Vector Database
I reload the changes and load the OpenAI API key. After opening the file, I split it into multiple documents for the vector database, which works with chunks of embeddings. These embeddings are vectors of numbers that the models understand. Retrieving chunks via embeddings ensures we get the right amount of information, such as sentences or paragraphs.
So: change detected, reloading. It already saw that I created the new .env file, and now the OpenAI API key is loaded into the process itself. What I'm doing here is opening the data/talks.txt file, because I want to load it and save it into the vector database, into that special storage I was talking about.
First of all, I have only one document here, but the document itself is pretty long. So I want to split it into multiple documents, because the way a vector database works is the following: the vector database holds chunks of embeddings, and these embeddings are essentially vectors of numbers.
And this is the way we communicate with any LLM, actually, not just with OpenAI. Models understand math; they understand numbers. So we have to have a function that transforms natural language into numbers, and this function, if you want, is the embeddings function. And each time we retrieve chunks by embeddings from a vector database, we want them to contain just the right amount of information. We don't really want them to be the whole document; we want maybe just sentences or paragraphs.
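As a small illustration of that embeddings function, this sketch uses LangChain's OpenAI wrapper (assuming an ESM setup with top-level await):

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";

// The "function from natural language to numbers" described above.
const embeddings = new OpenAIEmbeddings();

// Similar sentences map to nearby vectors, which is exactly what
// makes similarity search over a vector database possible.
const vector = await embeddings.embedQuery("I can take 21 vacation days");
console.log(vector.length); // e.g. 1536 dimensions, depending on the model
```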
10. Splitting and Storing Information
I'm splitting the information into chunks of maximum 1024 characters, ideally after paragraphs, new lines, or sentences. Splitting sentences or words separately doesn't make sense. I initialize the embeddings, which transform language into numbers, and create the vector database as a local file, eliminating the need for network queries. This makes the retrieval process much faster.
So what I'm doing here is splitting all my information into paragraphs and lines of text, and if possible after a sentence (here I actually have to put a dot rather than a comma) or after words. What I want are chunks that are at most 1024 characters long, and if possible I want to split after paragraphs, new lines, or sentences.
Because it's very different if I take the sentence "I can take 21 vacation days" and chunk it in two: "I can take 21" and then "vacation days". These are two separate chunks, but they don't make any sense by themselves. I want the whole sentence in a single vector. So I'm going to split this, and then I just log the split documents' length, so I understand how many chunks I have in the vector database.
Then, as I told you, I initialize the embeddings, which is the function that transforms language into numbers. And then I create the database as a local file. A vector database is a bit more special than a simple database, in that I can embed it into the server code, or rather the server files, as a local file. And this is great, because I don't have to do a query over the network to get things; a query is going to be just like reading from a local file, so this is going to be much faster.
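A sketch of that splitting step, using LangChain's recursive character splitter (on newer LangChain versions the import path is `@langchain/textsplitters`):

```typescript
import fs from "node:fs";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";

const rawText = fs.readFileSync("data/talks.txt", "utf-8");

// At most 1024 characters per chunk, preferring to break at paragraphs,
// then new lines, then sentence ends, then spaces, in that order.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1024,
  chunkOverlap: 0,
  separators: ["\n\n", "\n", ".", " "],
});

const splitDocuments = await splitter.createDocuments([rawText]);
console.log(splitDocuments.length); // how many chunks will be embedded

// The embeddings function: the mapping from language to numbers.
const embeddings = new OpenAIEmbeddings();
```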
11. Connecting, Storing, and Retrieving
I connect to the local file in the vector store directory and create a table, which produces binary files containing the embeddings. The schema consists of the embedding vector, the corresponding text, the source, and the lines the context was embedded from. This is especially useful when parsing product documentation into the vector database: by outputting the reference context, we can verify whether the model hallucinated, or refer back to the documentation. Then I save all the information using the documents, the embeddings, and the table.
So I'm connecting; this "connect" basically just means opening the local file. The file is going to live here locally, in the vector store directory. And I'm going to create a table, which essentially creates some binary files that will contain the embeddings themselves. And on this line there is another trick that is very useful for RAG systems.
The tip here concerns the schema used when we chunk. This is very similar to a Postgres schema or a MongoDB schema or any other database schema. Essentially, the vector itself contains the embeddings, which are just numbers. The text field contains the text that was embedded; for this simple example, the text would be a string. If I am embedding "hello world", the corresponding text or content is going to be "hello world". The source is where the text was taken from; in our case it will be data/talks.txt. And here are the lines that this context was embedded from.
Now, the trick is that this is especially useful when you are parsing a product's documentation into a vector database, because then the source would be a URL, the URL of the page we are looking at, and the lines would be the lines where the information was found. So when we code this kind of application, we can also tell the model to output the reference context it learned from. Basically, if you ask the model "How can I set a certain field in the YAML configuration file?", it will output the actual code snippet or example or steps, and then it can tell you: I learned this from this website, from exactly these lines of the documentation. So we can close the feedback loop and go check whether the model hallucinated, or look more in depth into the documentation itself. This was just a bit of a trick; we are going to see more about this in a moment.
Then I just save all the information. Essentially, the documents are the source I'm saving from, the embeddings are the function between language and numbers, and the table is the destination, the actual table in the Vector Store.
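Continuing the sketch above, storing the chunks in a local LanceDB table might look like this; the sample row exists only so LanceDB can infer the schema, and the field names are illustrative (newer releases publish the client as `@lancedb/lancedb`):

```typescript
import { connect } from "vectordb"; // LanceDB's Node.js client
import { LanceDB } from "@langchain/community/vectorstores/lancedb";

// "Connecting" just opens (or creates) a directory of local files.
const db = await connect("./vectorStore");

// The schema: the embedding vector, the embedded text, and where it came from.
const table = await db.createTable("vectors", [
  {
    vector: Array(1536).fill(0), // placeholder embedding
    text: "sample",
    source: "data/talks.txt",
    loc: "lines 1-1",
  },
]);

// Save everything: documents are the source, embeddings is the
// language-to-numbers function, the table is the destination.
await LanceDB.fromDocuments(splitDocuments, embeddings, { table });
```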
12. Running the Script and Modifying the Vector Store
The table in the Vector Store is the destination for saving the embeddings. The script runs before the application, installing tsx and splitting the input file into chunks. The Vector Store gets modified and now contains the new data from the .txt file.
So we can actually see this in action. This is basically a script that we are just running before our application.
I'm pretty sure that I have to run an npm install here. I'm going to simply run this script like any other script. Yes, I want to install tsx. And what we can see are the logs that I pasted into the code itself: my original document has length one, because I have only one file, talks.txt. Then I split it into chunks, and I end up with almost 140 chunks that are going to be saved as embeddings into my Vector Store.
You can see here that the Vector Store got modified, because it's highlighted yellow now; that means I overwrote it, and it contains the new data from the .txt file. So now I can go to the backend.
13. Free Alternatives to OpenAI and Adding Credit
Besides OpenAI, the Llama models from Meta are a free alternative that can be hosted and even run on-device. On a machine with 8 gigabytes of RAM, try Llama 2, an older model that fits in that footprint, to get a sense of it. Also make sure to add credit to your OpenAI account to avoid exceeding your quota.
But before that: "What other free alternatives do you recommend besides OpenAI? I find it hard to find a suitable one for an 8 GB RAM machine." So, OpenAI is not really free right now; it's pay-as-you-go, so don't get tricked by that. But from my experience, the Llama models from Meta are free, and you can host them; you can even run them on-device. If you only have 8 gigabytes of RAM, this is going to be a bit tricky for the latest version, but I think Llama 2, which is a bit of an older model, can fit into 8 gigabytes of RAM; I think the whole model is something like 2 or 3 gigabytes. So you can try that, to get a sense of it at least. "You exceeded your current quota": yes, Alexei, you actually have to add some credit to your OpenAI account. Great, thank you so much for your questions.
14. Setting up the Backend and Testing Interface
We have the vector store, vector database, and a local server set up. We will uncomment the code and print something to check if everything is working correctly. We can use the testing interface to send requests to the local server.
Now, let's get back to the code itself. What we talked about so far is the vector store: we have the vector database and vector store in place. Now we want a back end that we can use to chat with OpenAI on behalf of our users. What I'm going to do here is uncomment code as we come to understand it. Don't forget that I have the local server here, so we can just test it. To start out, I'm going to print something to see that everything is working correctly. I'm saving it, and I can already see that Genesio picked up the modification. What I can do now is open this testing interface, which is essentially like Postman, where I can send requests to my local server. My description is "full-stack engineer passionate about generative AI", and I'm going to hit send. At the moment, my application does nothing.
15. Connecting to the Database and Retrieving Context
We connect to the database and create a vector store object. The vector store allows us to retrieve relevant context using a similarity search based on keywords. We can log the context and slice to see the extracted data. By increasing the number of chunks, we get more data related to generative AI and filter out irrelevant information from the proprietary data.
So let's go back here. The first thing that I want to do is connect to and open the database; remember that this is just reading a local file. And I want to open the table "vectors".
Then I'm going to create the vector store object. This is going to be important, because this vector store is going to help me retrieve the important context. Let's see exactly what I mean by this.
This vector store is capable of doing a similarity search. That means I can just paste some keywords here; for example, "generative AI". Then I tell it the number of chunks of relevant data that I want extracted from the database. Just for the purpose of testing it out, I'm going to put one here, so this will give me the first and most relevant sentence related to these "generative AI" keywords.
What I'm going to do here is log the context and log the slice; this will log the first 100 characters of the slice. Now that I've done this, let's send another request in order to see the logs in the console. So: starting; I'm retrieving something from data/talks.txt, which is the source; the lines are around line 200; and the context itself is "Goran ... is the CTO and co-founder of Synthetic AI, a startup company". So basically it picked up the AI concept and matched the first relevant sentence for it.
Now, if I increase this number, I'm going to get more data related to generative AI. Let's try with five. Great. Now we can take a look, and we see that we have more sentences about generative AI, but no sentences about other talks or speakers that are not related to generative AI. So you can already see how we are filtering out the information from the proprietary data that we don't actually need.
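That debugging step looks roughly like the following sketch; `vectorStore` is the store opened from the local file, and the metadata field names depend on how the documents were loaded:

```typescript
// Ask the vector store for the k chunks most similar to some keywords.
const context = await vectorStore.similaritySearch("generative AI", 5);

for (const doc of context) {
  // Each hit carries the matched text plus the metadata stored with it.
  console.log(doc.metadata.source, doc.metadata.loc);
  console.log(doc.pageContent.slice(0, 100)); // first 100 characters
}
```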
16. Creating a Prompt and Defining the Model
We want a retriever to help us fetch data from the vector store. We create a prompt asking OpenAI for the top three speakers based on our description and interests. We can instruct the model to respond as a JSON object, allowing us to embed OpenAI answers in programmatic use cases. We define the model as GPT-4o and lower the temperature to keep responses accurate.
So this was only for debugging purposes; you don't have to do this in the application itself. I just wanted you to get a good grasp of what a vector database can do.
Now we want some kind of retriever. This is the object that will help us retrieve things from the vector store. This K is the number of context documents, basically the number of chunks that I want. The search type is similarity, and I'm setting verbose to true, in order to see everything and be able to debug and understand the concepts.
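A sketch of that retriever configuration (option names as in LangChain JS; exact fields may vary by version):

```typescript
// Wrap the vector store in a retriever that fetches k chunks per question.
const retriever = vectorStore.asRetriever({
  k: 5,                     // number of context chunks to pull
  searchType: "similarity", // plain similarity search, as used above
  verbose: true,            // log retrievals while debugging
});
```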
What we can do next is create the prompt and the whole chain around it. Let me just uncomment everything, and let's start with the prompt, because I guess it makes the most sense. The prompt here is what I'm actually going to give to OpenAI in the first place. I am instructing the model that its task is to give me the top three speakers that I should see at the conference, based on my description and my interests. There's another interesting piece of information here.
I can instruct the model itself to respond as a JSON object, and I give it a structure: a list of speakers, with the fields speaker and the reason why it chose that speaker for me. This is great, because now we can embed OpenAI answers in programmatic use cases. I am sure it's going to give me a JSON object, so I can parse it and embed it into my UI or into other microservices and so on. So this is going to give me the JSON object.
And here is another very important piece of information: I give it the context. When I construct the prompt, this context placeholder is going to be replaced with the proprietary data that I got from the vector database. Then I add the human message, which is basically just my description; this question mark is going to be replaced by my interests. Next, I define the model itself, which is GPT-4o, because it's the latest and the most accurate and fast. And I'm going to tweak the temperature, because I want the model to be as accurate as possible and not hallucinate or give me surprising answers. If the temperature is very low and I ask ChatGPT the same question over and over again, I will get more or less the same answer.
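Put together, the prompt and model definition could look like the following sketch; the prompt wording and JSON shape are illustrative, not the workshop's exact text:

```typescript
import { PromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";

// {context} is replaced with the retrieved chunks, {question} with the
// user's description and interests. Double braces are literal braces.
const prompt = PromptTemplate.fromTemplate(`
You recommend the top 3 conference speakers for an attendee.
Respond ONLY as a JSON object shaped like:
{{"speakers": [{{"speaker": "...", "reason": "..."}}]}}

Context:
{context}

Attendee description and interests: {question}
`);

// Low temperature = factual, repeatable answers; high = more creative ones.
const model = new ChatOpenAI({ modelName: "gpt-4o", temperature: 0.1 });
```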
17. Creating the Chain and Sanitizing the Response
I want factual questions and responses, so I will use a low temperature. The output will be a simple string. I will create a chain that retrieves context, inputs it into the prompt, gives it to the model, and returns the output as a string. I sanitize the response by removing triple backticks and parse it to ensure it matches the expected type. I define the types for the recommendation list, with speaker and reason, and log and return it for testing.
It won't get creative at all, and this is what I want for factual questions and responses. If I have facts, I want a low temperature; if I want creativity, for writing and things like that, I would want a high temperature.
This part is very straightforward: I also want an output, and the output is going to be just a simple string. Then I can do the meat and bones of this application, which is creating the chain that does the following: it retrieves context and inputs it into the prompt as context, gives it to the model, and then gives the output back to me as a string.
Essentially, what I'm doing here is a bit of sanitization and cleaning. I'm just going to replace these triple backticks with nothing, because I really don't want them in my output; I want to be able to parse the response, have an actual type, and be sure that this is the type I'm waiting for. If we look at this recommendation list, we can see up top that I have defined my types: a recommendation, which has speaker and reason, as I requested, and it will be a list of such objects. Now I can log it, and I will also return it, so we see something relevant when we test it.
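A sketch of the chain plus the sanitize-and-parse step, following the standard LangChain JS RAG pattern (the `speakers` field matches the JSON shape assumed in the prompt sketch above):

```typescript
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { formatDocumentsAsString } from "langchain/util/document";

interface Recommendation {
  speaker: string;
  reason: string;
}

// Retrieve context, fill the prompt, call the model, get back a string.
const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: (input: string) => input,
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const raw = await chain.invoke("full-stack engineer passionate about generative AI");

// Sanitize: models sometimes wrap JSON in triple-backtick fences; strip
// them, then parse into the typed recommendation list.
const cleaned = raw.replace(/`{3}(json)?/g, "").trim();
const recommendations: Recommendation[] = JSON.parse(cleaned).speakers;
console.log(recommendations);
```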
18. Response Sanitization and Speaker Recommendation
I replace the triple backticks in the response to ensure the correct type. The recommendation list contains speaker and reason; I log and return it for testing. During live workshops, small errors can occur. After reopening, ChatGPT responds with the answer. The JSON contains real speakers; ChatGPT provided a relevant talk on generative AI, and asking about open source returns different speakers.
So this is what happens with live workshops and demos: I tested this before the demo three or four times and everything went smoothly on the first try, and now that it's live, it's giving me a small error. Let me just reopen this. Now it's obviously doing something; we wait a bit, because ChatGPT takes some time to respond, since it has to parse the data. And now we can see the answer itself.
As promised, this is a JSON list of speaker and reason, and we can check whether ChatGPT hallucinated here and these people never existed. Going back to the C3 festival page, let's look for this name. We can see it's an actual person, and the talk is about bringing your own Star Wars droid. Let me just check if it mentions something about AI. Yes: the droid can perform AI, learn about Azure AI. So ChatGPT actually gave me a relevant talk that I should attend, because I am interested in generative AI. We can also put in a question about open source and hit send again; I expect it to give us new speakers who talk about open source. As you can see, we are already being given different speakers. Let's take this name and check whether the person actually exists. Yes. And if we look at the description itself, we will probably see something about open source, or maybe just "source".
19. Creating an End-to-End Application with Genesio
We are given new speakers on open source topics. We need to check their existence and descriptions. Our job is not done yet. We need to create an end-to-end application. Genesio generates an SDK to import the back-end service as a dependency in the front-end. The IDE autocompletes and provides type-safe communication.
Nothing here. Well, I guess it probably picked up on the fact that he works with GitHub, and that somehow means he is dealing with open source. Okay, but our job is not done yet. What we have to do, as I promised at the beginning of the workshop, is build an end-to-end application, and this is just the back-end side; we have the testing interface here, but we are sending requests without a UI. So let's take a look at the UI itself. There's something pretty great about Genesio here, but unfortunately I first have to install the dependencies again: npm install here, in order to install everything. Just a moment.
If you have any other questions or curiosities, I am happy to answer them; just paste them in the chat. Okay, I know what is happening; I'm going to start the local environment once again. I think this is because somehow it cannot find React. We will see in a moment why this is happening, and whether we can deploy without fixing it. Still, what I wanted to show you is that Genesio generates an SDK from your back-end service. So I can import the back-end service as if it were a dependency of this front-end. Essentially, I can simply call methods here as if they were part of my own code, as if they were local. But in reality they are not local; they are deployed somewhere, and I can still call them very easily. And if you take a look here, you can see that the IDE autocompletes everything for us. If I hover, we see that ask is expecting one argument, and this argument is of type UserDescription. So Genesio is helping us have end-to-end type-safe communication.
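To make that concrete, calling the generated SDK from the front-end might look roughly like this; the package name and type are hypothetical, since Genesio generates them from your actual back-end classes:

```typescript
// Hypothetical import: Genesio publishes the generated SDK to an npm
// registry, so the exact package name depends on your project.
// (In practice the UserDescription type would also come from the SDK.)
import { BackendService } from "@genesio-sdk/speaker-recommendation";

interface UserDescription {
  name: string;
  interests: string;
}

async function getRecommendations(description: UserDescription) {
  // Looks like a local call, but actually hits the deployed back-end.
  // The IDE autocompletes `ask` and rejects anything that is not a
  // UserDescription, giving end-to-end type safety.
  return BackendService.ask(description);
}
```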
20. Deploying the Application and Summary
Using the UserDescription type saves me from a lot of type-safety bugs. Deploying the application involves installing dependencies, uploading the back-end service, creating and publishing the SDK, and building the front-end. We can then import the SDK in any front-end and access the back-end service. The finished application is a personalized speaker recommender: it retrieves relevant information from a vector database, appends it to a prompt, and uses an OpenAI model to return speakers and a summary of their talks.
So basically, I'm not allowed to pass anything here other than a UserDescription-typed object. If I put something else, we can already see that it gets underlined in red: not assignable to a parameter of type UserDescription. With this feature in place, I save myself from a lot of type-safety bugs, which is going to spare me some headaches.
This is how simply you can call the back-end service that we just wrote. Now, even though this application shows so many red underlines (and I don't really get why), let's see if we can actually deploy it. We are going to hit Genesio deploy. This installs the dependencies for both the back-end and the front-end, so we'll wait here a bit while it bundles our code. Bear with me... great: it uploaded the back-end service. So it bundled the code, uploaded it to the cloud provider, and created and published the SDK to an npm registry, so right now we could import it in any front-end at all. Now it's building the front-end, and we are given two URLs here.
If we navigate to the first URL, we see a dashboard where we can inspect the components of our project: the back-end service that was deployed a few seconds ago, the OpenAI API key, which is set and kept secret, and the domains. These domains are basically the end-to-end application. So this is the React application that I coded, and we can tell it: I am a full-stack engineer interested in back-end and cloud-native stuff. I want to try a lot of prompts to see what I get back. I'm going to hit send, and what I expect now is the three speakers.
So again, for Bassem, who joined a bit later: here is the end-to-end application. I didn't go too deep into the React code itself, because it's basically just divs, buttons, and an input form; I wanted to show you the end result. So, to summarize, what we have here is a personalized speaker recommender that retrieves the information relevant to our interests from a vector database, appends it to a prompt alongside our question of "give me the top three speakers and talks", and hands it to an OpenAI model; the model gives us back the speakers and a summary of why we should attend their talks. All in all, this is the application.
21. Wrapping Up and Invitation to Stay in Touch
You can use the application as a template and reach out for any issues or questions about generative AI and serverless. Please provide feedback to improve the workshop. Thank you for attending.
It was pretty simple to build, and it's open source; you can use this as a template. If you find issues with it, you can create a GitHub issue or just contact me.
Next, I would like to end this workshop with an invitation to stay in touch. You can contact me any time with any kind of question about generative AI or serverless, two topics I'm very passionate about. If you have two minutes, I would be very thankful if you gave feedback on this workshop, to make it better next time or to tell me what you liked or what you would have wanted explained more.
Lastly, I am also keen to hear any questions you have at the end of the workshop. If you don't have any other questions, I hope you have a very nice day. I really hope you will write your own LLM applications and wrappers; let's embed them into our work to make it easier and more productive. Thank you very much for coming.