That's why I think this picture fits here. You have an original item that was stored in your vector database, and when you run a similarity search in a query-style operation, it's going to find that match and know that that vector is the one we're interested in using. It'll probably find a couple of other similar ones too, if your context size is large.
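To make that concrete, here's a minimal sketch of what a similarity search does under the hood. This is a hypothetical, NumPy-only example rather than any particular vector database; the vectors and the top_k value are made up for illustration.

```python
import numpy as np

# Hypothetical store: each row is an embedding already saved in the vector database.
stored_vectors = np.array([
    [0.12, 0.88, 0.45],   # the "original" item we expect to match
    [0.10, 0.90, 0.40],   # a similar item
    [0.95, 0.05, 0.70],   # an unrelated item
])

def cosine_similarity(a, b):
    # 1.0 means the vectors point in the same direction; values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_search(query_vector, vectors, top_k=2):
    # Score every stored vector against the query and return the top_k closest matches.
    scores = [cosine_similarity(query_vector, v) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]

query = np.array([0.11, 0.87, 0.44])             # embedding of the incoming query
print(similarity_search(query, stored_vectors))  # the "original" row ranks first
```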
So, I had to include this one. You have to say it right: the circle of life. This constant flow of information between these different components is basically how active learning works, and even how retraining of older models works. We have our original model, or maybe we're creating it for the first time and just seeding it with original data, and then we use the new incoming data to continually augment that model. We look at the new data coming in, and there's a selection step where we ask: is this new information worth keeping, or worth using as future training data? That relevance check can be a manual step. So you get this cycle of model, to input, to stored data, to relevance check, and you can use that information to further improve your model. You can refine it over time and feed new information into the process just through the normal operation of these different pieces of the puzzle. And again, that's what's called active learning.
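Here's a rough sketch of that cycle in code. Everything in it is hypothetical: the ToyModel class, the confidence threshold, and the retrain_every batch size are stand-ins for whatever relevance check and training process you'd actually use.

```python
import random

class ToyModel:
    """Stand-in for a real model, only here so the loop below actually runs."""
    def predict(self, example):
        return "label", random.random()        # (prediction, confidence)
    def retrain(self, new_examples):
        print(f"retraining on {len(new_examples)} new examples")
        return self                            # a real system would return an updated model

def is_worth_keeping(example, model):
    # The selection step: decide whether new data is informative enough to keep.
    _, confidence = model.predict(example)
    return confidence < 0.6                    # low-confidence examples teach the model the most

def active_learning_cycle(model, incoming_stream, retrain_every=5):
    kept = []
    for example in incoming_stream:
        if is_worth_keeping(example, model):
            kept.append(example)               # store it as future training data
        if len(kept) >= retrain_every:
            model = model.retrain(kept)        # fold the new data back into the model
            kept = []
    return model

active_learning_cycle(ToyModel(), [f"doc {i}" for i in range(20)])
```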
Now, we're going to be building a RAG application here in just a little while, but maybe it's worth hitting on what RAG actually is. RAG stands for Retrieval Augmented Generation, and it looks a lot like the circle-of-life process we just talked about. Using a base model for our initial embedding generation works fine, until we have some new information the model doesn't know about. You may have tried ChatGPT and had it tell you, sorry, my cutoff date was, let's say, July 17th, 2023, so I don't have any information since then. Sometimes the information currently in your model is not enough, and you want to give it more context so it has information it didn't previously have. That's the real selling point of Retrieval Augmented Generation.

There are two big pieces. Text generation isn't unique to RAG; that happens with any model. But anything that does information retrieval is really where RAG comes into play, and sometimes that retrieved information is called context. When we think about where we could use something that has contextual information relevant to what we want, it often gets used for question or prompt-based answering; recommendation engines, which can be tailored to a specific store (I'm sure Amazon uses a lot of this for their setup); and document summarization, where the RAG fetches information from documents and uses them as context sources for the summary. Really, it's anything that needs access to local information that may not already be present in the original model.

I know we're getting into the weeds here a little bit, but I promised I'd show you what embeddings and the rest of the pipeline look like. Let's say we want to add some information to an existing model, and it doesn't know what people typically buy each other on Valentine's Day. So I feed it an input: traditional gifts on Valentine's Day include, and go on about flowers, stuffed animals, whatever else you like. We feed that input into our LLM. The one shown here is just an example; it doesn't have to be this particular model. Inside the LLM lives a tokenizer, and there's a step in the process called tokenization, where the input is broken down into tokens, which are this kind of nebulous concept in AI and ML. Essentially, it segments your input data into smaller pieces that can then be used to generate the embeddings. How the tokenizer works depends a bit on the model you use, but it's trying to find which pieces of information can be broken down into an embedding that can later be stored. In this case, I went to OpenAI, which has an online tokenizer, typed this in, and this is the token count for what it looks like.
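If you'd rather poke at tokenization locally than use the online tool, here's a small sketch with OpenAI's tiktoken library. I'm assuming the cl100k_base encoding here; different models ship different tokenizers, so your token counts will vary.

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Traditional gifts on Valentine's Day include flowers, chocolates, and stuffed animals."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
# Decode each token id individually to see how the text was split.
# Notice the pieces are not a fixed number of characters each.
print([enc.decode([t]) for t in token_ids])
```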
The general guidance they give is that roughly four characters make up a token, but you can see here it's pretty all over the place. Again, you don't have to worry about this too much; it all happens a little bit behind the scenes. But I thought it was worth mentioning, since we're already getting into the weeds.

The tokens that get generated are then passed into our embedding generator, and this is what actually converts the tokens into the numeric floats that make up the vector. Again, the vector represents the relative position of that particular point inside the dimensional space we have here, which was trained against all of the parameters and features that went into the model itself. These floats are typically between 0 and 1, although they don't have to be. It helps for some of the distance metrics if they are, so sometimes you may want to normalize them. A lot of models will actually normalize them for you; sometimes you may have to do it yourself if you care about a distance metric that needs values between 0 and 1. But again, you won't have to worry about that part too much; it mostly happens for you. These embeddings are stored in the vector database we just talked about, which effectively becomes a context store, where each row holds a float array of n dimensions. The whole process we're working through here is taking this information and storing it in the vector database so that we can do similarity search on it later. That's the process of adding information to our store.

Querying uses a fairly similar setup. We take the query, generate its embedding, and that embedding goes to the vector database and finds the top k relevant entries, depending on what we've set for that value, based on our dimensional space. We use those results as context for the LLM, which does the augmentation: it combines the incoming query with the context that came out of the store, and then uses that to do the actual generation and produce the response. There's a short end-to-end sketch of this add-and-query flow just below.

Questions, comments? I know there's a lot going on here, and it can be a little tough to follow, so hopefully these visuals are helping. A lot of these concepts may not be ones you're familiar with yet, so don't be shy if it's a little confusing, and we can always circle back later. Cool. We are going to take a break here in just a little bit, so don't worry.
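Pulling those pieces together, here's a minimal end-to-end sketch of the add-and-query flow: embed the documents, normalize, store them, pull the top k matches for a question, and build the augmented prompt that would go to the LLM. The all-MiniLM-L6-v2 model name is just an assumption for the example; any embedding model works the same way, and the final LLM call is left as a comment.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Assumed embedding model for this sketch; swap in whichever model you actually use.
model = SentenceTransformer("all-MiniLM-L6-v2")

# --- Adding information to the context store ---
documents = [
    "Traditional gifts on Valentine's Day include flowers and stuffed animals.",
    "Chocolate is another common Valentine's Day gift.",
    "Our store is open 9am to 5pm on weekdays.",
]
# One float vector per document, normalized so cosine similarity is just a dot product.
doc_vectors = model.encode(documents, normalize_embeddings=True)

# --- Querying: find the top k most relevant rows ---
def top_k(question, k=2):
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q                 # cosine similarity against every stored row
    best = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in best]

question = "What do people usually give each other on Valentine's Day?"
context = top_k(question)

# --- Augmentation: combine the retrieved context with the query for the LLM ---
prompt = "Answer using this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
print(prompt)  # in a real RAG app, this prompt is what gets sent to the LLM for generation
```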