Okay, perfect. I think we'll stick to this pace and continue like this. I'll wait for the others to split their documents into smaller chunks and see whether it works for them. Once you're done, please let me know in the comments.
Okay, we'll wait for everybody to be done. I'm okay with the pace. So what should it look like? Basically, when you load all your documents and print the chunks, you should see three chunks, and they should look something like this: a Document object with pageContent and some metadata. Each chunk should have some content inside its pageContent. Okay, while you guys try it out, I'll wait for two more minutes. I'll stop sharing. Oh, Alex says it's working for both PDF and CSV. Super, that's great.
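To make the shape of those chunks concrete, here is a minimal sketch, assuming a naive fixed-size character splitter with overlap rather than the LangChain splitter used in the workshop. The helper splitIntoChunks and its chunkSize, chunkOverlap, and source options are hypothetical; the sketch only mimics the shape of LangChain's Document objects (pageContent plus metadata).

```javascript
// Hypothetical helper: a naive fixed-size splitter with overlap.
// It produces objects shaped like LangChain's Document ({ pageContent, metadata }),
// but it is NOT LangChain's own text splitter.
function splitIntoChunks(text, { chunkSize = 100, chunkOverlap = 20, source = "unknown" } = {}) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap; // how far each new chunk moves forward
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      pageContent: text.slice(start, start + chunkSize),
      metadata: { source, chunkIndex: chunks.length, start },
    });
    // Stop once this chunk already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// 250 characters of sample text ("abc...zabc...").
const sample = Array.from({ length: 250 }, (_, i) => String.fromCharCode(97 + (i % 26))).join("");
const docs = splitIntoChunks(sample, { chunkSize: 100, chunkOverlap: 20, source: "sample.txt" });
console.log(docs.length); // → 3
```

With 250 characters, a chunk size of 100, and an overlap of 20, the splitter moves forward 80 characters at a time and produces three chunks, the last one shorter (90 characters). The overlap means neighboring chunks share a little text, so content cut at a boundary keeps some context in the next chunk.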
So now, if you have your own CSV files with financial or budgeting data, you can load them into your own LLM app and play around with them, ask whatever questions you want. Okay, we'll move forward now. It seems to be working for everybody, for PDF too.

Going back to the diagram here: after chunking, the next step is to create embeddings. Embeddings can feel like a complicated topic, so what does an embedding actually mean? Let me break it down in simpler terms. Basically, you want to represent your data in some way. When we usually deal with data structures, how do I represent, say, a person? I create a blueprint, a Person class, and give it characteristics like name, age, gender, height, and weight. If I know the structure up front, I save it in a relational database such as MySQL. If the data doesn't have a fixed structure, I usually use a NoSQL database like MongoDB. But in the AI world, the LLM world, the data is not structured, right? It's unstructured, natural-language data. Anybody can ask anything; any question can come in.
The PDF file is huge, so it's broken into smaller chunks. What you want is to represent every chunk that's created, and every question that comes in, with some identifier. That identifier is the embedding. In very simple terms, if you know anything about vectors or XY coordinates, an embedding is basically a set of coordinates, like XYZ. Picture a space with X, Y, and Z axes: every chunk of my PDF occupies a point in that space. So if chunk number two sits at (0, 1, 2), I can find it by going to that point. In practice, we don't look up an exact point; we look for the chunks whose points are closest to the point for our question. If you've seen geolocation in Google Maps, it uses latitude and longitude. Those aren't embeddings, but they are coordinates that pin down a particular place, and there are only two of them. Now imagine adding a third coordinate, a Z dimension. In AI terms, the number of dimensions is huge: it could be 2,000 or 3,000. So an embedding is just a long list of coordinates, X, Y, Z, and thousands more, that together identify a particular chunk or piece of data in the vector space. That's what we're going to do: for each of these chunks, we'll create those coordinates. Let me show you what that looks like.
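The coordinate analogy can be sketched with made-up three-dimensional vectors. This is a toy example that assumes cosine similarity as the measure of "closeness" (a common choice for text embeddings, not necessarily what every vector store uses); the chunk IDs, vectors, and the helpers cosineSimilarity and nearestChunk are all invented for illustration. A question whose vector lands near chunk-2's point at (0, 1, 2) gets chunk-2 back.

```javascript
// Toy 3-D "embeddings" for three chunks. These numbers are made up;
// real embedding models return hundreds or thousands of dimensions.
const chunkVectors = [
  { id: "chunk-0", vector: [0.9, 0.1, 0.0] },
  { id: "chunk-1", vector: [0.1, 0.8, 0.2] },
  { id: "chunk-2", vector: [0.0, 1.0, 2.0] },
];

// Cosine similarity: dot product divided by the product of the vector lengths.
// Returns a value in [-1, 1]; 1 means the vectors point the same way.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the chunk most similar to the query vector (or null if there are none).
function nearestChunk(queryVector, chunks) {
  let best = null;
  let bestScore = -Infinity;
  for (const chunk of chunks) {
    const score = cosineSimilarity(queryVector, chunk.vector);
    if (score > bestScore) {
      best = chunk;
      bestScore = score;
    }
  }
  return best === null ? null : { id: best.id, score: bestScore };
}

// A pretend "question" vector that lands close to chunk-2's point.
const match = nearestChunk([0.0, 0.9, 2.1], chunkVectors);
console.log(match.id); // → chunk-2
```

Cosine similarity compares direction rather than exact position, which is why the query does not need to hit (0, 1, 2) exactly to retrieve chunk-2.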
Going back to the documentation, we're going to use OpenAI's embeddings. Under text embedding models, it says we need to import OpenAIEmbeddings, so that's exactly what we'll do. Let's go back to our repo. I've already imported it. Now go to this folder, create embeddings 1.js. Everybody with me so far? Create embeddings 1.js. We import OpenAIEmbeddings; the env config stays the same. Eventually we'll create an embedding for each document we created, but first I'll show you what an embedding looks like. So first I initialize it: embedding = new OpenAIEmbeddings(). Then I add an input, the same question again: "What is the capital of France?" Let me fix the capitalization. To see the result, I call await embedding.embedQuery(input), which shows what this entire sentence looks like in a vector space of x, y, z, y1... coordinates. Finally, I console.log the result.
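Since embedQuery resolves to a plain array of numbers, logging it directly prints a very long list. The sketch below follows the same flow but prints only the dimension count and the first few values. It assumes a hypothetical summarizeEmbedding helper and an offline fakeEmbeddings stand-in (not a real model, just fake numbers) so it runs without an API key. The commented lines show how you might pass LangChain's OpenAIEmbeddings from @langchain/openai instead, in recent LangChain.js versions; check the import path for the version you have installed.

```javascript
// Hypothetical helper: summarize an embedding instead of printing every number.
function summarizeEmbedding(vector, preview = 5) {
  if (!Array.isArray(vector) || !vector.every((x) => typeof x === "number")) {
    throw new TypeError("expected an array of numbers");
  }
  return {
    dimensions: vector.length, // how many coordinates the vector has
    firstValues: vector.slice(0, preview).map((x) => Number(x.toFixed(4))),
  };
}

// Offline stand-in with the same method name as a LangChain embeddings object.
// NOT a real model: it returns 8 fake numbers derived from the text length.
const fakeEmbeddings = {
  async embedQuery(text) {
    return Array.from({ length: 8 }, (_, i) => (text.length * (i + 1)) / 100);
  },
};

// Same steps as in the walkthrough: define the input, embed it, log the result.
async function main(embeddings) {
  const input = "What is the capital of France?";
  const result = await embeddings.embedQuery(input); // embedQuery is async
  console.log(summarizeEmbedding(result));
  return result;
}

// With the real library you might instead write (assumption, not run here):
//   import { OpenAIEmbeddings } from "@langchain/openai";
//   await main(new OpenAIEmbeddings());
main(fakeEmbeddings).catch((err) => console.error(err));
```

Passing the embeddings object into main keeps the flow identical whether you use the fake or a real model; only the number of dimensions changes.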