So I did already run npx create-next-app using the Next.js LangChain example. I installed MongoDB, the LangChain MongoDB integration, React Markdown for some styling, and dotenv, because we're going to use a Node script to run our ingest.
I also have an OpenAI API key and my MongoDB Atlas connection string in my environment variables. So let's go ahead and check out this app. This is the example straight, without any alteration. Well, I added dark mode so I wouldn't blind everybody, so you're welcome for that. Let's just test to make sure it works: let's ask, what is MongoDB? Hopefully the Wi-Fi works, and okay, there we go. OpenAI responds to us with a pretty good answer. So it's working out of the box. Great.
Let's check out the code. I've got this fake documents directory here, and I used ChatGPT to help me create some fake documentation for a fake JavaScript library called fancy widget.js. So we have README, usage, license, installation, contributing, changelog, API reference. We have all the documentation that you'd expect from a JavaScript library. What we're going to do is take these Markdown files and transform them into vectors, vector embeddings, and then save those in our vector database. We're going to use MongoDB for the vector database. And then, during vector search, we can use this to augment the LLM's capabilities so it can answer questions based on this information.
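To make the idea of vector search a bit more concrete, here is a minimal sketch of ranking stored chunks against a query vector by cosine similarity. The three-dimensional vectors and the function names cosineSimilarity and topK are made up for illustration; real embeddings have many more dimensions, and a vector database uses an index rather than comparing against every document like this does.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks whose embeddings are closest to the query vector.
function topK(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({
      text: chunk.text,
      score: cosineSimilarity(queryVector, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy "embeddings" (hypothetical values, not produced by any model).
const chunks = [
  { text: "Install with npm", embedding: [0.9, 0.1, 0.0] },
  { text: "MIT license", embedding: [0.0, 0.2, 0.9] },
  { text: "Usage: create a widget", embedding: [0.6, 0.7, 0.1] },
];

const results = topK([1, 0, 0], chunks, 2);
console.log(results.map((r) => r.text));
```

The chunks that come back are what gets handed to the LLM as extra context alongside the user's question.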
All right, so let's go ahead and get started doing that. In the root here, I'm going to create a new file. We're going to name it create embeddings.mjs, and then we are going to do some typing here. So we're going to import our file system promises from RecursiveCharacterTextSplitter, and then import OpenAIEmbeddings from LangChain OpenAI, and then our MongoClient from MongoDB, and then our MongoDBAtlasVectorSearch from LangChain, and then we'll set up our Mongo client, getting our environment variable there for our connection string.
Our database name is going to be documents, collection name embeddings. We'll set up our collection, and then we'll get our documents directory, those fake documents, and then get the files for those, and then console log the file names. Then we loop through those file names and get each document. After we read each document, we console log that we're vectorizing the document, and then our splitter is going to use our RecursiveCharacterTextSplitter from LangChain, and we'll chunk those into different pieces, and then output those and store those into MongoDB using MongoDBAtlasVectorSearch. We'll create those embeddings, we'll tell it which collection, which index name, which text key, and which embedding key to use, and then console log that we're done and close the connection to MongoDB.
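As a rough picture of what chunking means here, this is a simplified sketch that cuts text into fixed-size windows with some overlap between neighbors. It is not LangChain's algorithm: RecursiveCharacterTextSplitter tries to split on natural separators such as paragraphs and lines first. The function name chunkText and the sizes used are assumptions for illustration; the talk does not state a chunk size or overlap.

```javascript
// Split text into chunks of at most chunkSize characters, where each chunk
// starts with the last chunkOverlap characters of the previous chunk.
function chunkText(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Hypothetical input; a real run would pass the contents of a Markdown file.
const sample = "abcdefghijklmnopqrstuvwxyz";
const pieces = chunkText(sample, 10, 3);
console.log(pieces);
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval.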
And there is a bit of a typo here. Of course that didn't happen in practice. And of course I wasn't typing, because that was a VS Code extension. This is supposed to be import RecursiveCharacterTextSplitter. So let me grab that.
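With that import corrected, the ingest script described above might look roughly like the sketch below. Treat it as a reconstruction under assumptions, not the code from the screen. The database and collection names come from the talk. The environment variable name MONGODB_ATLAS_URI, the index name, text key, embedding key, chunk size and overlap, and the docsDir parameter are all guesses, and the langchain/text_splitter import path depends on the installed LangChain version. The packages are loaded with dynamic imports inside main so the file can be loaded even before the dependencies are installed.

```javascript
const DB_NAME = "documents"; // from the talk
const COLLECTION_NAME = "embeddings"; // from the talk

// Options for the vector store. The talk says an index name, text key, and
// embedding key are passed, but not their values; these are assumptions.
function vectorStoreConfig(collection) {
  return {
    collection,
    indexName: "default", // assumption
    textKey: "text", // assumption
    embeddingKey: "embedding", // assumption
  };
}

async function main(docsDir) {
  await import("dotenv/config"); // loads .env into process.env
  const { promises: fsp } = await import("node:fs");
  // Import path is an assumption; newer releases ship @langchain/textsplitters.
  const { RecursiveCharacterTextSplitter } = await import("langchain/text_splitter");
  const { OpenAIEmbeddings } = await import("@langchain/openai");
  const { MongoClient } = await import("mongodb");
  const { MongoDBAtlasVectorSearch } = await import("@langchain/mongodb");

  // MONGODB_ATLAS_URI is an assumed variable name for the connection string.
  const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
  const collection = client.db(DB_NAME).collection(COLLECTION_NAME);

  const fileNames = await fsp.readdir(docsDir);
  console.log(fileNames);

  for (const fileName of fileNames) {
    const document = await fsp.readFile(`${docsDir}/${fileName}`, "utf8");
    console.log(`Vectorizing ${fileName}`);

    // Chunk the Markdown; the sizes are assumptions.
    const splitter = RecursiveCharacterTextSplitter.fromLanguage("markdown", {
      chunkSize: 500,
      chunkOverlap: 50,
    });
    const output = await splitter.createDocuments([document]);

    // Embed each chunk with OpenAI and store it in the collection.
    await MongoDBAtlasVectorSearch.fromDocuments(
      output,
      new OpenAIEmbeddings(), // reads OPENAI_API_KEY from the environment
      vectorStoreConfig(collection)
    );
  }

  console.log("Done: closing connection");
  await client.close();
}

// To run the ingest, call main with the path to the docs directory, e.g.:
// await main("./path/to/fake-docs");
```

Running this for real needs the packages installed, both environment variables set, and a vector search index on the collection whose name matches indexName.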