We'll build an AI-powered documentation site that answers questions, provides contextually relevant information, and offers links for further exploration. To get started, we create embeddings for our custom data, written in markdown files, using the LangChain text splitter, the MongoDB Atlas vector store, OpenAI embeddings, the MongoDB client, and dotenv. After preparing the MongoDB connection and processing the documentation files, we store the embeddings in our MongoDB collection and set up a search index on the collection using a JSON configuration. Finally, we set up the Next.js app, utilizing the Vercel AI SDK, LangChain, and OpenAI chat models.
So let's take a look at how to build a React application with these technologies. And this is what we're going to build: an AI-powered documentation site. Now, this site will not only answer questions but also provide contextually relevant information, summarize answers, and provide links to relevant pages so you can dig deeper.
Now, the first thing that we need to do is create embeddings for our custom data. Since this chatbot is going to reference our custom documentation, we'll assume it's written in markdown files. This embedding function is just a Node app, a single file with less than 60 lines of code, separate from the Next.js app we're going to build. At the top we import the LangChain text splitter, the MongoDB Atlas vector store, OpenAI embeddings, the MongoDB client, and dotenv.
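As a rough sketch, the top of that Node script might look like this (import paths vary by LangChain version; newer releases move these into scoped packages such as @langchain/openai and @langchain/mongodb):

```ts
// embed.ts - standalone script that generates and stores the embeddings
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MongoDBAtlasVectorSearch } from "langchain/vectorstores/mongodb_atlas";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { MongoClient } from "mongodb";
import "dotenv/config"; // loads MONGODB_URI and OPENAI_API_KEY from .env
```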
Below that, we'll prepare our MongoDB connection using our connection string and get the collection that we want to use. Then we need to process our documentation files, splitting them into chunks that the embedding model can use. So we'll create a splitter using the RecursiveCharacterTextSplitter from LangChain to split the markdown files that we're fetching, and then create our output by awaiting the splitter's createDocuments function, passing it our documents. And lastly, we'll use LangChain to store these embeddings in our MongoDB collection: we pass MongoDBAtlasVectorSearch the output, a new OpenAIEmbeddings instance, and the metadata for the collection, index name, text key, and embedding key. These keys are the fields that contain the original text and the embedding vectors.
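Putting that together, here's a minimal sketch of the rest of the script. The database, collection, and index names are placeholders, and getDocs() stands in for however you load your markdown content:

```ts
async function run() {
  // Connect to Atlas and grab the collection that will hold the embeddings.
  const client = new MongoClient(process.env.MONGODB_URI as string);
  const collection = client.db("docs_db").collection("embeddings");

  // Fetch the raw markdown (placeholder: replace with your own loader).
  const docs: string[] = await getDocs();

  // Split the markdown into chunks the embedding model can handle.
  const splitter = RecursiveCharacterTextSplitter.fromLanguage("markdown", {
    chunkSize: 500,
    chunkOverlap: 50,
  });
  const output = await splitter.createDocuments(docs);

  // Embed each chunk and store it in MongoDB via LangChain.
  await MongoDBAtlasVectorSearch.fromDocuments(output, new OpenAIEmbeddings(), {
    collection,
    indexName: "default",      // name of the Atlas Search index
    textKey: "text",           // field holding the original text
    embeddingKey: "embedding", // field holding the embedding vector
  });

  await client.close();
}

run().catch(console.error);
```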
And this is an example of what ends up in our database: the original text in markdown format, the vector embedding, and metadata that identifies the portion of the document the text came from. One last thing we have to do to prepare the data for searching is set up a search index on our collection in MongoDB. You can do that through the Atlas UI using a JSON configuration. Here we specify the number of dimensions of the embedding model we're using, in this case 1536, and you can also define the similarity function and field type to use.
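For reference, a typical index definition for this setup looks something like the following, assuming the vector field is named embedding to match the embeddingKey above; adjust the similarity function as needed:

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
```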
Okay, so now the fun part: let's set up the Next.js app. If you'd like, you can use the Next.js LangChain starter, which already has everything set up except for the MongoDB driver, so you'll also need to npm install mongodb. Next, add your OpenAI API key to the environment variable file. Now, the default chat route provided with the Next.js LangChain template utilizes the Vercel AI SDK along with LangChain's OpenAI chat model and some LangChain schema helpers. Further down this route handler, we create an OpenAI chat instance, and here we can specify which model we'd like to use. We're also setting streaming to true because we want the response to start populating in the UI as fast as possible and stream to the user, and then the handler returns the streaming response. And this route is where we need to inject our own custom data.
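For orientation, here's a sketch of what that default route handler looks like in the template; exact imports and helpers differ between versions of the ai package and LangChain:

```ts
// app/api/chat/route.ts
import { StreamingTextResponse, LangChainStream, Message } from "ai";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { AIMessage, HumanMessage } from "langchain/schema";

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Bridge between LangChain callbacks and the Vercel AI SDK stream.
  const { stream, handlers } = LangChainStream();

  const llm = new ChatOpenAI({
    modelName: "gpt-3.5-turbo", // specify which model to use
    streaming: true,            // stream tokens to the UI as they arrive
  });

  // Kick off the chat call; tokens flow to the client through the handlers.
  llm
    .call(
      (messages as Message[]).map((m) =>
        m.role === "user" ? new HumanMessage(m.content) : new AIMessage(m.content)
      ),
      {},
      [handlers]
    )
    .catch(console.error);

  // Return the stream immediately so the UI can start rendering tokens.
  return new StreamingTextResponse(stream);
}
```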