So far we've been talking about the base model, which will kind of dream up internet documents. Now let's actually go into how we obtain an assistant. This is something you're probably more familiar with.
So as I said, so far we've only talked about these internet document generators, which are the result of the first stage of training, what we call pre-training. Now we're moving into the second stage of training, which we call fine tuning.
This is where we obtain what we call an assistant model, because we don't actually want just document generators; that's not super helpful for many tasks. We want to give questions to the model and have it generate answers based on those questions. So we really want an assistant model instead.
The way you obtain these assistant models is fundamentally through the following process. We keep the optimization identical, so the training will be the same; it's still just a next word prediction task. But we're going to swap out the dataset we're training on. It used to be that we were training on a huge amount of internet documents. We're going to swap this out for a training set that we collect manually, using a lot of people.
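To make the dataset swap concrete, here is a minimal sketch of the idea, assuming a Hugging Face-style causal language model (gpt2 here is just a stand-in for a pre-trained base model, and the conversation text is a placeholder). The point is that the loss is still next-token prediction; only the documents change.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in base model; any pre-trained causal LM would work for this sketch.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Instead of raw internet text, each training example is now a curated,
# human-written conversation (placeholder content).
conversations = [
    "User: Can you help me fix a bug in my code?\n"
    "Assistant: Of course! Please paste the code and the error you see.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in conversations:
    batch = tokenizer(text, return_tensors="pt")
    # Same objective as pre-training: predict the next token.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```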
So typically a company will hire people, give them labeling instructions, and have them come up with questions and then write answers to them. Here's an example that might make it into your training set: there's a user, and it says something like, can you write a short introduction about the relevance of this term in economics, and so on. And then the assistant turn, again filled in by a person, is what the ideal response to this question would be. What the ideal response is and what it should look like all comes from labeling documentation, labeling instructions basically, that are provided by the maintainers of the model, such as companies like OpenAI.
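The exact schema of these labeled examples isn't public and varies between companies; the following is a purely hypothetical record, with invented field names, just to illustrate the shape of the data a labeler produces.

```python
# Hypothetical shape of one human-labeled training example; field names
# are made up for illustration, not any company's actual schema.
example_record = {
    "messages": [
        {
            "role": "user",
            "content": "Can you write a short introduction about the "
                       "relevance of this term in economics?",
        },
        {
            "role": "assistant",
            # Written by a human labeler, following the labeling instructions.
            "content": "Certainly! ...",
        },
    ],
    "labeling_instructions_version": "v3",  # hypothetical metadata
}
```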
Now, the pre-training stage is about a large quantity of text of potentially low quality, because it just comes from the internet and there are tens or hundreds of terabytes of it; you can't ensure that it's all very high quality. In the second stage, though, we prefer quality over quantity. So we may have many fewer documents, for example only a hundred thousand, but all these documents are now conversations, and they should be very high quality conversations that people create based on labeling instructions.
So we swap out the dataset and now train on these Q&A documents, a process we call fine tuning the model. Once you fine tune the model, you obtain what we call an assistant model. This assistant model now subscribes to the form of its new training documents. For example, if you give it a question like, can you help me with this code, it seems like there's a bug in my print hello world, then even though this exact question was not in the fine tuning training set, the model understands that it should answer it in a helpful manner. And it will answer by sampling word by word, from left to right, top to bottom, all the words in response to this query. It's kind of remarkable, and also somewhat empirical and not fully understood, that these models are able to change their behavior and formatting to now be helpful assistants, because they've seen so many documents like that during the fine tuning stage, while still somehow being able to access the knowledge from the pre-training stage.
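At inference time, the same conversational format is applied: the user's question is wrapped in the template the model saw during fine tuning, and the answer is sampled token by token. A minimal sketch, again using gpt2 only as a stand-in (a real assistant would be the fine-tuned model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; a real assistant is the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the user's question in the conversation format seen during fine tuning.
prompt = (
    "User: Can you help me with this code? It seems like there's a bug.\n"
    "Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,    # sample word by word, left to right
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```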
So roughly speaking, the pre-training stage is about training on a ton of internet; it's about knowledge. The fine tuning stage is about what we call alignment: it's about changing the format from an internet document generator to a question-answering assistant model. Now, a question here is: does fine tuning prevent dreams and hallucinations? The answer is no. Fine tuning kind of directs these dreams into helpful assistant dreams instead of internet document dreams. So a word of caution here: you need to always be careful with what LLMs tell you, and always double check the results. And if you need up-to-date data, probably provide it as relevant context to the model.
So there's actually a next step in fine tuning that uses comparisons, or comparison labels. The reason we can do this is that in many cases it's much easier, as a human labeler, to compare candidate model answers than to write the answer yourself. Think of creative tasks like writing a haiku. Suppose the question is to write a haiku about Grafana or something like that. From the perspective of a labeler, if I'm asked to write a haiku, that's kind of a difficult task, at least for me; I might be able to write one, but it takes a lot of time. Suppose instead we ask the stage-two model to come up with a few candidate haikus. Then, as a human labeler, I can look at these haikus and pick the one that is the best. Doing a comparison like this is a lot easier than generating from scratch.
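To illustrate, here is a hypothetical comparison-labeling record; the field names are invented, but the idea is that the labeler only ranks candidates sampled from the stage-two model rather than writing an answer from scratch.

```python
# Hypothetical shape of one comparison label; schema invented for illustration.
comparison_record = {
    "prompt": "Write a haiku about Grafana.",
    "candidates": [              # sampled from the stage-two assistant model
        "<candidate haiku 1>",
        "<candidate haiku 2>",
        "<candidate haiku 3>",
    ],
    "preferred": 1,              # index of the answer the labeler judged best
}
```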