Hey Thomas, how is it going for you?
Hi. Good, good. How is it going for you?
Yeah, it's also pretty nice. I mean, the sun is already a bit out in Munich, but there's still quite a bunch of exciting talks. And you did kick off this conference pretty nicely, because I do believe that natural language processing, or language in general, is one of those indicators of how good we are at understanding machine learning.
How do you feel?
Yeah, definitely. I think what we've seen in NLP over the last two years is that it has really become what we expected it to become from the start, which means really the way to process knowledge and the way to kind of do reasoning, or what we hope would be something like reasoning. And so when we talk about AGI, I think right now a lot of people think about GPT-3, which is purely a text model. So I think it's really impressive how NLP is now the most exciting field to be in within AI.
Yeah, and it's kind of funny, because you almost predicted the first question, right? The first question is actually about GPT-3, and people are asking: hey, since you're an expert in NLP and your company is driving such good efforts in this regard, what do you think is the advance of GPT-3 in comparison to GPT-2?
Yeah, it's a good question. I think one of the problems with GPT-3 is that it's quite difficult to access, so we have not really evaluated its capabilities the way we've done for other models like BERT or GPT-2, just because a lot of academics did not have full access to be able to investigate what's happening, what you can do with it, and to test it fully on a lot of tasks. So it's kind of hard to give you a real answer, right? What I think is that GPT-3 behaves in some ways like something quite interesting, which is a smooth retrieval over a really large dataset. It's like having a huge Google Search where you can search every page on the internet and smoothly interpolate between all these pages. So I think this is very interesting, and we see that you can do some pretty cool applications with that: you can smoothly interpolate to generate code, or to generate a realistic-looking blog post. Now, when you talk about real reasoning, meaning and things like that, I don't think there are really any deep breakthroughs in GPT-3, but that's my personal opinion.
Yeah, it's really good, and at least for me it resonates that you separate reasoning from having a big database. Because sometimes it feels like our community, or parts of it, goes: hey, if the database is bigger, you can just solve all of the problems. But that's not really the case; having a bigger model doesn't mean that you suddenly have something like AGI, and it's good to remember that.
Yeah. So another question from the audience: there are so many different sizes of models, even transformer-based ones. There's GPT-3, there's BERT, RoBERTa, all kinds of things. There are also distilled versions, right? When a machine learning engineer starts to work on a task, what is a good rule of thumb for making this decision? And obviously there is no one clear answer, but do you have any mental model or framework, especially for beginners who may be working in a company that doesn't have a big machine learning group, but who are the person who actually has to make the call? How can we help and support such a person?
Yeah, that's a good question. I think that's definitely something a lot of people face. I mean, the very practical thing for me, for our team at Hugging Face, is that we should help people do that, because I understand we're providing a lot of models, we're providing a lot of checkpoints, but it's really hard to actually see which one you should select, which one you should use. So the first thing is that we will try to build some better tools for this. But now, for the quick answer, I think it's good to keep a good reflex, a good routine: start with something simple. Start with a smaller model, like DistilBERT instead of BERT, and see how far you can go with a compute-efficient model like a DistilBERT, a DistilGPT-2, a DistilRoBERTa; you test a little bit with them. And if that's not enough, then you scale up, you start to use bigger models, and you see if you need something like a T5.
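To make this "start small, then scale up" routine concrete, here is a minimal sketch (not part of the conversation) using the Hugging Face transformers pipeline API; the sentiment-analysis task and the checkpoint name are assumptions chosen purely for illustration.

```python
# Illustrative sketch of the "start small, then scale up" routine with the
# Hugging Face transformers library. The task (sentiment analysis) and the
# checkpoint name are assumptions chosen for this example.
from transformers import pipeline

# Step 1: start with a compute-efficient distilled model and see how far it gets you.
small = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(small("The talks at this conference were surprisingly good."))

# Step 2: only if the small model falls short on your own evaluation data,
# swap in a larger checkpoint for the same task and compare quality against cost.
# larger = pipeline("sentiment-analysis", model="<a-larger-checkpoint-for-your-task>")
```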