Video Summary and Transcription
GitHub Copilot is a software development productivity tool that suggests whole blocks of code based on the collective knowledge of software developers. It has been in technical preview for a year and is used by thousands of developers. Copilot's success has grown over time, and it now supports multiple editors and programming languages. The AI model used in Copilot, called Codex, operates on natural language and doesn't require special encoding. Copilot will become a paid product in the summer but will remain free for students and verified open source contributors.
1. Introduction to GitHub Copilot
GitHub Copilot is a software development productivity tool that suggests whole blocks of code based on the collective knowledge of software developers. It uses an AI model trained on billions of lines of publicly available code. The suggestions adapt to the coding style and context, providing a unique user experience. GitHub Copilot has been in technical preview for a year, with thousands of developers using it. 35% of newly written code in files with Copilot enabled is suggested by Copilot, making the development process more productive.
Thank you for listening. Hello, everyone. First of all, I'm really, really glad that you are all here. The last two years were difficult, of course, so thank you all for being here.
Yeah, my name is Krzysztof, and I have some fancy title at GitHub. It doesn't really matter. I've been working for the last year and a half on a project called GitHub Copilot. Hopefully, many of you have heard about GitHub Copilot. It has been a hyped topic on the internet and social media, but if you haven't, let me quickly introduce you to GitHub Copilot.
So what is it? Basically, it's a software development productivity tool. You can think about it as a bit more powerful autocomplete. However, unlike traditional autocomplete systems, it's not powered by semantic information or static code analysis, or anything like that. Instead, it's using the Codex AI model developed by OpenAI, which has been trained on the collective knowledge of us all, software developers, in the form of billions of lines of code publicly available on the internet. Thanks to that, GitHub Copilot is not limited to suggesting only single words, like variable names or function names, as in the case of traditional autocomplete systems. Instead, it can suggest whole blocks of code, multiline blocks of code, that try to adapt to the current context, try to figure out what your next step is, what you're planning to do. Those snippets, those suggestions adapt to your coding style, to other files and functions from your project, and much more. And because of that, we had to come up with a slightly different user experience for those suggestions. Instead of using the traditional autocomplete widget, which is a list of functions, we designed this user experience of inline, virtual text that's displayed directly in the editor, and I will talk a bit more about that in a moment.
However, if you haven't seen GitHub Copilot yet, I do have a quick video to demo it, and I'll be talking while this video is playing. As you can see, this grey text thingy, those are suggestions made by GitHub Copilot. They are updated as you type in your editor, so you don't need to take any additional action to get those suggestions. You just type code in your editor as you normally would, and we try to suggest something useful that's hopefully helpful. In this particular demo, there are only single-line completions, but we can also suggest multi-line completions in some cases. So, you probably want to ask me: yes, this looks cool in the demo, but is it actually useful, does it actually produce any value for users, does it improve your productivity? As it was mentioned in the introduction, Copilot has been in technical preview for almost a year. We released it in late June last year, so it's literally one year right now, and it has been used by thousands and thousands of software developers around the world. I cannot, of course, share any specific numbers on that; however, there is one number that I can share, and this is this number. For users that have Copilot enabled on a file, we see that 35 per cent of the newly written code in that file has been suggested by GitHub Copilot. I mean, of course, software development is not just typing code, but imagine if you type code for a couple of hours every day and that process is 35 per cent more productive; that means you maybe get two hours of your time back. That's really amazing. This number has been growing steadily for the last year. We didn't start with this number.
2. Technical Details and Success of Copilot
We started with a much lower number and it has been growing, and we estimate it can go much higher in the next couple of years. When I joined the team, we had a simple architecture: a Visual Studio Code extension communicating directly with the Codex AI hosted by OpenAI. Copilot has become a successful product with a lot of hype on the internet.
We started with a much lower number. It has been growing, given the improvements that we've made to the project over the last year. Also, we are sure that we can bring this number way higher; our estimations are high for the next couple of years.
So, that was marketing. And now let's go into technical details, because I'm a software developer. I'm not really trying to sell you a product; I'm here to talk about what we do. And also, as a note, I'm not a data scientist. I'm not an AI guy. I'm working on this project because I'm a developer tooling expert. I have a lot of experience with Visual Studio Code extensions and other developer tools. So, that's my role in the team. I'm not going to go too deep into AI or how it works. I just don't really understand that. And that's fine. I imagine that most of us are not AI experts or data scientists. We are just software engineers that try to build useful stuff for our customers.
So, when I joined the team and when Copilot started as a project, we were using a really simple architecture. We had a Visual Studio Code extension written in TypeScript, and then we were just communicating directly with the Codex AI hosted by OpenAI. You can do that in your projects right now; you need access to OpenAI's preview or beta program, whatever it's called, and then you just send an HTTP request to it, and it will respond. Actually, this is the architecture that we started our technical preview with around a year ago. There were some additional calls to github.com for authentication, but that's boring stuff. In principle, it was as simple as that.
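To make that original setup concrete, here is a minimal sketch of what such a direct call could look like from a TypeScript extension, assuming an OpenAI completions-style HTTP endpoint; the model name, parameters, and response handling are illustrative, not the exact values Copilot uses.

```typescript
// Minimal sketch of the early architecture: the editor extension sends the
// prompt straight to a Codex-style completions endpoint over HTTP.
// Model name and parameters are illustrative, not Copilot's actual values.
async function fetchCompletion(prompt: string, apiKey: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "code-cushman-001", // illustrative Codex-family model name
      prompt,                    // the file content up to the cursor
      max_tokens: 64,            // cap the length of the suggestion
      temperature: 0.2,          // low temperature keeps code suggestions focused
      stop: ["\n\n"],            // stop at a blank line, roughly the end of a block
    }),
  });
  const data = (await response.json()) as { choices?: { text: string }[] };
  return data.choices?.[0]?.text ?? "";
}
```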
And then Copilot became a fairly successful product. There has been a lot of hype on the internet about it, and the most common question people were asking was: hey, can I use Copilot in IntelliJ? Can I use Copilot in Neovim? Can I use Copilot in Visual Studio? Okay. Yeah. No one asked that.
3. Architecture and Backend Hosting
As we scaled the project, Copilot now supports multiple editors, including Visual Studio Code, IntelliJ suite, Neovim, and Visual Studio. To handle the different programming languages and ecosystems, we use the concept of language servers. Our agent, written in TypeScript, runs on Node and handles common logic. We also introduced a proxy, a web service in the cloud, to be as close to the AI as possible. Our AI is now hosted on Azure, allowing us to scale globally. The use of two places for common code, agent and proxy, serves specific purposes.
So as we scaled the project, we had to make changes to our architecture. Nowadays Copilot supports multiple editors. Visual Studio Code is still our main editor, and I believe the main part of our user base is still using Visual Studio Code. That's fairly natural, I would say, given how popular it is. But we also have really good support for all the IDEs in the IntelliJ suite. So PyCharm, Rider, all those different IDEs based on IntelliJ, they're all supported.
We have Neovim support, and yes, we also have Visual Studio support. All those editors use different programming languages and different programming ecosystems to build their extensions. VS Code is Node and JavaScript/TypeScript. IntelliJ, it's JVM languages, Java or Kotlin or some other fancy languages. In the case of Visual Studio, it's C#. Sorry that I need to mention that. We didn't want to rewrite our code to target all those platforms, because our code contains quite a lot of logic; we do some stuff. And we realized this is the same problem that language tooling vendors often have.
So for example, if you develop Rust language tooling, you want to have it running in all the editors. Those vendors solve the problem using the concept of language servers, which is basically a process that can be spawned by the editor and contains all the logic, so the editors become thin clients that just interact with the language server. In our case this is called the agent, because that's fancy. It contains quite a lot of common logic. It's written in TypeScript and it's running on Node. I believe for IntelliJ we actually compile it to a native distribution; we do something fancy with Vercel's pkg. So yeah, it contains all the code around creating the input that we send to the model, the telemetry, settings synchronization, and all that stuff. Then we've introduced the proxy, which is also a place for common code, but it's a web service in the cloud, as close to the AI as possible. And then we have the AI, on Azure this time. So we changed our backend hosting. We are hosting the AI on Azure, and I believe Microsoft recently announced an AI platform, in preview right now, which allows anyone to use the big language models from OpenAI that are hosted on Azure. And this is really important because it allows us to scale across the whole planet. OpenAI initially just hosts their models in a single location. Azure is distributed globally, so we can host the AI in any data center we want. You may ask why we have two places for common code, the agent and the proxy.
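As a rough illustration of that language-server-style split, here is a sketch of how a thin editor client could spawn the agent as a separate Node process and exchange newline-delimited JSON messages with it over stdio. The file name, method name, and message shape are stand-ins for illustration; this is not Copilot's actual agent protocol.

```typescript
import { spawn } from "node:child_process";

// The editor plugin stays a thin client: it spawns a shared "agent" process
// and forwards only what it knows (the document and cursor), letting the agent
// own prompt crafting, telemetry, and talking to the backend.
const agent = spawn("node", ["copilot-agent.js"]); // hypothetical agent entry point

let nextId = 1;
function request(method: string, params: unknown): void {
  const message = JSON.stringify({ jsonrpc: "2.0", id: nextId++, method, params });
  agent.stdin.write(message + "\n"); // newline-delimited JSON keeps framing trivial
}

agent.stdout.setEncoding("utf8");
agent.stdout.on("data", (chunk: string) => {
  for (const line of chunk.split("\n").filter(Boolean)) {
    const reply = JSON.parse(line);
    console.log("suggestion from agent:", reply.result);
  }
});

// Ask the agent for completions at a given cursor position.
request("getCompletions", { uri: "file:///demo.js", position: { line: 10, character: 4 } });
```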
4. Logic and Codex in GitHub Copilot
Some of the logic in GitHub Copilot is connected with the editor and runs on your local machine, while other logic lives in the cloud. Features related to AI safety and responsibility are in the cloud, while anything related to the editor is on the local machine. Codex is the AI model used in Copilot, based on GPT-3 developed by OpenAI. It operates on natural language and doesn't require special encoding or abstract syntax trees. It's a probabilistic system, meaning it tries to predict the most probable next word, but it's not always correct. This requires a different mindset from traditional software development.
This is a very good question. Basically, some of the logic is more connected with your editor, and then it's fairly natural that it runs on your local machine, and some of the logic should live as close to the model in the cloud as possible. So for example, things like AI safety and responsibility features, we want to have those in the cloud, not on your local machine, for various reasons.
And in the agent, anything that's related to your editor, again, it's more natural that it lives on your local machine, so we don't need to synchronize everything to the cloud, because that would be complex.
So I mentioned that I'm not a data scientist, I'm not an AI expert, but let me talk briefly about Codex. Codex is the name of the AI model that we are using, and what is Codex? Let me actually check, I have a note. Codex is a natural language processing artificial intelligence model based on GPT-3, developed by OpenAI. Yes, thanks Wikipedia, that's really useful. So let's go slowly here. First of all, it is a model that operates on natural language, which is English or any other language that it understands, including programming languages. What it means in principle is that we are just passing our context, our input, into it as a string. We don't do any special encoding. We don't use an abstract syntax tree to represent your code. We just take a file, push it in, and hope for the best.
Secondly, this is an artificial intelligence model, which in principle means that this is a probabilistic system. We send something into it, and then it tries to figure out the most probable next word in the text. Important thing: most probable doesn't mean it's always correct. It's a guess that it tries to make. And it's also probabilistic in another sense: whenever we send the same prompt to it a couple of times, we may get back different results. So this is really funny. This required a lot of mindset changing from me as someone working with this system. You know, I'm a software developer. I'm used to: function. Unit test. Input. Same output. Yay. Here we cannot do that, which is really interesting.
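A toy way to see why the same prompt can come back with different suggestions: the model produces a probability distribution over the next token, and the completion is sampled from it. The distribution below is made up purely for illustration.

```typescript
// Made-up distribution over what could follow "items." in some JavaScript file.
const nextTokenDistribution: Record<string, number> = {
  "length": 0.55,
  "push(": 0.25,
  "map(": 0.15,
  "pop()": 0.05,
};

// Sample one token according to its probability.
function sampleToken(distribution: Record<string, number>): string {
  let r = Math.random();
  for (const [token, probability] of Object.entries(distribution)) {
    r -= probability;
    if (r <= 0) return token;
  }
  return Object.keys(distribution)[0];
}

// Same "prompt", five samples: usually "length", sometimes something else.
for (let i = 0; i < 5; i++) {
  console.log(`items.${sampleToken(nextTokenDistribution)}`);
}
```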
5. Introduction to Copilot and Prompt Crafting
GitHub Copilot is a large-scale model trained on billions of lines of code. It uses GPT-3, a language model trained on the internet. Codex, the AI in Copilot, understands programming languages and natural language. Prompt crafting is the process of preparing a string to send to the model.
Also, it's a really large-scale model. It has been trained on billions of lines of code. This means that we cannot easily introspect it. We cannot really easily understand why it is coming up with some suggestion. We can only manipulate the input and observe the result.
And I also mentioned in this very useful definition that it's based on GPT-3. GPT-3 is a large language model that has been trained on the internet. That's why Codex understands not only programming language code but also English or any other language, more or less, so it can understand comments and stuff like that. Basically, it means that Codex is a teenager that had a really, really long time and read the whole internet and a lot of code. You tell it something and it responds to you sometimes, and you don't know why.
So how do you get this teenager, this model, to do something useful for you? This brings us to the process called prompt crafting. Prompt crafting is a really fancy name for preparing the string that we send to the model. Initially, we started with a fairly simple approach. We just took the cursor position in your file, took the file content from the top of the file up to your cursor position, sent it to the model, and it worked. Surprisingly, it works well enough. However, we were sure that we could do better.
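A minimal sketch of that first approach, assuming the editor hands us the file text and a line/character cursor position:

```typescript
interface Position {
  line: number;      // zero-based line index of the cursor
  character: number; // zero-based column within that line
}

// First-pass prompt crafting: the prompt is simply the file content from the
// top of the file down to the cursor position.
function promptUpToCursor(fileText: string, cursor: Position): string {
  const lines = fileText.split("\n");
  const before = lines.slice(0, cursor.line);          // full lines above the cursor
  const currentLine = lines[cursor.line] ?? "";
  before.push(currentLine.slice(0, cursor.character)); // partial current line
  return before.join("\n");
}
```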
6. Improving Copilot Suggestions
The more context we can send to AI language models, the better the result. Trivial improvements like adding language markers can significantly enhance the response. Sibling functions detection helps reorder code for better suggestions. Using context from other open tabs in your editor provides relevant code suggestions. We don't access your hard drive directly, only the files open in your editor. Observing results is an important aspect of the process.
In principle, when working with these large AI language models, the more context we can send to them, the better the result will be. Of course, the context needs to be useful. It's not like we can send some random crap; it will be good if the context comes from your files.
But yes, the more things we send to it, the better the result will be. So we started with fairly trivial improvements. We added path and language markers, which is just a comment on top of the file saying: this is language JavaScript, and the name of the file is blah-test-for-my-fancy-JS-Nation-demo.js. The lesson here is that even such simple improvements actually make a fairly reasonable improvement to the response, to the output that we have observed.
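A small sketch of what such a marker could look like when prepended to the prompt; the exact header wording and comment style are assumptions for illustration:

```typescript
// Prepend a path-and-language marker so the model knows which language and
// file it is completing. The header wording here is illustrative.
function addFileHeader(prompt: string, languageId: string, filePath: string): string {
  const commentPrefix = languageId === "python" ? "#" : "//";
  return (
    `${commentPrefix} Language: ${languageId}\n` +
    `${commentPrefix} Path: ${filePath}\n` +
    prompt
  );
}

// Example: addFileHeader(prompt, "javascript", "blah-test-for-my-fancy-js-nation-demo.js");
```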
Why does this help, in this particular example? Using the language name marker means that the model is less likely to be confused about which language it is; all C-family languages look the same, and you really, really don't want to see C# suggestions in your JavaScript code, so that's why we put in the language marker. So that's step one. However, we still had two big problems in our initial implementation. We looked at the code that's above the cursor position, and, of course, in the vast majority of programming languages, you can put your functions in any order, so there can be useful stuff below your cursor position. So we introduced something called sibling function detection. It is basically a process where we parse the file and look for the functions at the same level as the function that you're currently editing. So if you're in a class, we look at the other class members; if you're at the top level of the module, we look at the top-level functions; and then we reorder stuff, so all the functions that we hope are useful end up above your cursor position, the point where we send the request. And we do that in memory. We don't change your file; we do all of that in memory while processing the prompt.
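A heavily simplified sketch of the sibling reordering idea, handling only plain top-level `function` declarations with a regex; the real implementation parses the file properly, but the shape of the in-memory transformation is the same: siblings move above the function being edited.

```typescript
// Move top-level sibling functions above the one being edited so their text
// is visible to the model. Regex-based and simplified on purpose; a real
// implementation works from a proper parse of the file.
function reorderSiblings(fileText: string, editedFunctionName: string): string {
  // Split into chunks that each start at a top-level `function` keyword.
  const chunks = fileText.split(/\n(?=function\s)/);
  const isEdited = (chunk: string) => chunk.startsWith(`function ${editedFunctionName}`);
  const siblings = chunks.filter((chunk) => !isEdited(chunk));
  const edited = chunks.filter(isEdited);
  // Siblings (and any preamble) first, the edited function last, nearest the cursor.
  return [...siblings, ...edited].join("\n");
}
```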
And also, the pro tip here is: don't have 1700 top-level functions in your file. Don't ask me why I know that, but yes, don't do that. The last interesting part is using the context from other files of your project. Of course, again, in programming languages we can create as many files as we want, and we often create many, many of those files in a project. So we try to figure out what code that looks similar to what you are writing right now exists in the other open tabs in your editor. And this is an important part: we don't look at your hard drive or anything like that directly. We only have access to the files that are open in your editor as tabs. We assume that this is a good signal that we can look into those files, because if you have them open in the IDE, that suggests you're okay with working on them. Okay. The last but one thing that I want to talk about, and I have 28 seconds, so that will be fast, is observing results.
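Before moving on to observing results, one way such a similarity check could be sketched: score the other open tabs against the text around the cursor and pick the closest match to include in the prompt. Jaccard similarity over token sets is an assumption chosen for this sketch, not necessarily what Copilot actually ships.

```typescript
// Tokenize a piece of code into a set of lowercase word-like tokens.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = [...a].filter((token) => b.has(token)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : intersection / union;
}

// Pick the open tab whose content looks most like the code around the cursor.
function mostSimilarTab(currentContext: string, openTabs: string[]): string | undefined {
  const target = tokenSet(currentContext);
  let best: { score: number; tab: string } | undefined;
  for (const tab of openTabs) {
    const score = jaccard(target, tokenSet(tab));
    if (!best || score > best.score) best = { score, tab };
  }
  return best?.tab;
}
```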
7. Observing Results and Copyright Restrictions
The model in Copilot is probabilistic, and its responses vary even with the same prompt. Observing results is done through offline evaluation and experiments on people. Users are satisfied even if suggestions are not 100% accurate, as they provide a starting point and scaffold for problem-solving. Copyright restrictions for using Copilot on company-owned code depend on the company. Copilot creates unique and personalized suggestions, not copying existing code snippets.
As I've mentioned, the model is probabilistic: first of all, its responses are probabilistic, but also, with the same prompt, it returns different responses. That means that we can only observe those results at scale. We cannot just unit test the model, we cannot unit test the improvements to prompt crafting, because we don't know if they will be successful for all the cases. The problem space of developer tooling in general is that code can look very different and there are tons of use cases all around, so we need to make sure we are as helpful to everybody as possible.
So we have two parts of observing results here. One is an offline evaluation system, where we clone a couple of thousand Python repositories from the internet. We try to find the functions that are well-tested, that have good test coverage, and the functions that are well-documented, where documentation exists for them. We remove the bodies of those functions, try to regenerate the bodies with Copilot, and check if the unit tests pass again, which is really interesting. The second way of observing results is doing experiments on people. That's why we have quite an advanced telemetry system in Copilot. If you're more interested in telemetry, my colleagues from the team actually published, I believe last week or something like that, a white paper about how we use telemetry and how it fits with measuring user satisfaction. We also ran some satisfaction tests on people, where they had to answer surveys about how they feel. The interesting thing here, and this was mind-blowing for me initially, is that it doesn't really matter whether the suggestions are super-accurate. What it seems is that users are really satisfied even if Copilot gives them not a 100 per cent accurate suggestion but a starting point, so they can think about the problem or the next step; it gives them a scaffold for what they do. Which is really, really fascinating. I wanted to also talk a bit about user experience design, but we don't have time for that.
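A minimal sketch of that offline evaluation loop, with the model call and the function-location step abstracted away as parameters; the talk's harness works on Python repositories and is certainly more involved, but the basic shape is: blank the body, regenerate it, re-run the tests, restore the file.

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Regenerate the body of one well-tested function and check whether the test
// suite still passes. `generateBody` stands in for a call to the model, and
// the caller is assumed to have already split the file around the target body.
async function evaluateFunction(
  filePath: string,
  before: string,                                    // file content up to and including the function header
  originalBody: string,                              // the body we removed
  after: string,                                     // the rest of the file
  testCommand: string,                               // e.g. "pytest tests/"
  generateBody: (prompt: string) => Promise<string>  // placeholder for the model call
): Promise<boolean> {
  const regeneratedBody = await generateBody(before); // prompt = everything above the body
  writeFileSync(filePath, before + regeneratedBody + after);
  try {
    execSync(testCommand, { stdio: "ignore" });
    return true;  // the tests still pass with the regenerated body
  } catch {
    return false; // the regenerated body broke the tests
  } finally {
    writeFileSync(filePath, before + originalBody + after); // restore the original file
  }
}
```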
So, thank you all for watching, and I believe it's now time for Q&A. Well, thanks a lot, Krzysztof. It was refreshing. Thanks for sharing. We're going to go right into the audience questions. First question from Hennoc: what are the copyright restrictions for using GitHub Copilot on company-owned code? This is really a great question; it comes up fairly often. First of all, I'm not a lawyer. Don't take anything that I say as any kind of legal advice. Secondly, what is also important is that it really depends on your company, so I would recommend actually talking with someone from your company. However, our answer is that, in principle, Copilot is not copying code. What Copilot is doing is creating new suggestions that are unique, that are personalized to your particular context, to what you're writing. Copilot is not a copying machine, it's not a pattern-matching machine that just takes existing snippets of code from its database or whatever. It's trying to figure out new code for your problems.
8. Copilot's Copying, Pricing, and Personalization
In very rare cases, Copilot can copy code directly from memory, but it's only for well-known snippets. A solution is being worked on. Copilot will become a paid product in the summer, but will remain free for students and verified open source contributors. Copilot currently suggests new code, but not code removal. Improvements to AI-driven development tooling are being explored. Copilot learns from individual users and suggests personalized code based on their coding style.
In very rare cases, Copilot can copy code, but it's like 0.1 per cent of cases where it copies code directly from memory. This happens only for very well-known snippets of code. And we are also working on a solution to the problem that should be available soon-ish. Soon-ish. Like a Tesla timeline, or? No, like soon-ish. Okay, cool.
The question's from Catalin: does Copilot plan on becoming a paid product? Yes. So currently, Copilot is in a technical preview state, where people can use it for free. We use this period of time to improve the product massively. However, it has been announced by Satya Nadella, CEO of Microsoft, at Build, during his keynote, that Copilot will go to general availability this summer. It will become a paid product then; however, it will be freely available to students and to verified open source contributors, whatever that means. So if I do some open source work, I can also use it for my company work? Yes. All right. And again, this is not legal advice, ask your company... If your company allows it. Yes.
Next question from Allison: Copilot can suggest what to write, but can it also suggest what to remove? Not right now. Right now, suggesting new code is the main user experience that we provide. Of course, we are looking into various improvements to AI-driven development tooling in general. I think I've mentioned that I work at this thing called GitHub Next, and we are the team at GitHub that tries to figure out what the next ten years of developer tools look like, and all the applications of AI are something that we definitely research all the time.
A question from anonymous: does Copilot also learn from me as an individual, and will it suggest code personalized to my coding style? Yes, exactly. So Copilot, as I've mentioned, looks at your currently open file and other tabs in your editor, which can be your own code, your own personal code. It's not in any way connected to your company, it's not even connected to your repository. It just looks at the code that's in the editor, in the IDE, to figure out the suggestions. So a follow-up question then from me: if I want to improve the suggestions, I should just open more tabs with my code? Yes, this is one way of improving the suggestions. Yes, indeed. Nice. So, a good question from anonymous.
9. Using Copilot and Embracing Developer Tools
Using Copilot or other IDE features is not cheating. The goal is to improve developers' lives, productivity, and happiness. Memorizing everything is not necessary. Embracing these tools is essential to keep up with the industry and not waste time.
Is it cheating if developers use Copilot? Anonymous is a junior developer and thinks it's awesome, but he or she is worried that they will be less of a developer if they utilise it. Is it cheating if mathematicians use a calculator? I don't think so. I also don't think that using other IDE features in general is cheating. I know that there are some people who say, oh, you shouldn't even use traditional autocomplete, because you'll become a better developer if you memorise all this stuff. That's not my position. I have been developing tools that help developers and lower the barrier to entry for my whole professional life. So I don't think it's cheating. I think that is our goal: to improve your life and your productivity and your happiness. The world is going that way, so if you don't use these tools, you will be left behind, because your colleagues will be using them. It would be a waste of your time.