1. Introduction to the Speaker's Journey
With AI and WebGPU, it's an exciting time to be a developer. A lot is going to change, including the way we make apps. I have been making stuff with computers since I was a kid, combining programming and design. However, using separate tools for each was creatively stifling. I always wanted to do design and development at the same time, but it seemed impossible.
So, we will get to this question in a few minutes. But with AI and WebGPU, it's just an exciting time to be a developer. It looks like a lot is going to change. I think we're going to be making different apps. Are we even going to call them apps anymore? I'm not sure. But a lot is going to change, so let's explore some of those possibilities today.
Now, who am I? My name is Aria Minaei. Like most of us here in this room, I have been making stuff with computers since I was a kid. This is what the programming side of it looked like in the very beginning. And this is how I did design stuff. I'm actually not that old; I just had old, cracked versions of these. But I was making little apps, little sites, little games. And it always required both programming and design. So more programming tools and more design tools sort of piled up. And it was always weird, because when you're making a game or an app, in a day you're making hundreds, if not thousands, of these micro design and development decisions. And a lot of those decisions just don't fit within, let's say, VS Code or 3D Studio Max. They span the whole spectrum of making an application. So it was always weird that I had to either be in VS Code, for example, or at the time, say, Dreamweaver, or in Photoshop. And I had all these little micro ideas, and I felt like in the span of switching from one app to the other, a bunch of them would die. It was just creatively stifling. So it was always weird. And I always felt like, I want to do design and development at the same time, but every day I'm waking up, and Morpheus is basically offering me one of two pills. Take the blue pill or the red pill: do either design or development. There's no good or bad pill here, I'm stretching a metaphor right now. But you have to pick one. And I was always like, can I take them both? And he was like, that's not how it works.
2. The Violet Pill and Pure Blue
I always wanted to take design and programming together, seamlessly. Flash was a design and development environment, but it wasn't the right fit. Many attempts have been made to create a seamless design and development environment, but it's a hard nut to crack. That's why I started with pure blue, a powerful programming environment.
And so I always wanted to take them both, like a violet pill, where you can just design and program in the same environment seamlessly. I looked for this environment, and one of the candidates was this. Anybody remember this? Yeah? Okay. It's a young audience here; only a few hands.
This is Flash, and some people love Flash. I love Flash. Yeah! Give it up for Flash! We have like five people over 30 here. Okay, so for those of you who don't remember, Flash was a design and development environment, and I loved it, and a lot of people loved it. They were making awesome stuff with it. Was it that violet pill, though? Not really. You could do programming and design in the same operating-system window, but the programming was limited, and the design tool wasn't as expressive as, say, Photoshop or Max and stuff like that. So, it wasn't really a violet pill. It was more like, you know, a blue and red lollipop. It was tasty, it was really good, but not the right stuff.
So, I really wanted to have that violet pill. A lot of people have tried to make that seamless design and development environment; it just never catches on. It's a hard nut to crack. So, there have been many attempts over the years. At some point, I thought maybe I should give it a go. How hard could it be? Or so I naively thought. Anyway, eventually that turned into something that we call Theatre.js. Now, it takes a different approach. I thought, instead of making that whole violet pill at the get-go, which is a very hard thing to do, let's just start with some pure blue. Now, what is pure blue? Pure blue is a programming environment. It could be, you know, VS Code, and the programming language could be JavaScript or Swift or anything else. Let's start with some pure blue, because pure blue is super powerful. There's nothing you cannot do with a programming language.
3. Pure Blue and Violet
If the CPU and GPU can handle it, and the display can show the visuals, a programming tool can do it. So we start with pure blue and add little bits of red to turn it violet. We added a sequencing tool for animation on the web and on native devices, followed by a 3D composition tool. The New York Times used Theatre.js to reconstruct shots of the World Cup in near real time, allowing readers to follow the games through 3D visualizations. Other examples include a recruiting page by Planet and Wildlife and a well-crafted scrolly by Studio Freight. Adding AI to the mix is the next step: the non-AI scene is built with Three.js, React Three Fiber, and React, with editing handled by Theatre.js.
If the CPU and GPU can handle it, if the display can actually show the visuals, then the programming tool can do it. So, let's start with some pure blue, and then add little bits of red to it, so it slowly turns violet.
So, the first bit of violet that we added was a sequencing tool. This is for people who make animation on the web and, you know, also on some kinds of native devices. Then we added a 3D composition tool, and as we go on, we just add more and more of this violet stuff. Now, it's really crazy how much people can achieve with just that little bit of violet.
So, for example, here's The New York Times. They have been covering, for example, the World Cup. They had someone in Qatar taking, I don't know, hundreds, thousands of shots of the game and beaming them to New York, and then they were reconstructing the whole thing in New York, using Theatre.js to put it all together, because you have designers, developers, and journalists all working together. You don't want to hand things off between them; it's a newsroom, it has to work really fast. So they used Theatre.js, and because of that, you could follow the World Cup through these 3D visualizations of the games on that same day. This one is another example, by Planet and Wildlife, probably the most hardcore recruiting page ever. And this one is probably the most well-crafted scrolly out there, by Studio Freight. Big fan of them. So, yeah. The project is called Theatre.js. You can check it out on GitHub. We're just adding more and more violet to it as we go.
All right. So, all of that stuff required no AI. So now let's see what happens when we add a bit of AI to the mix. All right. So, here's my non-AI scene. You know, it's made with Three.js, React Three Fiber if some of you know that, and React, basically. And I can sort of edit things using Theatre.js.
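For reference, a scene like that can be wired up in just a few lines. This is a minimal sketch assuming Theatre.js's React Three Fiber bindings (`@theatre/r3f`) and the studio UI; the project, sheet, and object names are placeholders, and the exact setup steps may differ between Theatre.js versions.

```tsx
// A minimal sketch of a Three.js scene rendered with React Three Fiber
// and made editable with the Theatre.js studio UI.
import { Canvas } from "@react-three/fiber";
import { getProject } from "@theatre/core";
import studio from "@theatre/studio";
// extension import path per the Theatre.js docs at the time; may differ by version
import extension from "@theatre/r3f/dist/extension";
import { SheetProvider, editable as e } from "@theatre/r3f";

studio.initialize();        // opens the Theatre.js editing UI in the browser
studio.extend(extension);   // adds the r3f-specific editing tools

const sheet = getProject("Demo Project").sheet("Scene");

export function App() {
  return (
    <Canvas>
      <SheetProvider sheet={sheet}>
        {/* `editable` wraps regular R3F elements so their props
            (position, intensity, etc.) can be tweaked in the studio UI */}
        <e.pointLight theatreKey="KeyLight" position={[2, 2, 2]} />
        <e.mesh theatreKey="Box">
          <boxGeometry />
          <meshStandardMaterial color="hotpink" />
        </e.mesh>
      </SheetProvider>
    </Canvas>
  );
}
```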
4. AI in the Creative Process
Adding AI to the creative process can save time and enhance the workflow. A chatbot-style AI makes you wait; a co-pilot-style AI lets you keep working. The demo involves tweaking variables, requesting specific effects like studio lighting or film grain by voice, and a small amount of code to initialize the GPU and the LLM.
Like I can change the variables and stuff like that. Now, that's one way to do it. But we have added a bit of AI here. So I can say something like, can we have some studio lighting here? It's a little loud and I don't have my microphone, so it might not actually work, but we'll see. And my internet is not good, so it takes a while; normally it should take less than a second. So give it a moment. Come on, GPT. I might actually not have internet. Let's see. Well, they said demos are hard. Oh, there you go. Okay. Now, this normally takes a second. By the way, we just saved an experienced user about 10 minutes, and someone who is just starting out maybe 15 to 20 minutes.
Can we add some film grain here? Now here's the thing. This is the first thing I want to mention about making things with AI. If the AI acts like a chatbot, you can wait for a chatbot. You can wait for GPT-4 to finish its thoughts. Sometimes I choose GPT-4 over GPT-3 because I like the quality of the answer. But if the AI is a co-pilot, if you want it to help you in your creative work as you're doing it, you just don't want to wait for things. All right.
So how does this actually work? Well, here's the code. It's pseudocode, but I think it's pretty obvious what's going on. You get a handle to your GPU. You initialize your LLM. You have to warm it up; we haven't done that.
5. The Infinite Loop and App Replacement
The process is an infinite loop: wait for a voice command from the user, turn the command into a prompt, let the LLM produce an updated app, hot-replace the running app, and repeat.
That's why it took a while: we're using a cloud offering, and I'll tell you later why. Then you have this infinite loop. You wait for a voice command; the user says something. You take that command and generate a prompt, and the prompt is basically something like: hey, here's my app and here's a command, please apply that command to my app. Right? So I just asked it to give me studio lights, and it adds studio lights to the app. Now we get that new app, we hot-replace it, and we rinse and repeat. And that's how the whole thing basically works.
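Spelled out as code, that loop looks roughly like this. It's a hedged sketch rather than the actual implementation: `LLM`, `VoiceInput`, and `AppHost` are hypothetical stand-ins for whatever speech-to-text, model client, and hot-reload plumbing you wire in.

```ts
// Hypothetical stand-ins; wire these up to your own speech-to-text,
// model client, and hot-module-replacement plumbing.
type LLM = { complete(prompt: string): Promise<string> };
type VoiceInput = { nextCommand(): Promise<string> };
type AppHost = {
  currentSource(): Promise<string>;
  hotReplace(newSource: string): Promise<void>;
};

export async function runEditLoop(llm: LLM, voice: VoiceInput, app: AppHost) {
  await llm.complete("ping"); // warm-up so the first real edit isn't slow

  let source = await app.currentSource();

  while (true) {
    // 1. wait for a voice command, e.g. "add some studio lighting"
    const command = await voice.nextCommand();

    // 2. build the prompt: "here is my app, here is a command, apply it"
    const prompt =
      `Here is the source of my app:\n\n${source}\n\n` +
      `Apply the following change and return the full updated source:\n${command}`;

    // 3. let the model produce the edited app
    const newSource = await llm.complete(prompt);

    // 4. hot-replace the running app, then rinse and repeat
    await app.hotReplace(newSource);
    source = newSource;
  }
}
```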
6. Challenges in Production App Development
Setting up the first iteration of this takes about half an hour to an hour. However, a production app requires more preparation, including handling hallucinated or broken code, internet problems and latency, and avoiding mode switching, which can hinder creativity.
Now, of course, this is just the first iteration. If you want to set something like this up, it takes, I don't know, maybe half an hour or an hour. And that's still a long way, let's say, from a production app.
A production app requires, for example, that you be ready for the LLM to hallucinate, like give you an app that you just cannot run. So what you do there is, for example, retry it, or if you have a large enough LLM, you can just feed the error back to it so that it can give it another go.
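Here's a minimal sketch of that retry-with-error-feedback strategy. The helpers (`llm.complete`, `tryRunApp`) are hypothetical stand-ins, not part of any particular library.

```ts
// Retry a failed LLM edit, feeding the runtime error back into the prompt.
type LLM = { complete(prompt: string): Promise<string> };

async function editWithRetries(
  llm: LLM,
  source: string,
  command: string,
  tryRunApp: (src: string) => Promise<Error | null>, // null means it booted fine
  maxAttempts = 3
): Promise<string> {
  let prompt =
    `Here is my app:\n\n${source}\n\n` +
    `Apply this change and return the full updated source:\n${command}`;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await llm.complete(prompt);
    const error = await tryRunApp(candidate);

    if (error === null) return candidate;

    // Feed the error back so the model can take another go at it.
    prompt =
      `Your previous edit failed with this error:\n${error.message}\n\n` +
      `Here is the code you produced:\n\n${candidate}\n\n` +
      `Please fix it and return the full corrected source.`;
  }

  // Give up and keep the last working version of the app.
  return source;
}
```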
Another thing is, as we just saw, if there's an internet problem, or your LLM is just too large and it takes a while to give you an answer, what that does to the user is force them to switch modes. In one mode, I'm making stuff with my hands, editing things directly; in the other mode, I'm talking to some agent. This mode switching is also creatively stifling; it just kills little ideas. You don't want that, you want something that works really fast.
7. AI Writing Code and Different Commands
The next demo: make the scene go into widescreen mode when the space bar is pressed, and back when it's pressed again. If the LLM reliably understands speech, you don't even need to be at the keyboard; hand tracking and pose tracking are more intuitive inputs. The key difference from the earlier commands is that this time the AI has to write code for you.
All right. So, all of this was very February 2023. You know, there's an LLM editing an app, what else is new? So let's try something else. Can we have it so that when I press the space bar, we go into widescreen mode, and when I press it again, we go back? Everybody, cross your fingers for me, please? Thank you. I wanted to do this with a local LLM, but I also want to show you how things work in production.
Oh, it's a white screen. Can we have it so that when I press the space bar, we go anamorphic, and when I press it again, we go back? That didn't work. All right. Could you imagine for me that it worked? Wow! Everybody say, wow! So here's what's actually happening. Let me explain. There is this voice model called Whisper, right? It understands almost everything you tell it better than a human being; in a programming context, it's actually better than a human being. We didn't use it here because we wanted to use a local model, just to make things go a little faster. But I didn't realize that all the noise in here makes it misunderstand me. If it didn't misunderstand, which is actually pretty easy to get to, then you're not even bound to the keyboard anymore. If I'm sure the model is going to understand me, I don't need to be here constantly correcting it; I just say anamorphic and it takes anamorphic. And if I don't have to be at the keyboard anymore, I can walk away and do things with, basically, a remote control device. That's just for show, though; people are not going to be carrying little game controllers. Things like hand tracking and pose tracking are going to make much more sense, and those models have gotten really good in the past couple of months. Six weeks, even. So, two things: you can basically walk away from the keyboard and trackpad, and I don't think those will even be the primary modes of input after a while. And you can have the AI write code for you.
So, here's a question. When I asked the AI to go into widescreen mode when I press the space bar and go back when I press it again, had that worked, how would that command be different from the previous commands I gave it? Anybody have a guess? The difference is that this time the AI would actually write code. It would write an algorithm.
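For contrast, the earlier commands only changed parameters; the space-bar request means the model has to emit behavior, something along these lines. This is a plausible sketch of what it might generate, not the actual output, and `setAspect` is a hypothetical helper exposed by the scene.

```ts
// The kind of code the model would have to write for the space-bar request:
// an event handler that toggles state. `setAspect` is a hypothetical helper.
declare function setAspect(ratio: number): void;

let widescreen = false;

window.addEventListener("keydown", (event) => {
  if (event.code !== "Space") return;
  widescreen = !widescreen;
  // 2.39:1 anamorphic when toggled on, back to 16:9 when toggled off
  setAspect(widescreen ? 2.39 : 16 / 9);
});
```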
8. The Power of User Editing and the Role of React
Letting the AI write code means it can change the behavior of the application: the user can add behaviors, remove things, or entirely change the nature of your application. We still want to give the user some of that power, so we have to find the sweet spot between too much power and no power at all. For me, that's where React comes in. The React model says you can divide your application into small building blocks, and these building blocks are either self-contained or their requirements are explicit. It's an organisational structure that fits well with an LLM editing the application. Let's look at a simple React tree and discuss why an LLM is well suited to edit it.
It would change the algorithm of the application. It would basically change the behavior of the application. And that's pretty powerful. That actually works. Again, here it was the voice model that didn't work, but this is something you can actually get running in your application tonight, basically. What that means is that the user can now add behaviors to your application, or remove things, or even entirely change the nature of your application. Now that's pretty powerful, but how much of that power do you want to give to the user? There is a high end of that power, which is that we just open source our whole app. It's as if the user were editing the app in VS Code or Repl.it, and we just let the user edit the app by talking to our LLM agents. Now that's awesome, that's really powerful, but what if the user just breaks the app? What if it doesn't even boot? Or what if there's an error 50 stack-trace lines deep and we have to show it to the user? That's obviously way too much power and way too much complexity to unleash on the user. It just doesn't work for most cases. However, we still want to give some of that power to the user, so we have to find the sweet spot between too much power and no power at all. And where is that sweet spot? For me, that's where React comes in.

And by React, I don't mean React the library, I mean the React model, in its broadest definition. It also applies to Svelte, Vue, Solid, SwiftUI, Flutter, all of them. The model is actually pretty simple. It says that you can divide your application into small building blocks, and these building blocks are, for the most part, either self-contained or their requirements are explicit, which are basically props. Or they require context, and context is an implicit prop, a shortcoming of the React library, in my opinion. And if a component has a side effect, it returns an effect descriptor; that's basically the JSX tags that we have. And if the effect doesn't fit within a descriptor, then you use the escape hatch, which is useEffect. The model doesn't say anything about reactivity; you could have fine-grained or coarse-grained signals, all of that. You could even have an ECS engine, an Entity Component System like a game engine, being produced by React components. It doesn't say anything about JavaScript or some other language, web or native. It's just an organisational structure.

And that organisational structure happens to fit really well with an LLM editing our application, because we don't want the LLM to edit the whole application; again, you could just break everything. What we want to do is create these small building blocks and let the LLM edit just those small building blocks. And those building blocks happen to fit within the React model. So, first of all, let's have a look at a simple React tree, and we'll talk about why an LLM would be well suited to edit this application.
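To make that concrete before we look at the tree, here's the kind of small, self-contained building block I mean. It's an illustrative sketch: the component name and props are made up, and the light elements assume a React Three Fiber scene like the one in the demo. Everything the component needs arrives through explicit props, and its output is just an effect descriptor.

```tsx
// An illustrative example of the kind of small, self-contained building
// block an LLM could safely edit: requirements are explicit props, and the
// component's output is an effect descriptor (JSX). Assumes a React Three
// Fiber <Canvas>, where <ambientLight> and <directionalLight> are intrinsic.
type StudioLightingProps = {
  intensity: number;
  color?: string;
};

export function StudioLighting({ intensity, color = "#ffffff" }: StudioLightingProps) {
  // Everything this component needs arrives through props, so an LLM can
  // rewrite its body (or its props) without knowing the rest of the app.
  return (
    <>
      <ambientLight intensity={intensity * 0.2} color={color} />
      <directionalLight position={[5, 5, 5]} intensity={intensity} color={color} />
    </>
  );
}
```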
9. React Components and WebGPU
React components can become more fine-grained and smaller, because an LLM can keep track of the dependencies between them. WebGPU enables running AI models locally on your machine, in favor of privacy, latency, and cost.
First of all, does child A have access to context B? Of course not. We can just look at it and tell; it's explicitly encoded in how we represent the React application. Also, if child B breaks, like it throws an error, and it has an error boundary, it's not going to affect child A, right? It's its own thing. Now, that happens to work really well for an LLM. If the LLM is just editing the source of one of these components and the component breaks, we can basically just retry the LLM run, or we could even feed the error back to the LLM and let it have another go at it. So, React happens to fit really well with LLMs. What that actually means, in my opinion, is that we're just going to be writing more fine-grained, smaller components. Do you remember how at one point we used to separate our components into presentational and logical ones? We wanted to have the smallest possible unit of React component in order to make the whole thing manageable. But that didn't work out, because there was just way too much dependency to track between components. A component could have like 10 props, and then you had to take those props and pass them to the next component, and it was too much to keep in mind for a human programmer. But guess what, that's not a problem for an LLM. An LLM can easily keep track of hundreds of these dependencies. So what this means, I think, is that we're just going to have tiny, tiny React components, and the LLM is just going to edit them for us.
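A minimal sketch of that isolation in practice: wrap each LLM-edited block in a standard React error boundary, so a broken edit only blanks out its own subtree, and the caught error can be handed back to the model for another attempt. The component and callback names here are illustrative.

```tsx
// Wrapping each LLM-edited block in an error boundary keeps a bad edit from
// taking down its siblings: if ChildB throws, ChildA keeps rendering, and the
// caught error can be fed back to the model for another go.
import React from "react";

type Props = {
  children: React.ReactNode;
  onError?: (error: Error) => void; // e.g. re-prompt the LLM with the error
};
type State = { hasError: boolean };

export class LLMEditBoundary extends React.Component<Props, State> {
  state: State = { hasError: false };

  static getDerivedStateFromError(): State {
    return { hasError: true };
  }

  componentDidCatch(error: Error) {
    this.props.onError?.(error);
  }

  render() {
    return this.state.hasError ? null : this.props.children;
  }
}

// Usage (names illustrative):
// <LLMEditBoundary onError={retryEditWithError}><ChildB /></LLMEditBoundary>
```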
Now let's talk about WebGPU. Starting with some definitions: what is WebAssembly? WebAssembly allows you to safely and quickly run untrusted code, code that doesn't come from someone you know, on your CPU. Well, WebGPU does the same thing for the other processor on your computer. And that is really good, because AI models happen to really love GPUs. That means we can run an AI model locally on our machine. Why would we want to do that? We're basically deciding in favor of privacy, latency, and cost. With privacy, of course, if it's a medical application or something, everybody knows why privacy matters there. But even in the case of a creative tool like Theatre.js, I think creative tools are tools for thought, and it helps to be able to create in private for a while; that can actually be liberating. So I think privacy matters even in creative tools. Latency, of course, also matters. As we just saw here, if it bores the audience members, it's going to bore the creator. You don't want chatbot-style wait times; you want things to just happen really fast. So, you would put the LLM on the local machine.
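Getting that handle to the GPU from the browser is only a couple of calls. A minimal sketch: the `GPUDevice` type assumes the `@webgpu/types` package is installed, and `navigator.gpu` is only present in browsers that ship WebGPU.

```ts
// Feature-detect WebGPU and grab a device handle. The device is what
// compute/render pipelines (and in-browser AI runtimes) are created against.
async function getGPUDevice(): Promise<GPUDevice | null> {
  if (!("gpu" in navigator)) {
    console.warn("WebGPU is not available in this browser.");
    return null;
  }

  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return null; // no suitable GPU adapter found

  return adapter.requestDevice();
}
```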
10. Running LLMs on WebGPU
Can we run these LLMs using WebGPU? Not quite yet: the models that run fast on a local machine are not trained enough. However, models like StarCoder and the Replit code model are being trained right now. Once the WebGPU pipeline is optimized, a code-editing instruct model will be able to edit a scene right inside the browser.
And of course, cost also matters, but I'm going to skip over it because we don't have that much time.
Now, can we actually run these types of LLMs using WebGPU? Well, you actually can. These are two examples. You can just try them in the latest version of Chrome, for example, and they work pretty well. Can they actually do what I was showing you there? No, they can't. Not yet. The models are not trained enough at the moment, at least not the models that you can actually run fast on your machine. That's why we used an online, cloud-based model. But they're getting there. There's the StarCoder model and the Replit code model. These are being trained right now; they're just getting better and better. And they're pretty small, like 15 billion parameters, 7 billion parameters. They work really fast, and they run on a local machine. There's also a lot of low-hanging fruit left in WebGPU optimization. Right now, if you try to run them, it takes a bit of work and it's still slow, but that's because the WebGPU pipeline is just not optimized yet. Once it does get optimized, you can actually run a code-editing instruct model that edits a scene, or your Notion page or something like that, right inside the browser.
11. AI's Impact on Normal Apps
Now you can either wait for these models to mature or you can start developing right now. AI affects both creative and normal apps. Some people are shocked by AI's coding capabilities, while others are oblivious. To develop a first-principles sense of AI's impact, try an experiment: imagine Uber provided an open API, feed it to GPT-4, and build a chatbot that can order rides. Then add the Lyft API for more options and functionality.
Now, you can either wait for these models to mature, or you can see where things are heading and start developing right now. So in the case of Theatre.js, we're... yeah, I'm going to cut into the Q&A time a little bit. All right. So you can see where things are heading and develop your application right now: use a cloud-based model, and later you can switch to a local model if that makes sense for your application.
Until now, we talked about how AI affects a creative application like Theatre.js or Blender, or even a productivity application like Notion. But how does it affect a normal app like Uber? There's a lot of fear, uncertainty, and doubt going around. I think some people are just in a state of shock and awe at how well AI can code, and some people are entirely oblivious. Maybe both of those descriptions even apply to the same people.
Personally, when I saw how good GPT-3 had become, I was in a state of shock for a little while, because I was thinking, what happens to my expertise? What happens to how I make money? I think neither this shock-and-awe state nor the oblivious, let's-just-ignore-the-whole-thing state is healthy. So I want to suggest an experiment, so everyone can develop their own first-principles understanding of how much AI can do and how it affects your work. Here's the experiment. Open the Uber app on your phone, or some ride-sharing app, and click around; try to order a ride, stopping right before you actually get the ride. And imagine Uber provided an open API where you could make your own Uber application, right? You could just order rides. Now take that API and feed it to, let's say, GPT-4 using a library called LangChain. You can do that as a JavaScript developer; it probably takes a day, a weekend project at most. In the end, you would get a chatbot that can get Uber rides for you. We can mock out the API. Then just talk to the chatbot and see what happens. Like, get me a ride to Mom's. And it's probably going to tell you, oh, it's going to cost 15 euros, should I get it? You say yes, and it's just going to get that ride for you. Now do that, and then give it one more API, which is the Lyft API. So you say, hey, give me a ride. And the chatbot is going to say, yeah, there's a 15-euro option and a 17-euro option, but one of them is actually further away from you, so it's going to pick you up like three minutes earlier. Should I get the earlier one? Yes. Do you want me to notify Mom? Yes, please.
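Here's a rough sketch of that weekend project, with the ride APIs mocked out as LangChain tools. The import paths follow the 2023 LangChain JS API and may have shifted in newer versions; the tool responses are obviously fake stand-ins for real Uber/Lyft calls.

```ts
// Two mocked ride APIs exposed to GPT-4 as LangChain tools.
import { ChatOpenAI } from "langchain/chat_models/openai";
import { DynamicTool } from "langchain/tools";
import { initializeAgentExecutorWithOptions } from "langchain/agents";

const orderUberRide = new DynamicTool({
  name: "order_uber_ride",
  description: "Order an Uber to a destination. Input is the destination.",
  // mocked response; a real version would call the ride-sharing API
  func: async (destination) => `Uber booked to ${destination}, ~15 EUR, 5 min away.`,
});

const orderLyftRide = new DynamicTool({
  name: "order_lyft_ride",
  description: "Order a Lyft to a destination. Input is the destination.",
  func: async (destination) => `Lyft booked to ${destination}, ~17 EUR, 2 min away.`,
});

const model = new ChatOpenAI({ modelName: "gpt-4", temperature: 0 });

// Let GPT-4 decide which tool to call based on the user's request.
const executor = await initializeAgentExecutorWithOptions(
  [orderUberRide, orderLyftRide],
  model,
  { agentType: "openai-functions" }
);

const result = await executor.call({ input: "Get me a ride to Mom's place." });
console.log(result.output);
```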
12. The Future of App Development
Talking to an agent is probably going to be a much faster, much better way to order a ride than opening any application. Are all apps going to go away? No: games and creative apps aren't, but getting a ride, for sure. We probably won't keep the same tool sets, libraries, and methodologies, but in the end we'll have a much more powerful way to serve users, and serving users still requires problem solving. Instead of waiting to be disrupted, we can get ahead of the curve and start building today; as JavaScript developers, the people best at hooking different pieces of technology together, we're in the best possible position in the industry.
That's it. All right, I'm just going to skip over these. Talking to an agent is probably going to be a much faster, much better way to order a ride than opening any application. If you want to order a ride, if you want to do a lot of the things in this little animation, it's going to be faster with an AirPod. You're not going to use React. As a user, you're not going to interface with React, with Solid, with Svelte, probably not even with JavaScript. It's just faster.
Are all apps going to go away? I don't think so. Games are not going to go away, and I don't think creative apps are going to stop being apps. But getting a ride? For sure. And you can do that yourself, as a developer, at home; it takes a day. Now, is that cause for fear? I don't know how your psychology works; fear is good for some people. To me, when you really look at it, it's just exciting, because now we can serve users in a much bigger way.
Are we going to be using the same tool sets we have been using? Probably not. Are we going to have to jettison a bunch of our libraries and a bunch of our methodologies? Conference names, even? Probably yes. But in the end, we're going to have a much more powerful way to serve users. And by the way, serving users still requires problem solving. Last time I checked, the reasoning capabilities of these models are not at human level, and if they ever are, we'll have other problems to worry about. So I think it just makes sense to see where things are heading, and instead of waiting for our jobs to be disrupted, our applications to become irrelevant, and the libraries we use to stop mattering, we can get ahead of that curve and start building things today. All the technology is there. And guess what? As JavaScript developers, as people who are best at hooking different pieces of technology together (that's JavaScript), I think we are in the best possible position in the industry. All right.