What else can we do here? So, we think about things. GPT-4 is better at JSON, but it's slower and it costs a bit more. And because it still occasionally returns invalid JSON, we're left scratching our heads: do we just call this thing over and over again, like 20 times, until it returns valid JSON? Maybe. We tried that a little bit, with mixed results.
But I want to think about how to test some of these things. A typical pattern here is integration testing. We all know the idea: you have your production code, it hits some API, in this case the OpenAI API, with some inputs going in and some response coming back. Your tests use the production code to exercise that same path. And then there's an alternative approach: contract testing.
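Here's a minimal sketch of that integration-testing pattern, assuming a hypothetical production function `summarize_to_json` in a module `myapp.llm` that wraps the OpenAI client; the test drives the real code path against the live API.

```python
# Integration-test sketch: exercises production code against the live API.
# `myapp.llm.summarize_to_json` is a hypothetical production function.
import json

from myapp.llm import summarize_to_json  # hypothetical production module


def test_summarize_returns_valid_json_live():
    # Hits the real OpenAI API: slow, and you pay per run.
    raw = summarize_to_json("Add together 2, 3, and 5.")
    parsed = json.loads(raw)  # fails if the model returned invalid JSON
    assert "result" in parsed
```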
One big benefit here, specifically when you're hitting the OpenAI API, is that you're paying per use. So if you ran a test suite against the live OpenAI API, you'd potentially incur a lot of cost, and there's additional latency on top of that. That's one reason we chose to build some mock API pieces via contract testing. The idea is that whenever you run tests, you're actually running against the contract, which works out pretty well for most APIs I've used.
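Here's a minimal sketch of that mock-based contract-testing idea, assuming the same hypothetical `myapp.llm` module holds a module-level OpenAI client. The canned response plays the role of the contract: the shape the API has agreed to return, so the test never touches the network.

```python
# Contract-test sketch: replace the live API call with a canned response
# that matches the agreed response shape. `myapp.llm` is hypothetical.
import json
from unittest.mock import MagicMock, patch

from myapp.llm import summarize_to_json  # hypothetical production module

CANNED_CONTENT = json.dumps({"result": 10})


def _fake_completion():
    # Mimic the shape of a chat.completions response object.
    message = MagicMock()
    message.content = CANNED_CONTENT
    choice = MagicMock()
    choice.message = message
    response = MagicMock()
    response.choices = [choice]
    return response


def test_summarize_against_contract():
    # Patch the client call inside the production module: no cost, no latency.
    with patch("myapp.llm.client.chat.completions.create",
               return_value=_fake_completion()):
        raw = summarize_to_json("Add together 2, 3, and 5.")
        assert json.loads(raw) == {"result": 10}
```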
But here's the interesting part: we have a new paradigm with AI, with these LLMs, these large language models. Yes, we still have the standard contract of the API we're hitting: the input on one side, the response on the other. But on top of that, we have these more dynamic pieces, where depending on what I put in the prompt, the content that comes back may look completely different. That was a new paradigm, at least for me, to wrap my head around.

With that said, another interesting angle: say the prompt is "what is a bunch of numbers added together?", and we get an output. That works pretty well. Run the exact same thing again, and the output is a bit different. That can be problematic when your test is checking for specific things. Run it again, something else, and so forth. The non-deterministic nature of the AI we're using is going to make it really hard to test some of these things, so we have to think about what else we can do. And with any AI model, there's this question of not really knowing what you're going to get. There are a lot of ways to solve for that, but long story short, the way I think of testing these AI things is dynamic consumer-driven contract testing. For what it's worth, look that up if you're not sure exactly what it is. In essence, we're able to cover certain test cases, but other ones, maybe not so much.
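One way to read that "dynamic" part is: the consumer's test asserts on the structure it actually depends on rather than on the exact text, which can differ run to run. Here's a minimal sketch of that idea; the schema, the `ask_model` helper, and the `myapp.llm` module are all assumptions for illustration.

```python
# Sketch: validate the shape of the model's output instead of its exact text,
# so a non-deterministic response can still satisfy the consumer's contract.
import json

import jsonschema  # pip install jsonschema

from myapp.llm import ask_model  # hypothetical production helper

# The structure this consumer relies on -- its side of the contract.
CONSUMER_CONTRACT = {
    "type": "object",
    "properties": {"result": {"type": "number"}},
    "required": ["result"],
}


def test_response_matches_consumer_contract():
    raw = ask_model("What is a bunch of numbers added together?")
    payload = json.loads(raw)
    # Passes even if the wording or exact value differs between runs,
    # as long as the structure the consumer depends on is present.
    jsonschema.validate(instance=payload, schema=CONSUMER_CONTRACT)
```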