Video Summary and Transcription
This talk discusses the use of AI in API testing and provides a step-by-step strategy for incorporating artificial intelligence with ChatGPT. It emphasizes the importance of analyzing documentation and creating test cases using tools like Swagger and Cypress. The talk also addresses the role of human involvement in testing, the balance between manual work and AI assistance, and the need to validate AI-generated tests. Overall, AI can significantly speed up the testing process, but human analysis and vigilance are still necessary for accurate results.
1. Introduction to AI and API Testing
Hi, everyone! Today, I'll share a step-by-step strategy for API testing and discuss how incorporating artificial intelligence with ChatGPT can elevate your testing process. Let's dive into the heart of the matter. The API is an important layer in the application, and ChatGPT can help us delegate monotonous work to AI. I joined Spleeky as the only QA and had to start everything from scratch. ChatGPT was a popular tool, so I decided to experiment and give it a try. Let's focus on versions 3.5 and 4, as they have their pros and cons.
Hi, everyone, and thank you for joining me today. My name is Olga, and I am honored to be your guide at the intersection of artificial intelligence and API testing. I believe you will learn some new tips and approaches today.
Today, I'll share with you a step-by-step strategy for API testing and discuss why incorporating artificial intelligence, particularly ChatGPT, can elevate your testing process to a new level. In today's examples, we will see how to use ChatGPT for REST APIs and GraphQL.
But first, a bit about myself. My current position is QA manager at Spleeky. For most of my career, I have worked with automation, built processes from scratch, and tried different test frameworks. I'm also a huge fan of self-improvement, and I love mountain climbing. For example, last month I climbed to a height of 3,000 meters in the Alps. But at work, I love to simplify, not to complicate. So now let's dive into the heart of the matter.
The API is one of the important layers in the application, and it's very easy to understand why it's important to cover. If we have a look at the testing pyramid, API tests sit at the integration level, which is supposed to be the second batch of our tests. So how can ChatGPT help us at this stage? The main point is that we can delegate monotonous and repetitive work to AI. Let me illustrate how things used to look in my company. I joined Spleeky this summer, and I was the only QA on the project. The team tested features on their own, but there were no QA or QC processes and no test documentation, so I had to start everything from scratch. We had unit tests, but we also needed end-to-end tests and to cover endpoints with API tests. At the time, we had 30 endpoints in REST and 20 in GraphQL. When you start from scratch, you are usually pressed for time, and I was looking for a popular and convenient tool to boost my work. ChatGPT was on everyone's lips, so I decided to experiment and give it a try to find out whether it was worth it or not. ChatGPT has two versions, 3.5 and 4 (they also released 4 Turbo, but let's focus today on the first two). Of course, both have their pros and cons. Version 3.5 is free, but it takes only text and makes mistakes.
2. Testing Strategy and Documentation Analysis
Version 3.5's knowledge was last updated in January 2022, while version 4 requires payment for premium features but supports various file formats and can generate images. The testing strategy involves checking specifications and then performing several testing steps, such as verifying status codes, payload, headers, and basic performance. Practice begins with analyzing the documentation using specific steps.
And the last knowledge update for version 3.5 was in January 2022, which means that it doesn't have access to the newest information. As for version 4, to unlock the premium features, you need to pay for it. That's the bad news.
But it accepts not only text but also other file formats, such as PDFs, tables, images, audio, video, and archives. What's more, it can generate images itself. And as for the knowledge update, the last one was in April 2023, which means this version is more up to date.
Now I'm going to go into the testing strategy. The testing strategy consists of two steps. First and foremost, we need to check specifications; we always need to start with this step. It's important to be sure that endpoints are named correctly, that resources and their types depict the correct model, and that there is no missing or duplicated functionality.
Then comes the testing itself. The testing can be broken down into several steps. Firstly, it's necessary to check the correctness of the status code. When you send, for example, a POST request and create a new item, you should get a 201 status code. If we send a request which is forbidden, we expect a 403 status code. Then check the payload: check that the body JSON names, types, and fields are correct. Don't forget about requests with an error response. The third thing is to check the response headers. Headers are critical because they affect security and performance. The last step is to check basic performance: if the operation was successful but took too long, the test is still considered failed.
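To make these four checks concrete, here is a minimal Cypress sketch of what they can look like together. It is an illustration, not a test from the talk: the /api/items endpoint, the payload fields, and the 2-second threshold are assumptions.

```js
// Minimal Cypress sketch of the four checks: status code, payload, headers,
// and basic performance. The /api/items endpoint and fields are hypothetical.
describe('POST /api/items', () => {
  it('creates an item and passes the four basic checks', () => {
    cy.request({
      method: 'POST',
      url: '/api/items',
      body: { name: 'Test item', price: 10 },
    }).then((response) => {
      // 1. Status code: creating a resource should return 201
      expect(response.status).to.eq(201);

      // 2. Payload: body JSON names, types, and fields are correct
      expect(response.body).to.have.property('id').and.to.be.a('number');
      expect(response.body.name).to.eq('Test item');

      // 3. Headers: they affect security and performance
      expect(response.headers['content-type']).to.include('application/json');

      // 4. Basic performance: a successful but slow call still fails the test
      expect(response.duration).to.be.lessThan(2000);
    });
  });

  it('returns 403 when the request is forbidden', () => {
    // Assumes the hypothetical endpoint rejects unauthenticated requests.
    cy.request({
      method: 'POST',
      url: '/api/items',
      failOnStatusCode: false, // let us assert on the error status ourselves
    }).then((response) => {
      expect(response.status).to.eq(403);
    });
  });
});
```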
Now, it's time for practice. Before I start, please keep in mind that it's not safe to share sensitive data; always remove it. Let's start with the first stage, documentation. Create a prompt and ask ChatGPT to analyze the documentation. For this purpose, we can use several steps.
3. Creating Test Cases and Automation
You can write the prompt manually or use an external tool like Prompt Perfect, which offers a more detailed and structured result. We use Swagger for documentation, and two options are available: taking a screenshot or copying and pasting the text. Test cases are then created and automated using prompts. In just a few seconds, ChatGPT generates tests with assertions and can provide a test description. An example of a positive scenario with request data is shown, written in Cypress.
For example, you can do it manually or use an external tool such as Prompt Perfect. So here is our original prompt; I hope you can see it well. As you can notice, there is nothing special about it, just a simple phrase. And this is what Prompt Perfect offers: now it's more detailed, better structured, and asks for a specific result.
Then we go to our documentation. In this particular case, I will use Swagger. We can act here in two ways: you can take a screenshot of it, or you can copy and paste the text. Honestly, I prefer the second option. Now we get the final result. Don't get into the details, just look at the slide: it covers language use, content evaluation, features or aspects, and other things.
So what we will do next is create test cases and automate them. It's no surprise that we need a prompt for these two steps. In a few seconds, we get 13 checks. Of course, it's more of a checklist than normal test cases, but based on them we can write full test scenarios. So we analyzed the documentation and generated test cases. What's next? It's time for automation. We send a new prompt, and again, in just a few seconds, it generates the test with assertions. If you wish, it can also generate a test description at the end. So here you can see an example of a positive scenario with request data, written in Cypress (a sketch of what such a generated test might look like follows). The same can be done for GraphQL, but first a few words about it.
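For illustration, here is a minimal sketch of the kind of Cypress positive scenario, with request data and assertions, that such a prompt might produce. This is not the exact test from the slides; the /api/users endpoint and its fields are made up.

```js
// Hypothetical example of a generated positive scenario in Cypress:
// send request data, then assert on the status and the response body.
describe('POST /api/users', () => {
  const requestData = {
    name: 'Olga',
    email: 'olga@example.com',
  };

  it('creates a user with valid data', () => {
    cy.request('POST', '/api/users', requestData).then((response) => {
      expect(response.status).to.eq(201);
      expect(response.body).to.have.property('id');
      expect(response.body.name).to.eq(requestData.name);
      expect(response.body.email).to.eq(requestData.email);
    });
  });
});
```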
4. GraphQL and the Role of Human Involvement
GraphQL is a query language introduced by Facebook to make it easier to manage the many endpoints of REST-based APIs. In Cypress, there is no difference between testing REST and GraphQL. Copilot is another popular tool and can be used directly in your IDE. While these tools are helpful, double-checking and improving tests is necessary: the code can have errors, and ChatGPT may misinterpret test requests. Human involvement is still crucial, especially for big data testing.
GraphQL is a query language. It was introduced by Facebook and designed to make it easier to manage the many endpoints of REST-based APIs. So now you know the best formula for a well-generated test: a correct prompt and valid data. And in Cypress, there is no difference between REST and GraphQL; you create tests the same way, whether for the UI or for the API. And here is our final result.
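As an illustration of that point, here is a minimal sketch of a GraphQL check in Cypress: the call is still just cy.request() against a single endpoint. The /graphql URL, the query, and the field names are assumptions, not the final result shown in the talk.

```js
// Hypothetical GraphQL check in Cypress.
describe('GraphQL: user query', () => {
  it('returns the requested user fields', () => {
    cy.request({
      method: 'POST',
      url: '/graphql',
      body: {
        query: `
          query GetUser($id: ID!) {
            user(id: $id) { id name email }
          }
        `,
        variables: { id: '1' },
      },
    }).then((response) => {
      // GraphQL usually answers 200 even for application-level errors,
      // so check the errors field as well as the shape of the data.
      expect(response.status).to.eq(200);
      expect(response.body).to.not.have.property('errors');
      expect(response.body.data.user).to.have.all.keys('id', 'name', 'email');
    });
  });
});
```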
And of course, ChatGPT is not the only existing resource; it's just a popular one. You can choose any of these tools according to your preference. For example, Copilot has recently gained popularity, and its huge advantage is that you can use it directly in your IDE. All these examples are impressive.
And perhaps you have a reasonable question: can I really ease my work that far? The answer is yes and no. Although all these tools are very helpful and can make our daily working routine easier, you will always need to double-check. First of all, you always need to double-check yourself: you can't use sensitive data at all. Also, the test cases are not complete; from my perspective, they don't look professional, and they don't include all the checks. We still need to improve the tests and add new ones. And the code sometimes has errors. As an example, if we ask it to generate a test checking the successful response status, ChatGPT interprets this literally, so the result is not correct. Another example is a function for randomizing a name. At first glance, it looks okay, but if we insert this piece of code into the IDE, we find out that a variable is not initialized (a reconstruction of this kind of slip is sketched below). All these activities still need a human. I believe it can be a good assistant, but not a good tester. For big data, it takes more time than we expect, and as you see, it also makes mistakes.
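To illustrate the second kind of slip, here is a hypothetical reconstruction of a name-randomizing helper that looks fine at first glance but references a variable that was never initialized, together with the obvious fix. The helper is an assumption, not the actual code from the talk.

```js
// Hypothetical reconstruction of the kind of bug described above.

// Buggy version an AI might produce: `suffix` is used but never declared,
// so this throws a ReferenceError as soon as it runs.
// function randomName() {
//   return 'user_' + suffix;
// }

// Corrected version: initialize the value before using it.
function randomName() {
  const suffix = Math.random().toString(36).slice(2, 8);
  return 'user_' + suffix;
}

console.log(randomName()); // e.g. "user_k3f9qz"
```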
Using AI and Addressing Developer Skepticism
For example, if you want to handle a complex task, you need to split it up. Thank you for your attention. One question related to the talk is about GPT 3.5, 4, and 4 Turbo. Have you tried Turbo yet? Honestly, not yet, but I pay for version 4. If you're just starting, try the free version and explore different tools. Another question is about being vigilant with the output of AI. My advice is that these people need testers, and you need knowledge to understand what you are testing. This will continue to be a challenge for less experienced developers. More questions came in about the setup time for using ChatGPT with Prompt Perfect.
And for example, if you want to handle a complex task, you need to split it up, and so on and so forth. I hope it helps. So thank you for your attention, and feel free to ask me questions if you have any.
So I think that was a really good primer for how to use AI to support the work that we do. There's one question that's come in so far related to your talk. Well, there's a couple. You mentioned just at the very start GPT 3.5, 4, and 4 Turbo. Have you managed to try out Turbo yet? Have you found it performs better in these tasks, perhaps? Honestly, no, not yet. But I use version 4; I pay for it, so I invest in my work. And for me, of course, it's more useful because, as I said, it accepts different file formats, which can help you very much. But still, if you are just beginning your work and just trying to get acquainted with this kind of tool, it makes sense to try the free version first. And maybe you can try different tools, not just ChatGPT.
Interesting. We have some questions in; please do continue to keep them coming. One question I have: one of my personal skepticisms of AI for developers, in any area of our work, is that one thing you were able to do was validate that something it returned was incorrect. I think your ability to do that comes with the experience of being a developer. At the same time, there's a whole slew of less experienced junior developers, people breaking into the field, who have unrealistic expectations of what these tools will do for them. They'll take something verbatim and go, yeah, it looks right, let's try that. Do you have any advice, techniques, or tools for these people to be more vigilant with the output of AI? I believe these people need testers. Because, as I mentioned, you can use AI, but you still need knowledge to understand and to be sure of what you are really testing right now. Yeah. I think this will continue to be a challenge for project maintainers as well, on the other end of people thinking they've written valid code when they've taken hallucinated code. Great, we've got some more questions in. During your talk, you mentioned that using tools like ChatGPT with the aid of things like Prompt Perfect reduces your overall time, but it feels like there's a lot more setup time, perhaps, in order to make it work.
Balance between Manual Work and AI Assistance
Do you think that the balance between manual work and AI assistance in test generation is correct? While extra work is needed to set up AI, it can significantly speed up the process, especially when generating a large number of tests. However, relying solely on AI may not be reliable, as it heavily depends on the quality of the initial specification documentation. Incomplete or incorrect documentation can lead to faulty tests. Therefore, human involvement and analysis of the documentation are still necessary.
Do you think that that balance is correct? Here it says, like, is it really that helpful? But where, at what point does that balance make sense? That's a really good question.
Yeah, of course, you need extra work to set up your main work. But just imagine that you need to generate 100 tests in just one day; you can try it with AI, with ChatGPT for example. Because if you do it manually, you still need more time than it takes to set it up in AI and use it for this purpose. Yeah.
I wonder if there's also something interesting there about businesses' expectations of developer productivity becoming far too great given the state of the tools today. And it sounds like that might be the case. If you were confident in the output, then yeah, loads of productivity gain. But based on your talk, you're not, and I'm certainly not, and I don't think many of us should be. Interesting, cool.
Have you ever tried this process to do unit testing for React components, or components in general? And if you did, how did that go? No, I haven't had that experience yet, but I believe it will be the next challenge in my work.
Okay. So, there are a couple of similar questions here, the first one and the third one. I'm going to highlight the third one because it's a little more fleshed out. Nope, it moved under my thumb. It's this one here. The success of AI in generating tests, interpreting code, or writing any code feels like it relies heavily on the initial specification documentation being good, near perfect, or completely perfect. Can incomplete or imperfect spec docs lead to the creation of faulty tests? I can say that incomplete or wrong documentation can be a problem in our product development, and that's also a problem for AI. Before running it through these tools, you need to do a human review first. You need to analyze the documentation yourself, because you can't delegate this work entirely to AI. Otherwise it will analyze the docs, but you won't have the context later. So I think that's why we still need to be involved in this process.
Use Cases and Handling Sensitive Data
I've wondered what happens if the base documentation hasn't been properly maintained and we get wrong results. The common use cases for Copilot and ChatGPT include test cases and checklists. We need to be cautious about not passing sensitive data like usernames and passwords to these systems. Similar vigilance is required as when passing data to CI/CD workflows.
Yeah, vigilance. I've wondered in the past, you know, there are these ChatGPT-style interfaces starting to crop up in developer documentation. I always worry what happens if that base documentation hasn't been properly maintained. As a user, I don't necessarily know straight away that the docs are not very well maintained, and then we just get results which are wrong. And that's a huge challenge.
So beyond writing tests, I suppose, what are the most common use cases you see for things like Copilot, ChatGPT, or other GPT systems? As for Copilot, for example, recently I was wondering if it knows about the main gap or issue in Cypress. I asked about this, and Copilot told me that there is a problem with mobile testing: we can't test mobile applications. That's true, of course; it's clear and obvious. As for the common use cases, you can try it for test cases, as I mentioned, and for checklists and simple work that can be easily delegated. I think it's that thing of easy delegation, and, I don't quite know the right way to phrase this, where you don't necessarily need a huge amount of trust on a first pass, and that isn't absolutely detrimental.
This question has a huge number of thumbs up, regarding sensitive data. How do we prevent passing sensitive data without our knowledge? Because I imagine this happens sometimes. Or, in simple terms, how can we be more cautious about not giving our sensitive data to these systems? When I mention sensitive data, I mean usernames and passwords, and of course it's not about company secrets; it's about the really sensitive data of the users of our product. This is the same as when you write your tests and keep important, sensitive credentials out of them, for example through a .gitignore file or other things you do when you set up your tests. And it's the same when you set up these AI tools to help you: you need to be sure there is no sensitive data like I mentioned, passwords, logins, and so on. I imagine that's very similar to passing data to CI/CD workflows; you want to make sure to omit that data. It's probably a very similar amount of vigilance that you would need to undertake.
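One common way to keep such credentials out of both the test code and anything pasted into an AI tool is to load them from an ignored file or environment variables. The sketch below follows standard Cypress conventions, but the file names and the endpoint are assumptions, not the speaker's setup.

```js
// Minimal sketch: credentials live outside the spec file.
//
// .gitignore
//   cypress.env.json
//
// cypress.env.json (never committed, never pasted into a prompt)
//   { "username": "real-user", "password": "real-secret" }

describe('login', () => {
  it('authenticates with credentials from the environment', () => {
    cy.request('POST', '/api/login', {
      username: Cypress.env('username'),
      password: Cypress.env('password'),
    })
      .its('status')
      .should('eq', 200);
  });
});
```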
How do... Let me read this question. How do you somehow separate... So yes. So when it comes to actually spitting out these tests, great.
Separating AI-Generated Tests
When tests are generated by AI, they can appear amazing, with detailed names and noticeable structures. However, even when implementing AI-generated tests, manual work is still required to establish guidelines, create titles, and handle additional components. There is a degree of vigilance needed when using AI-generated tools, as tests may contain errors. The time trade-off between AI-generated tests and manual writing depends on the complexity of the task. For simpler components, manual testing may be sufficient, but for more complex scenarios, AI can provide valuable assistance.
I'm setting them up. I'm setting up my test files. Do you or should you, or how could you separate or should you bother at all separating tests that are written by an AI, a machine versus those that are actually handcrafted and written by developers manually?
It is very noticeable when a test is generated by AI, because we still have the human factor. Whether you are a good tester or not, when you write a test yourself, you can miss something, or the title of the test doesn't seem technical; it's just the name of the test case sitting in your test. But when it's generated by AI, it looks amazing. You know, it's wow. The name of the test is really detailed, and the structure of the test is also noticeable.
And as for separating one from the other, even if you are going to implement these AI-generated tests into your test framework, and I believe you will, you will still have some manual work just to follow your development guidelines. You need to create your own titles, for example, or create variables, or support components the way you do for other components, so this kind of additional work still remains.
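One hypothetical way to make that separation visible, in the spirit of the titles-and-guidelines point above, is to tag generated specs in the test title or keep them in a dedicated folder. The tag below is an illustration, not an established convention or anything described in the talk.

```js
// Hypothetical convention: mark AI-generated specs in the describe title
// (or keep them under e.g. cypress/e2e/generated/) so reviewers know which
// tests still need extra scrutiny before they are fully trusted.
describe('[ai-generated] GET /api/items', () => {
  it('returns 200 and a list of items', () => {
    cy.request('/api/items').then((response) => {
      expect(response.status).to.eq(200);
      expect(response.body).to.be.an('array');
    });
  });
});
```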
I have a question going back to something you said earlier and part of your talk as well. So it's generated this test, the test is incorrect. There's some error with it. And so there is still a degree of vigilance that you have to have from AI generated tools before you just go ahead and take those tests and implement them. And I wonder for a lot of test writing is a lot of the... Even the examples you gave, they weren't necessarily very advanced examples, but writing lots of them is quite menial. I wonder, do you have any thoughts around the time trade-off between having AI generate something that could or could not be wrong and needing to review it and validate it versus just writing it manually in the first place?
It depends on the complexity. If it's just checking components, maybe API isn't the best example, so I'll switch to the UI side. If you just need to check components, not the whole end-to-end test where you connect to the database and perform some actions before running your tests, when you just need to test, say, a login form, then writing it yourself is enough. But if you have some very difficult, deep things, you can use it as a help; of course, it will ease your work. But still, as an experienced tester, it doesn't take a lot of time to write it yourself; it will be easy for you to do. I hope that's clear. Yeah. It's this trade-off in my mind that isn't quite sitting right. I'm not sure if I'm just going around in circles and whether others are feeling the same or not.
Validating AI-Generated Tests
By the time the potentially hallucinated output of AI has been validated, it can feel more comfortable to just write the tests yourself. Around 50 tests per day can be generated using AI, but manual validation is still necessary. The time saved by AI-generated tests differs significantly between smaller and larger numbers of tests. On average, this approach can save at least two days of work compared to starting from scratch with manual testing.
But by the time I validated the potentially hallucinated output of AI, for many things, I would feel more okay, more confident just writing it. Also the example you gave, and in fact, another question, which I'll ask as a lead into mine. Oh, no, I accidentally hit tick. The question was like, how many tests have you generated this way? Are we talking dozens, hundreds, thousands? If it's on that larger number, what is your strategy for validating an output before accepting a test?
Yeah, when I was pressed for time, and I don't have that situation anymore, but when I did, I could generate around 50 tests in one day. And of course, if you do it manually, you can copy and paste and change some data there, but you also need to change assertions and double-check it. So 50 tests was my maximum. Interesting. And at that point, you can still do some manual validation. Of course. Because those are small enough numbers that you can take a look. Some of it is just manual; some of it will be throwing it into a code editor and making sure it is asserting correctly. But, yeah, that's interesting. But then the time saved for 50 and the time saved for 1,000 are just so, so different. That's super interesting.
We have time for maybe one more super quick question. And it is actually that question. I hit tick again instead of the thing that brings it up, but I do remember the question. I'm so sorry, everyone. The question was, how much time do you reckon this has actually saved you in practice on any given project, any given bout of writing tests? In the very beginning, you can save at least two days; that's the minimum you can save. Yeah. Excellent. Cool. I'd be interested... Versus what? Like, can you give me an idea? Where were you at the start? Were there just no tests written, and this was just building the basics? A bit more context would be really helpful. Yeah. So I started from scratch; there was nothing. And you can save time: you can do this work in two days versus seven days, for example. That was my own estimate. So yeah. Interesting. Cool. Thank you ever so much. That was fascinating. I really appreciated having a little bit longer to ask questions. A huge round of applause for Olga, please.