Three Ways to Automate Your Browser, and Why We Are Adding a Fourth: WebDriver BiDi

A journey through the overwhelming number of ways to automate browsers. Join Michael to see what happens behind the scenes of "await page.goto('https://example.com');" et al. See what pros and cons each of the three ways of browser automation has.


Understand why we are adding a fourth - WebDriver BiDi.

This talk has been presented at JSNation 2023, check out the latest edition of this JavaScript Conference.

FAQ

Michael Hablich is a product manager on the Chrome team, focusing on reducing friction in testing and debugging web applications. He has around 20 years of experience in tech, particularly in building test automation solutions for enterprises.

Quality assurance and testing activities take up a significant portion of software development costs. QA is essential to ensure applications work correctly, as failing to properly test can lead to user issues and potential risks. Test automation helps reduce the ongoing costs of testing.

The primary use cases of browser automation technologies are test automation, web scraping, and rendering parts of pages like ads. The focus of Michael Hablich's discussion is on test automation.

In the 90s, browser automation was done via native APIs. For example, Visual Basic 6 was used to automate Internet Explorer. Manual testing or injecting scripts were common methods for automating Java applets and Flash containers.

With the rise of rich and interactive web experiences and the need for cross-browser and cross-device compatibility, the need for effective test automation increased. Selenium and the WebDriver project were created to address these test automation challenges.

Michael Hablich discusses three major browser automation approaches: the WebDriver Protocol, the Chrome DevTools Protocol (CDP), and Web APIs plus browser extensions. He also introduces WebDriver Bidirectional (BiDi), which combines the benefits of WebDriver and CDP.

The Chrome DevTools Protocol (CDP) is designed to enable Chrome DevTools to debug web pages. It communicates directly with Chromium-based browsers via WebSockets and is used by tools like Puppeteer for browser automation.

WebDriver Bidirectional (BiDi) is a new standard for browser automation that combines the benefits of WebDriver and CDP. It features bi-directional messaging, low-level controls, cross-browser support, and standardization, and it is built specifically for testing.

WebDriver Bidirectional (BiDi) offers bi-directional messaging, low-level controls, cross-browser support, and standardization. It combines the fast and powerful features of CDP with the cross-browser support and standardization of WebDriver.
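The bi-directional part can be pictured with a small message router. The sketch below is not a real client: it opens no WebSocket, and the class name `BidiRouter` and the fake transport are invented for illustration. It assumes the WebDriver BiDi message shapes, where commands carry an `id`, replies echo that `id` with `type` set to `"success"` or `"error"`, and browser-initiated events arrive with `type` set to `"event"` and no `id`.

```javascript
class BidiRouter {
  constructor(send) {
    this.send = send;           // function that would write to the WebSocket
    this.nextId = 1;
    this.pending = new Map();   // command id -> { resolve, reject }
    this.listeners = new Map(); // event name -> array of callbacks
  }

  // Client -> browser: send a command and wait for the matching reply.
  command(method, params) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send(JSON.stringify({ id, method, params }));
    });
  }

  // Register a callback for events the browser pushes on its own.
  on(eventName, callback) {
    const callbacks = this.listeners.get(eventName) ?? [];
    callbacks.push(callback);
    this.listeners.set(eventName, callbacks);
  }

  // Browser -> client: either a reply to a command or an unsolicited event.
  receive(raw) {
    const message = JSON.parse(raw);
    if (message.type === "event") {
      for (const cb of this.listeners.get(message.method) ?? []) cb(message.params);
      return;
    }
    const waiter = this.pending.get(message.id);
    if (!waiter) return;
    this.pending.delete(message.id);
    if (message.type === "success") waiter.resolve(message.result);
    else waiter.reject(new Error(message.message));
  }
}

// Usage with a fake transport that only records outgoing messages.
const sent = [];
const router = new BidiRouter((raw) => sent.push(raw));
const logs = [];
router.on("log.entryAdded", (params) => logs.push(params.text));

// Subscribe to console log events, then simulate what the browser sends back.
const subscribed = router.command("session.subscribe", { events: ["log.entryAdded"] });
router.receive(JSON.stringify({ type: "success", id: 1, result: {} }));
router.receive(JSON.stringify({ type: "event", method: "log.entryAdded", params: { text: "hello" } }));
```

The point of the design is that `receive` handles two kinds of traffic on one connection: answers to commands the client sent, and events the browser sends without being asked, such as new console entries.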

Yes, parts of WebDriver Bidirectional (BiDi) are already being shipped incrementally, and automation libraries like Selenium, WebdriverIO, and Puppeteer have landed initial BiDi support. However, the WebDriver BiDi protocol is still a work in progress.

Michael Hablich
19 min
05 Jun, 2023

Video Summary and Transcription
This Talk discusses browser automation techniques, including the introduction of a new one, WebDriver BiDi. It covers the history of browser automation, different techniques for automating browsers, and the use of web APIs and browser extensions. The Talk also explains how automation tools communicate with browser drivers and the challenges of waiting for elements to appear on the screen. It highlights the differences between the WebDriver protocol and the Chrome DevTools Protocol, and introduces the WebDriver BiDi project, which aims to combine the best parts of both protocols. Lastly, it mentions the WebDriver BiDi support for console monitoring and presents WebDriver BiDi as a stable automation choice.

1. Introduction to Browser Automation

Short description:

I'm Michael Hablich, a product manager on the Chrome team, working on reducing friction of testing and debugging web applications. Today, I'll talk about browser automation techniques and why we're adding a fourth one, WebDriver BiDi. Quality assurance and testing activities take up a big chunk of the software development cost, and test automation is a very good way to reduce the continuous costs of testing. Browser automation automates user interactions and pretends to be a user, with typical use cases including test automation, web scraping, and rendering part of pages like ads. Let's take a short tour of the history of browser automation, from the native APIs in the 90s to the complexities of Java applets and Flash in the 2000s.

Hi folks. I'm Michael Hablich, a product manager on the Chrome team. There, I'm working on reducing friction of testing and debugging web applications. I have the honor today to talk about browser automation techniques and why we're adding a fourth one, WebDriver BiDi.

I have spent around 20 years working in tech already. A big chunk of this was building test automation solutions for enterprises. One can say I had a lot of fun automating browsers, .NET applications, and more niche technologies like PowerBuilder.

So, why am I here? Well, the Chrome team periodically reviews the satisfaction of web developers and, surprise, testing, in particular across browsers, is a top pain point for web developers. Quality assurance and testing activities take up a very big chunk of the software development cost, and you can't simply cut them away. QA is necessary because either you are testing your applications or your users are, and the latter has some risk attached to it. And test automation is a very good way to reduce the continuous costs of testing.

So, let's first define a bit what browser automation is about, and briefly skim how it works. Simplified, browser automation automates user interactions and pretends to the browser to be a user. Often such interactions are stored as source code, as seen on the left side. These interactions are then replayed, as you can see on the right side. Typical use cases of browser automation technologies are test automation, web scraping, or rendering part of pages like ads. Today, I'm focused on the first, test automation.

The previous slides showed the current state of browser automation: tests defined in JSON and JavaScript, fast and stable automation, and so on. Before we ended up in such a cozy place, a lot of history happened. Let's take a short tour.

The web was born in the 90s. People started using browsers on a limited set of big screens. Testing in this decade was mostly done against static content. Browsers like Netscape Navigator or Internet Explorer were shipped. Browser automation at that time was done via native APIs. For example, I can still remember using Visual Basic 6 to automate Internet Explorer.

In 1996, Java applets and Flash became a thing. They made automating webpages even more complicated because the browser automation APIs provided by the browser vendors did not work for Java applets and Flash containers. Manual testing or injecting scripts were the way to go for these technologies. In the 2000s, more browsers joined the scene, including Chrome.

2. Browser Automation Techniques

Short description:

Developers started building rich and interactive web experiences. Selenium and WebDriver were created to address test automation challenges, with WebDriver becoming a W3C standard. Multiple JavaScript testing libraries were introduced, using different techniques to automate browsers. We'll cover the WebDriver Protocol, Chrome DevTools Protocol, and Web APIs plus browser extensions. There are two major categories: high level, executing injected JavaScript, and low level, executing remote commands. Let's focus on the approach of using web APIs and browser extensions to build an automation layer.

Developers started to build very rich and interactive experiences on the web. YouTube and Google Maps are some very good, early examples of this. With smartphones coming into the picture, the need for test automation increased because suddenly there was a requirement for cross-browser and cross-device compatibility. Selenium and the WebDriver project were born to solve the test automation challenges.

At that time it was common to write Selenium tests in Java. In 2009, Node.js brought JavaScript development to the backend. Also, it enabled running tests written in JavaScript. More JavaScript frameworks came into the picture. At the same time, Selenium and WebDriver merged into a single Selenium-WebDriver project. With the growing popularity, the project became a W3C standard in 2018, and we call it WebDriver Classic.

With more developers building richer applications in JavaScript, these developers wanted to perform test automation in JavaScript as well. Multiple web-based JavaScript testing libraries were introduced to address these needs, and not all of them use WebDriver as the underlying automation technology. They are using different techniques to automate the browser, which we are going to talk about today. We will cover the WebDriver Protocol, supported by solutions like Selenium, Nightwatch.js, or WebdriverIO; the Chrome DevTools Protocol, CDP in short, powering Puppeteer, Chrome's own automation library, and Playwright; and Web APIs plus browser extensions, leveraged by TestCafe or Cypress, for example.

Let's start and take a step back and talk about how tools automate browsers. I mentioned three major ways to automate a browser. Well, they fall into two major categories, too. Let's intensify the complexity a bit, because we have high level, which executes JavaScript injected into the browser, and low level, which executes remote commands. For example, Cypress utilizes browser extensions and Node.js to execute a test directly in the browser. To gain greater control of the browser, like opening multiple tabs and testing third-party iframes, we need to go deeper and execute remote commands with other techniques. Let's simply call them protocols. The two common protocols are WebDriver and the Chrome DevTools Protocol, CDP in short. We will explore all of this together shortly. No worries.

I'm going to start with the approach of using web APIs and browser extensions to build your own automation layer. Essentially, these solutions leverage a bunch of web APIs, JS injection, browser extensions, proxies, etc., to build their very own automation layer. Going into detail here would burst the size of the talk. So I'm going to stop here and segue over to WebDriver, the automation technique built upon a standard. It's one of the low-level protocols. So let's take a brief look at how they work in principle.
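As a rough illustration of what "remote commands" means for the two low-level protocols named above, the sketch below builds a navigation command in each style without sending anything. The helper names (`webDriverNavigate`, `cdpNavigate`), the session id, and the message id are made up for illustration. The endpoint `POST /session/{session id}/url` and the CDP method `Page.navigate` are taken from the respective protocol definitions.

```javascript
// WebDriver Classic: each command is an HTTP request to the browser driver.
// Hypothetical helper; it only describes the request, it does not send it.
function webDriverNavigate(sessionId, url) {
  return {
    method: "POST",
    path: `/session/${sessionId}/url`,
    body: JSON.stringify({ url }),
  };
}

// CDP: each command is a JSON message sent over a WebSocket connection.
// The numeric id lets the client match the browser's reply to this command.
function cdpNavigate(id, url) {
  return JSON.stringify({ id, method: "Page.navigate", params: { url } });
}

const httpCommand = webDriverNavigate("hypothetical-session-id", "https://example.com");
const socketMessage = cdpNavigate(1, "https://example.com");

console.log(httpCommand.method, httpCommand.path, httpCommand.body);
console.log(socketMessage);
```

In both cases the test code runs outside the browser and only ships small commands to it. That is the difference from the high-level approach, where the automation logic itself is injected into the page as JavaScript.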
