
Finding Stealthy Bots in Javascript Hide and Seek

JavaScript has a lot of use cases, and one of them is automated browser detection. This is a technical talk that gives an overview of the state-of-the-art automated browser for ad fraud, how it cheats many bot detection solutions, and the unique methods that have been used to identify it anyway.

This talk was presented at JSNation 2022.

FAQ

There are both beneficial and malicious bots on the web. Beneficial bots include those used for testing, while malicious bots can perform activities like ad fraud, social media manipulation, and denial-of-service attacks.

Basic methods for detecting bots include analyzing the user agent string, checking for the absence of JavaScript execution, and observing if a bot can handle complex behaviors like running JavaScript or generating tokens that mimic human interactions.

Browser quirks are inconsistencies or unique behaviors in how different browsers operate. These can be used to detect bots because while bots can mimic many human behaviors, replicating these specific quirks accurately can be more challenging.

Automated browsers have significantly advanced bot capabilities by allowing bots to emulate human browsing more convincingly. These tools can run a DOM, execute JavaScript, and fake user interactions, making bots harder to detect.

Puppeteer is a browser automation tool developed by Google. It is not built to hide, but it is easy to extend, and extensions and modifications built on top of it can mimic human browser interactions without being easily detected, which makes it a key tool in sophisticated botting activities.

Advanced bots have evolved to bypass CAPTCHAs by employing techniques that mimic human responses or by using machine learning algorithms to solve CAPTCHA challenges, which reduces the value of CAPTCHAs as a standalone deterrent.

Emerging techniques for detecting sophisticated bots include analyzing inconsistencies in browser fingerprinting, using behavioral analysis to detect unnatural patterns of interaction, and monitoring session level data for anomalies.

As bot technology evolves, bots are becoming better at mimicking human behaviors and browser characteristics, making it increasingly difficult to distinguish them from real human users without advanced detection techniques.

case study security

Adam Abramov

11 min

20 Jun, 2022


Video Summary and Transcription

The Talk discusses the challenges of detecting and combating bots on the web. It explores various techniques such as user agent detection, tokens, JavaScript behavior, and cache analysis. The evolution of bots and the advancements in automated browsers have made them more flexible and harder to detect. The Talk also highlights the use of canvas fingerprinting and the need for smart people to combat the evolving bot problem.


1. Introduction to Web Bots

Short description:

I'm here to ask what's going on with bots on the web. We'll talk about simple detections, how the bots got better. We'll talk about what's possibly the best bot out there cheating on most detection solutions. And we'll lastly get to my favorite part, which is how you can find it anyways. My job is playing hide and seek with these bots, so advertisers can avoid them. It's going to be social media, concert ticket sellers, a lot of people facing this issue because the internet was not designed with bot detection in mind. When you do that, yeah, real story, when I was 16, high school projects may or may not have dropped service to some site. So to make the internet better, we want to detect them. Let's talk detections. Starting with the basics. User agent. That's the HTTP request header identifying the browser. You guys know this. You see it's a Python bot. You block that. Probably not a real user behind that. They figured this out, the bot makers know, they hide the user agent. Let's say you don't run JavaScript on your bot.

Hey, everyone. I'm Adam. I'm super happy to be here, and I'm here to ask what's going on with bots on the web. I'm not talking about the nice ones, the testing. I'm talking about the bad ones. We'll talk about simple detections, how the bots got better. We'll talk about what's possibly the best bot out there cheating on most detection solutions. And we'll lastly get to my favorite part, which is how you can find it anyways.

But before all that, one reason I'm here is that I always liked hacking stuff, and now I'm a reverse engineer at DoubleVerify. They measure ads. But my job is playing hide and seek with these bots, so advertisers can avoid them. But it's not just advertisers and the games. It's going to be social media, concert ticket sellers, a lot of people facing this issue because the internet was not designed with bot detection in mind. Seriously. The only real standard is robots.txt, telling bots what they're allowed and disallowed to do. Basically the honor system, asking good people to play nice. When you do that, yeah, real story, when I was 16, high school projects may or may not have dropped service to some site. But some people actually do this on purpose and at scale, denying service to real users, using what they have to steal sneakers, sneaking around social media with fake users. I practice that part. So to make the internet better, we want to detect them.

Let's talk detections. Starting with the basics. Not because bot makers can't play around these, but because they're usually the first thing you rely on before you come up with something more complicated, and because simple detections are pretty straightforward. User agent. That's the HTTP request header identifying the browser. You guys know this. You see it's a Python bot. You block that. Probably not a real user behind that. They figured this out, the bot makers know, they hide the user agent. Let's say you don't run JavaScript on your bot.
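As a rough illustration of the user agent check described above, here is a minimal sketch. The marker list and the function name `looksLikeScriptedClient` are assumptions made for this example, not something shown in the talk, and the list is nowhere near exhaustive.

```javascript
// Substrings that commonly appear in the default User-Agent of HTTP
// libraries and command-line tools. Illustrative only, not exhaustive.
const SCRIPTED_CLIENT_MARKERS = [
  "python-requests",
  "python-urllib",
  "curl/",
  "wget/",
  "go-http-client",
];

// Returns true when the User-Agent header is missing, empty, or names a
// scripted client. A bot that sets a browser-like string passes this check.
function looksLikeScriptedClient(userAgent) {
  if (typeof userAgent !== "string" || userAgent.trim() === "") {
    return true; // mainstream browsers send a non-empty User-Agent
  }
  const normalized = userAgent.toLowerCase();
  return SCRIPTED_CLIENT_MARKERS.some((marker) => normalized.includes(marker));
}
```

This only catches bots that never bothered to change the header, which is exactly why it is the first check and not the last one.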

2. Detecting Bots with Tokens and JavaScript

Short description:

You can use tokens and JavaScript behavior to detect bots on your site. Browser quirks can be used to verify the true nature of a browser. Digging deep into JavaScript can reveal attempts to hide something.

Maybe you make a token as the detection on the site, and later actually make sure it's created. So if you have a bot that's navigating to your site, not generating this token, not running JavaScript, you know something's going wrong. But let's say they do run JavaScript. All of a sudden, you can check how the browser behaves. You people probably hate browser quirks. Bot makers hate them too, because they can be used to verify what's under the hood and not what the browser is reporting at face value. And sometimes you can dig deep in JavaScript to see if somebody's trying to hide something.
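The token idea can be sketched on the server side as a simple set comparison. This sketch assumes the server logs a session id for every page request and that the page's JavaScript reports a token back for the same session; the function name and the input shapes are hypothetical.

```javascript
// pageLoads: session ids that requested a page (taken from server logs).
// tokenPosts: session ids whose page JavaScript reported a token back.
// Returns the session ids that loaded a page but never ran the script.
function sessionsWithoutToken(pageLoads, tokenPosts) {
  const reported = new Set(tokenPosts);
  const suspicious = new Set();
  for (const sessionId of pageLoads) {
    if (!reported.has(sessionId)) {
      suspicious.add(sessionId); // a Set avoids listing a session twice
    }
  }
  return [...suspicious];
}
```

For example, `sessionsWithoutToken(["a", "b", "c", "b"], ["a", "c"])` returns an array containing only `"b"`. Real users with JavaScript disabled or blocked would land in the same list, so this is a hint to combine with other signals.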

3. Detecting Bots with User Agent and Behavior Tests

Short description:

User agent, hiding with Object.defineProperty. Funny stuff, bad attribute, accidental artifact for detection. Cat and mouse theme, hiding the toString. Clever ways around, repeating vectors. JavaScript library CreepJS, limited effectiveness. Using data, tokens, duplicate tokens, nonsense navigations for bot catching. Caches as another avenue for detection. Behavior tests, user click frequency.

User agent, we talked about that. That property on window.navigator is going to be read-only. So bot makers, they're going to hide that with Object.defineProperty. You look at the property descriptor, you see somebody did funny stuff there, trying to hide a user agent. That's going to be suspicious. The image here is that you have a bad attribute identifying you, you fix it into something perfectly fine, and as the bot maker you accidentally leave behind an artifact that can be used to incriminate you. That's going to be used for detection.

This is going to be a common theme: the cat and mouse of bot detections. Another example is that the bot maker can override the toString on something they're trying to hide. So you look at the toString of the toString, and they hide that too. There's a fun game established there. There are clever ways around this; the key takeaway here is the cat and mouse theme, which is going to repeat across more vectors for more detections.

Let's say you're really good at this dumb stuff. You make a JavaScript library like CreepJS to fingerprint a browser under the hood. That only goes so far because the bot makers can see what you're doing. Every time you find them some way, they're going to evolve, they're going to patch a little bit, and now you've got to use something else. Let's say on a site you want to use data. Data is going to be tricky because you have to mind privacy issues or unruly users that you're testing, but let's say you have a site that users go to, just for the sake of argument. On each page you put a generated token so you can validate that a user went through. You can see where that user went. All of a sudden you have a whole new arena for catching these bots: duplicate tokens, nonsense navigations, anything that's giving you even the tiniest hint that somebody may have just pre-programmed your logic in some way. It's not an actual user navigating. Some of you might be thinking, hey, will users do this too? And that's absolutely right. With caches and stuff. That's why this isn't a smoking gun on its own, but caches, for example, also introduce a whole other avenue for bot detection. Some bots clean the cache too much. And we'll get into advanced detection in just a moment, but still in the domain of old-school simple detections: behavior tests. How many times should the user click on the site? Let's say in an hour. OK. So, there's going to be a little spectrum.
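A minimal sketch of the property descriptor check, under the assumption that in mainstream browsers `userAgent` is an accessor defined on `Navigator.prototype`, so the `navigator` instance has no own property of that name. The function name and the `FakeNavigator` stand-in are hypothetical; the stand-in only exists so the sketch runs outside a browser.

```javascript
// Returns a list of reasons the userAgent property looks tampered with.
// `nav` is a navigator-like object; in a browser you would pass window.navigator.
function userAgentTamperSigns(nav) {
  const signs = [];
  // A naive Object.defineProperty(navigator, "userAgent", ...) creates an
  // own property on the instance, which an untouched navigator lacks.
  if (Object.getOwnPropertyDescriptor(nav, "userAgent") !== undefined) {
    signs.push("own property defined on navigator instance");
  }
  const proto = Object.getPrototypeOf(nav);
  const inherited = proto && Object.getOwnPropertyDescriptor(proto, "userAgent");
  // The inherited property is expected to be a getter, not a plain value.
  if (inherited && typeof inherited.get !== "function") {
    signs.push("prototype property is not a getter");
  }
  return signs;
}

// Stand-in for a browser navigator (hypothetical, for illustration only).
class FakeNavigator {
  get userAgent() {
    return "Mozilla/5.0 (real)";
  }
}

const untouched = new FakeNavigator();
const patched = new FakeNavigator();
// What a careless bot maker might do to spoof the value.
Object.defineProperty(patched, "userAgent", {
  get: () => "Mozilla/5.0 (spoofed)",
});
```

Here `userAgentTamperSigns(untouched)` is empty while `userAgentTamperSigns(patched)` reports the own property. A more careful bot maker patches the prototype getter instead, which moves the game to the next check.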
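The toString game can be sketched as follows. Built-in functions stringify to source text containing `[native code]`, so a replaced getter can be spotted unless its `toString` is also overridden. The helper name `looksNative` and the `disguised` function are made up for this example.

```javascript
// Keep a reference to the original so a per-function override cannot lie.
const nativeToString = Function.prototype.toString;

// Returns true when `fn` stringifies like a built-in and carries no
// toString override of its own.
function looksNative(fn) {
  // An own toString on a function is itself a sign of tampering.
  if (Object.prototype.hasOwnProperty.call(fn, "toString")) {
    return false;
  }
  return /\{\s*\[native code\]\s*\}\s*$/.test(nativeToString.call(fn));
}

// A replacement function dressed up to look like a built-in getter.
const disguised = function spoofedGetter() {
  return "fake";
};
disguised.toString = () => "function get userAgent() { [native code] }";
```

`String(disguised)` shows the fake native text, but `looksNative(disguised)` is false while `looksNative(Math.max)` is true. This sketch has known gaps: bound functions also stringify as native code, and a bot that patches `Function.prototype.toString` before this script runs defeats the saved reference, which is the cat and mouse the talk describes.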
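The duplicate token signal mentioned above could be sketched like this, assuming each generated token is issued for exactly one page view and that the server sees `(sessionId, token)` pairs in arrival order. The function name and the event shape are hypothetical.

```javascript
// events: array of { sessionId, token } in arrival order.
// Returns one entry for every time a token shows up again after its first use.
function findReusedTokens(events) {
  const firstSeenBy = new Map();
  const reused = [];
  for (const { sessionId, token } of events) {
    if (firstSeenBy.has(token)) {
      reused.push({
        token,
        firstSession: firstSeenBy.get(token),
        replayedBy: sessionId,
      });
    } else {
      firstSeenBy.set(token, sessionId);
    }
  }
  return reused;
}
```

As the talk points out, a cached page or the back button can replay a token for a real user too, so a hit here is a hint to weigh with other signals and not a verdict.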

4. Bots, CAPTCHAs, and Automated Browsers

Short description:

How much does each user do it? And you're going to look at the edges: on one side you have zero, which is going to be my father clicking absolutely zero times after a long day of work. But on the other edge, you have people clicking 172 times per second. CAPTCHAs were originally used to distinguish humans from bots, but now they mostly slow the bots down. The evolution of bots and the advancements in automated browsers have made them more flexible and harder to detect. Puppeteer, Google's automated browser, is the kingpin in botting, making it accessible and difficult to detect.
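For the click-frequency spectrum described at the start of this section, a minimal sketch of the extreme-edge check might look like this. The threshold of 15 clicks per second is an assumption picked for illustration; a real system would calibrate it against observed traffic.

```javascript
// Assumed ceiling for sustained human clicking; tune against real traffic.
const MAX_HUMAN_CLICKS_PER_SECOND = 15;

// clickTimestampsMs: click times in milliseconds, in any order.
function clicksPerSecond(clickTimestampsMs) {
  if (clickTimestampsMs.length < 2) {
    return 0; // zero or one click gives no rate to measure
  }
  const sorted = [...clickTimestampsMs].sort((a, b) => a - b);
  const spanSeconds = (sorted[sorted.length - 1] - sorted[0]) / 1000;
  if (spanSeconds === 0) {
    return Infinity; // several clicks within the same millisecond
  }
  // Number of intervals between clicks divided by the elapsed time.
  return (sorted.length - 1) / spanSeconds;
}

function isClickRateSuspicious(clickTimestampsMs) {
  return clicksPerSecond(clickTimestampsMs) > MAX_HUMAN_CLICKS_PER_SECOND;
}
```

The zero edge is deliberately not flagged: a user who never clicks is common and, on its own, says little.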

And also, CAPTCHAs. That's what we came up with originally to distinguish humans from bots. You might be asking, hey, why didn't you start with that? The reason is that the bots train, and they absolutely demolish humans at the simple ones. And this goes for complex CAPTCHAs too. So, CAPTCHAs aren't there to prevent bots. What they do currently, the reason you see them, is that they slow the bots down.

Moving forward, let's talk about how the bots got better. Getting closer to the advanced detection part: the bot makers haven't been sleeping all this time. They got better, they keep getting better, with every little patch they evolve. Eventually, the game became written in their favor. Ten years ago, they were struggling with Python scripts. In recent news, it's now publicly retweeted, so here's some guy complaining about this to Elon Musk. Ping me in the Q&A if you want to talk about the seekers and the hatters about this some more. I'm right about that one. Point being that the evolution thing is really good at gradual improvement and problem solving.

Bots might eventually become indistinguishable from humans entirely. But moving from philosophy rambles to practice, the biggest technical advance bots have made is going to be automated browsers. Automated browsers have changed the game: no more requests in Python, you're taking the whole browser, you run the DOM and JavaScript, you can even fake the user. Automated browsers got really good at their thing, so they let bot makers fake attributes like the user agent without leaving behind the artifact that you'd normally leave in JavaScript. That makes bots more flexible, harder to detect. Browser quirks and all that stop being useful when the browser automation solution is driving the actual browser with whatever quirks it has, and you can fake the user too with these, using vanilla JavaScript or browser hooks, scrolling through some articles with window.scroll.

Here are some automated browsers that are good for testing, but don't be like some of the bot operators I found using these for fraud schemes, because they're not meant to hide anything, they're easy to detect. No, you want something that, and we're getting to the kingpin here, is based on this guy, Puppeteer. That's Google's automated browser. It's not malicious or trying to hide, but it's really good at its thing and makes itself super easy to extend, makes everything super smooth. And with that, all the pieces kind of came into place: we see bots getting better, automated browsers available, and operators improving their game. Thus the king bot was born.

This guy, Puppeteer Extra Stealth, is the best at hiding, so it can run headless, meaning no rendering on the screen, and still doesn't get detected. It's amazing, criminally easy to use, making really good botting really accessible. The community behind it is an army looking for even the slightest discrepancies. For example, they patch hardwareConcurrency, that's the number of available processors you have, so they can scale the operations, run many of these on the same computer without even raising the suspicion that they are bots. A whole different playing field there. They left some traces when they patched this attribute on the prototype, people detected that on the screen, they found it, they patched it, and they did this fast.

5. Detecting Bots: Canvas Fingerprinting and Beyond

Short description:

Let's talk about canvas fingerprinting and how it can be faked. Headless Chromium lacks objects such as chrome.csi, but the bot fills them with fake values to avoid detection. Easy bots can be detected, but hard bots require techniques like hardware concurrency, behavior tests, and data analysis. By analyzing user agent data, you can identify even the best bots. The internet needs more smart people to combat the evolving bot problem.

Let's talk about something harder to fake: canvas fingerprinting. That's when you dynamically render a canvas to fingerprint your device alongside the browser. Should be harder to fake. Bam, these folks developed an extension that reports fake values using JavaScript hooks, so they solve pretty much anything.

I want to take the time to explain just one of these before I fly through the others. The Chromium here, when it's headless, does not have chrome.csi, chrome.app, all that performance stuff. So you think, okay, can I detect Puppeteer when it's headless with this? All of a sudden, not so fast, buddy: they fill every single object with fake values like this, the chrome object, loadTimes, runtime, app, so on and so forth, anything that can be used for detection. It's making it stupid easy to use, annoyingly elegant. Remember CAPTCHAs? Two lines here, a little bit of money, and they solve that. And all of this is going to be just with these two lines here. Everything we talked about, super easy to use. Bot tests failed to find it, they just hang the bot detection results on their repo. But I promised to get to what actually works, and for that part, I made it within 10 minutes, all right.

Obviously, I can't specify too much here, but let's start with the easy part. Easy bots are easy to detect. I'm going to name three ways to go after hard bots. This is going to be quick, but I'd say this is more than what's out there. Starting with stuff like hardwareConcurrency: there are still more JavaScript artifacts to be found if you know where to look. These are becoming increasingly rare, though, so I wouldn't count on them long term, but the upside here is that they're very clear cut.
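The naive headless check that this passage says gets defeated could be sketched as below. The member list is taken from the names mentioned in the talk; which of them a genuine Chrome exposes varies by version and page context, so treat the list and the function name as assumptions.

```javascript
// Members of window.chrome named in the talk; an assumption, not a spec.
const EXPECTED_CHROME_MEMBERS = ["csi", "loadTimes", "app", "runtime"];

// `win` is a window-like object; in a browser you would pass `window`.
// Returns the expected members that are absent from win.chrome.
function missingChromeMembers(win) {
  const chromeObject = win.chrome;
  if (typeof chromeObject !== "object" || chromeObject === null) {
    return [...EXPECTED_CHROME_MEMBERS]; // no chrome object at all
  }
  return EXPECTED_CHROME_MEMBERS.filter((name) => !(name in chromeObject));
}
```

A presence check like this only catches an unpatched headless browser. Once the bot fills in fake objects, as described above, every member is present and the list comes back empty, so the check would have to move on to whether the fakes behave like the real thing.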

What will hold up over time is behavior tests and session-level data analysis. Behavior tests that still work usually look at window context discrepancies when interacting with the DOM. And data analysis can take many shapes. For example, let's say you fake the user agent perfectly; the question is what value you put there. Look at this graph: that's the user agents navigating to an app, and each point is how many people navigated with that specific user agent. You'd say there's some variance here, but this is what it's supposed to look like. The blue line's almost flat, and that's probably because the bot maker got right how the user agent is varied, but got the weight part entirely wrong. They're probably just producing these at random. Normal sites don't look like this, and here's one way you can detect even the best bot.

So at the start I asked what's up with bots on the web. I can't tell you for sure, but what I do know is that they're getting better, and we need more smart people like you to be aware so that the internet becomes a better place.
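One way to put a number on "the line is almost flat" is normalized Shannon entropy over the per-user-agent visit counts: a value near 1 means every user agent appears about equally often, which organic traffic rarely does. This is a sketch of that idea, not the analysis used in the talk; the 0.95 threshold and the function names are assumptions.

```javascript
// counts: plain object mapping a user agent string to its visit count.
// Returns entropy divided by its maximum, so 1 means perfectly uniform.
function normalizedEntropy(counts) {
  const values = Object.values(counts).filter((count) => count > 0);
  const total = values.reduce((sum, count) => sum + count, 0);
  if (values.length < 2 || total === 0) {
    return 0; // a single user agent has no spread to measure
  }
  let entropy = 0;
  for (const count of values) {
    const share = count / total;
    entropy -= share * Math.log(share);
  }
  return entropy / Math.log(values.length);
}

// Assumed cut-off; a real baseline should come from comparable sites.
const FLATNESS_THRESHOLD = 0.95;

function looksUniformlyRandom(counts) {
  return normalizedEntropy(counts) > FLATNESS_THRESHOLD;
}
```

Traffic where a few user agents dominate scores well below the threshold, while user agents drawn at random with equal weights score close to 1. With only a handful of distinct user agents the measure is noisy, so it fits best as one session-level signal among several.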
