Let's talk about something harder to fake, canvas fingerprinting, that's when you dynamically render a canvas to fingerprint your device alongside the browser. Should be harder to fake, bam, these faults develop an extension that reports fake values using JavaScript hooks, so they solve pretty much anything.
I want to take the time to explain just one of these before I fly through the others, the Chromium here, when it's headless, it does not end the Chrome.csi, Chrome app, all that performance stuff, to think, okay, can it detect puppeteer when it's headless with this? All of the sudden, not so fast, buddy, they fill every single object with fake values like this, navigator Chrome, load times, runtime, app, so on and so forth, anything that can be used for detection, it's making it stupid easy to use, annoyingly elegant, remember, CAPTCHA's two lines here, a little bit of money, and they solve that. And all of this is going to be just with these two lines here. Everything we talked about, super easy to use, bot tests failed to find it, they just hang the spot detection on their repo, but I promise that, get to what actually works, that part, I made it within 10 minutes, all right.
Obviously, I can't specify too much here, but let's start with the easy part. Easy bots are easy to detect. I'm going to name three ways to go after hard bots. This is going to be quick, but I'm going to say this more than what's out there. Starting with stuff like hardware concurrency, there's still more JavaScript artifacts to be found if you know where to look. These are becoming increasingly rare, though, so I wouldn't count on them long term, but the upside here is that they're very clear cut.
What we'll hold at the time is behavior tests and session level data analysis. Behavior tests that still work usually look at window context discrepancies interacting with the DOM, and data analysis can take many shapes, for example, let's say you pick the user agent perfectly, the question is what value you put there, look at this graph, that's the user agent along navigating to an app, each point is how many people navigated with that specific user agent, you'd say there's some variance here, but this is what it's supposed to look like. Blue line's almost flat, so that's probably because the bot foster got right how the user agent is varied, they got the weight part entirely wrong. They're probably just producing these at random. Normal sites don't do this, and here's one way you can detect even the best bot.
So at the start I asked you what's up with bots on the web? I can't tell you for sure, but what I do know is that they're getting better, we need more smart people like you to be aware so that the internet becomes a better place.
Comments