Client Side Detection

1. Navigator.Webdriver Flag

This is probably the most well-known Bot Detection Method - the juicier stuff that should actually help you catch bots, and is not just here to complete the list, starts at point 2 below.

The Navigator.Webdriver Flag indicates whether the browser is controlled by automation tools such as Selenium and is also the Source of that "Chrome is being controlled by automated test software" notification bar you get when using Selenium with Chrome.

It is meant to be used as a standard way for websites to realize that automation tools are used.

You can check for it using code that looks something like this:

var isAutomated = navigator.webdriver;
if(isAutomated){
    blockAccess();
}
                        

But since it's so easy to check for the boolean, it's very easy to remove it, too, and most bot creators do that. 

Still, just in case someone has forgotten to remove this flag, you should check for it.

  

2. Consistency

This is one of the bigger things a lot of bots forget about - after changing their User-Agent to something different, so that it no longer contains "Selenium" (or similar) like the default one does, they don't think about whether the new User-Agent is actually plausible.

   

Browser Consistency

E.g. they set a Firefox, Safari or IE User Agent even though they are actually using a Chrome Browser.

By executing JavaScript challenges like this one, you can find out which Browser the bot is really using and then flag it if that does not match the provided User Agent.

eval.toString().length
                        

The following Browsers will return the following values:

  1. Firefox: 37
  2. Safari: 37
  3. Chrome: 33
  4. Internet Explorer: 39
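
Putting the two together, a minimal sketch of such a check could look like this - the length values are the ones from the table above, while flagBot() is just a placeholder for whatever flagging logic you use:

// eval.toString().length differs per browser engine (see the table above)
var evalLength = eval.toString().length;
var userAgent = navigator.userAgent;

// A browser that claims to be Chrome but doesn't behave like one is suspicious
if (userAgent.indexOf("Chrome") !== -1 && evalLength !== 33) {
    flagBot(); // placeholder for your own flagging logic
}
// Same idea for a browser that claims to be Firefox
if (userAgent.indexOf("Firefox") !== -1 && evalLength !== 37) {
    flagBot();
}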

   

OS Consistency

The same idea applies to OS consistency - they set a Linux, macOS or iOS User Agent even though they are actually using Windows.

Using the following Code, you can find out what OS the bot is really using:

navigator.platform
                        

And these are the values returned for each OS:

  1. Windows: Win32 or Win64
  2. Android: Linux armv7l or Linux i686
  3. iOS: iPhone or iPad
  4. FreeBSD: FreeBSD amd64 or FreeBSD i386
  5. MacOS: MacIntel
  6. Linux: Linux i686 or Linux x86_64
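
A rough sketch of the corresponding check - the substring matching is deliberately simplified, and flagBot() is again just a placeholder:

var platform = navigator.platform;
var userAgent = navigator.userAgent;

// A User Agent that claims Windows should come with a Win32 / Win64 platform
if (userAgent.indexOf("Windows") !== -1 && platform.indexOf("Win") !== 0) {
    flagBot(); // placeholder for your own flagging logic
}
// A User Agent that claims to be an iPhone should report "iPhone" as platform
if (userAgent.indexOf("iPhone") !== -1 && platform !== "iPhone") {
    flagBot();
}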

   

Screen Consistency

Furthermore, you could also check if the Screen resolution makes sense - a reported resolution of e.g. 1920x100 is not something a real device would have.
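
A very basic sanity check could look like this - the minimum values are arbitrary assumptions you would tune yourself:

var width = window.screen.width;
var height = window.screen.height;

// No real device has a screen that is only a few pixels tall or wide
if (width < 240 || height < 240) {
    flagBot(); // placeholder for your own flagging logic
}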

   

Hardware Consistency

You could also check if a Desktop Graphics Card is used on an alleged Mobile Device, using the following Code:

function getVideoCardInfo() {
  const gl = document.createElement('canvas').getContext('webgl');
  if (!gl) {
    return {
      error: "no webgl",
    };
  }
  const debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
  return debugInfo ? {
    vendor: gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL),
    renderer:  gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL),
  } : {
    error: "no WEBGL_debug_renderer_info",
  };
}
console.log(getVideoCardInfo());
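
To turn that into a consistency check, you could compare the reported renderer against the User Agent - a sketch under the assumption that desktop GPU names show up in the renderer string, with flagBot() again being a placeholder:

var info = getVideoCardInfo();
var userAgent = navigator.userAgent;

var claimsMobile = /Android|iPhone|iPad/.test(userAgent);
var desktopGpu = info.renderer && /NVIDIA|GeForce|Radeon|Quadro/i.test(info.renderer);

// A "mobile" device rendering on a desktop graphics card is a strong hint at a bot
if (claimsMobile && desktopGpu) {
    flagBot(); // placeholder for your own flagging logic
}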
                        

Flash Support Consistency

You could also check if Flash is supported even though a Chrome Browser is allegedly used.
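
One way to probe for that used to be the Flash entry in navigator.plugins - Flash is dead nowadays, so treat this as a legacy check and an assumption on my part rather than something battle-tested:

// Current Chrome builds do not ship Flash, so this combination is inconsistent
var hasFlash = navigator.plugins && navigator.plugins["Shockwave Flash"] !== undefined;
var claimsChrome = navigator.userAgent.indexOf("Chrome") !== -1;

if (hasFlash && claimsChrome) {
    flagBot(); // placeholder for your own flagging logic
}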

  

You probably noticed by now that there are tons of different things you could check for consistency and plausibility. AmIUnique.org is a great site if you want to see more of these variables you could use.

   

3. Headless Detection

A headless browser is a browser that can be used without a graphical interface.

But since humans need a graphical interface, it can only be controlled programmatically - either to automate harmless tasks such as QA (Quality Assurance) tests or, more commonly, to automate web scraping, fake user numbers and advertisement impressions, and to look for vulnerabilities on a website.

So since Headless Browsers are definitely not used by Humans, when you detect a Headless Browser, you can be sure that it is a bot - compared to some of the other tests here that only hint at a bot.

But detecting a Headless Browser is easier said than done. I wrote a whole Article about how to detect a headless Browser here, in which I created the GitHub repository HeadlesDetectJS, which includes 6 different tests to detect a Headless Browser.

So download the finished Code from the GitHub Repository or read the article if you want to find out more about Headless detection.
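
Just to give you an idea, two of the classic Headless Chrome signals look like this - they are well known, and newer Headless Chrome versions have patched some of them, so don't rely on them alone:

// Older Headless Chrome builds report no plugins and no languages
if (navigator.plugins.length === 0) {
    flagBot(); // placeholder for your own flagging logic
}
if (!navigator.languages || navigator.languages.length === 0) {
    flagBot();
}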

  

4. Page Flow

Is a user going directly to a page, doing something and then leaving again? Is a user jumping between pages that don't link to each other?

These are some examples of suspicious page flow - The user might, of course, just have the website URL saved, but things like that are still suspicious most of the time.

There are several possibilities with varying degrees of difficulty to perform page flow analysis and flagging - you could simply set a cookie on one page and flag the user if that cookie doesn't exist on another page, you could build whole algorithms, or even apply some machine learning.
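
The cookie variant could look something like this as a minimal client side sketch - the cookie name and flagBot() are placeholders, and a real implementation should do the check on the server so the cookie can't simply be faked:

// On the page that should normally be visited first (e.g. the overview page)
document.cookie = "visitedOverview=1; path=/; max-age=3600";

// On the page that should only be reached via that first page
var cameFromOverview = document.cookie.indexOf("visitedOverview=1") !== -1;
if (!cameFromOverview) {
    flagBot(); // remember, this is only a hint, not proof
}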

However you do it, keep in mind that this is just a hint to a bot, and it's not a clear-cut way to detect bots.

This is also the reason why most sites do not use Page Flow analysis. Only some major sites (I'm pretty sure Reddit is one of them) use it as a small part of their Bot flagging software - but whether you run a major site or not, if you still want to implement it, I've added it to this list.

   

5. Mouse Movements / Scrolling

This is probably one of the bigger hints of a bot as well. Bots normally just click on stuff without moving the mouse (humans could also do that using the Tab key, shortcuts, or even Browser Add-ons that allow the user to only use the keyboard, which is why this point is also just a hint at a bot).

So you just have to track the average Mouse Movement, and if it's below a certain threshold you set yourself, you mark it as a potential bot - "potential" is the key word here: as explained above, having no mouse movement isn't a definite sign of a bot, it's just a very suspicious clue.

One way to track the average Mouse Movement would be the following JavaScript Code, which samples the mouse position every 500 milliseconds and evaluates the average every 5 seconds - note that you should check the Mouse Movement whenever a User presses a Button or inputs something instead of every 5 seconds, or you will mark AFK users as Bots:

let samplingRate = 500; // Sample the mouse position every 500ms - change this if you want
let currentMouseX = 0;
let currentMouseY = 0;

document.onmousemove = function(e){
    currentMouseX = e.pageX;
    currentMouseY = e.pageY;
}

let lastMouseX = currentMouseX;
let lastMouseY = currentMouseY;

let totalMouseMovement = 0;
let mouseMovementsTracked = 0;
setInterval(() => {

    // Calculate the difference between the last mouse position and the current one
    let xDiff = Math.abs(currentMouseX - lastMouseX);
    let yDiff = Math.abs(currentMouseY - lastMouseY);
    // Add the current mouse movement to the total
    totalMouseMovement += xDiff + yDiff;

    // Set the last mouse values to the current ones
    lastMouseX = currentMouseX;
    lastMouseY = currentMouseY;
    mouseMovementsTracked++;
}, samplingRate);


// This will log the average mouse movement of the last 5 seconds
// Instead of doing it every 5 seconds you should calculate the average mouse movement whenever a form button or similar is pressed
setInterval(() => {
    let avgMouseMovement = totalMouseMovement / mouseMovementsTracked;
    if(avgMouseMovement < 100){
        console.log("This is probably a bot");
    }
    // Reset the counters so the next check only covers the next 5 seconds
    totalMouseMovement = 0;
    mouseMovementsTracked = 0;
}, 5000);

Extra Note: Of course, you have to keep in mind that there are users on mobile phones, too, and they just don't have a mouse. In order to not falsely flag them as a bot, you should use the navigator.platform value I mentioned above or other measures to check if the user is on a mobile device before marking them as a bot.
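
One simple extra signal for that - an assumption on my side, not a definitive mobile check - is touch support:

// Touch support is a decent hint that there simply is no mouse to move
var isProbablyTouchDevice = ('ontouchstart' in window) || navigator.maxTouchPoints > 0;
if (isProbablyTouchDevice) {
    // Skip the mouse movement check entirely for these users
}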

   

6. Setting Traps

This is one of the dirtier tricks, because this method can only be bypassed if a human is always watching the bot and making sure that the bot is doing its job properly.

  

Adding invisible Honeypots

This method is based on the fact that bots mostly follow the HTML structure and hard-coded instructions (in contrast to some rare bots that make their own decisions through computer vision and AI) and therefore do EXACTLY what they are told.

For example, the bot was told to enter the username into the input field with the id username to log in to the website - but what if you added another, identical input field with the CSS display: none styling (or white text on a white background), which would therefore not be visible to normal users?

   

I think you already understand where I want to go with this.

Randomly (or only if other indicators point to a bot) add invisible login buttons, input fields, submit buttons or whatever, BEFORE & AFTER the actual fields and buttons.
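
A small sketch of what such a honeypot field could look like, injected with JavaScript - the field name, the form selector and flagBot() are all placeholders:

// Insert an invisible decoy field right before the real username input
var realInput = document.getElementById("username");
var honeypot = document.createElement("input");
honeypot.type = "text";
honeypot.name = "username_confirm"; // looks plausible to a bot reading the HTML
honeypot.style.display = "none";
honeypot.setAttribute("autocomplete", "off");
honeypot.setAttribute("tabindex", "-1");
honeypot.setAttribute("aria-hidden", "true"); // keep screen readers away from it
realInput.parentNode.insertBefore(honeypot, realInput);

// A real user can't see the field, so it must stay empty on submit
document.querySelector("form").addEventListener("submit", function(){
    if (honeypot.value !== "") {
        flagBot(); // placeholder for your own flagging logic
    }
});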

A disadvantage that arises from this is that people with screen readers or similar software, which is supposed to help disabled people, could have problems with this.

Therefore, you should perhaps only use this method if other indicators point to a bot or if you have previously tested YOUR implementation with a screen reader.

   

Changing Data & Information

This second way of setting traps is based on the fact that most bots blindly trust the data they see.

As the title suggests, change data as soon as you recognize a bot:

  1. Change texts
  2. Change charts and numbers
  3. Change results (e.g. say that a transaction was successful, although it was unsuccessful)

Thus, a bot collects wrong data or does wrong things based on the assumptions it made using your fake data.

Such things are usually only recognized relatively late, namely once a person checks on the bot or evaluates the collected data - that is precisely why this is such a "dirty" trick.

      

Server Side Detection

Some of these tests could theoretically also be done Client Side by saving cookies to transfer data between page visits and things like that, but since cookies and other client side methods can easily be manipulated, these tests are better suited for the server side.

  

1. Number of pages seen

This is a very simple but effective method to stop bots (at least when they try to collect or send large amounts of data).

Humans and Bots don’t behave the same. You can imagine that a Bot tends to be faster than a human. No real user will read a News Article in 5 seconds. That is why you should track how many minutes a user spends on the site, how many pages they visit, and so on.

And then if a user makes e.g. 1000 requests in a minute, you block their access with an error message or a Turing / Captcha test.
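
On the server, a minimal sketch of such a counter could look like this - Node.js style on my part, with arbitrary limits and whatever fingerprint / user ID you use (see below) as the key:

// Very naive in-memory request counter, keyed by fingerprint / user ID
const requestCounts = new Map();
const LIMIT_PER_MINUTE = 100; // arbitrary threshold, tune it for your site

function registerRequest(userId) {
    const now = Date.now();
    const entry = requestCounts.get(userId) || { count: 0, windowStart: now };

    // Start a new one-minute window if the old one has expired
    if (now - entry.windowStart > 60 * 1000) {
        entry.count = 0;
        entry.windowStart = now;
    }

    entry.count++;
    requestCounts.set(userId, entry);

    // Too many requests in one minute -> serve an error page or a Captcha
    return entry.count <= LIMIT_PER_MINUTE;
}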

  

But how would I track a User's site visits and so on?

Tracking (or fingerprinting) a user is a kind of art in itself. In my article about “how to make selenium undetectable” I advised bot makers to make their bots look as average as possible and to change their IP using Proxies to make tracking harder, so you will have to come up with a good solution to fingerprint users.

Maybe I'll write an article about fingerprinting in the future but, in a nutshell, all you have to do is get some details about the user, create some kind of hash out of it that will be unique to the User, and then use that hash as a "User ID" on the server side.
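
A minimal sketch of such a hash, using Node.js and a few request details as input - which details you pick, and whether they are stable enough, is entirely up to your own fingerprinting strategy:

const crypto = require('crypto');

// Build a crude fingerprint out of a few request details
function fingerprint(req) {
    const details = [
        req.headers['user-agent'],
        req.headers['accept-language'],
        req.headers['accept-encoding'],
        req.socket.remoteAddress
    ].join('|');

    // Hash the details so the result can be used as a compact "User ID"
    return crypto.createHash('sha256').update(details).digest('hex');
}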

Amiunique.org is a good website that shows you how you could construct a Fingerprint based on some browser details.

   

2. Known Proxies / Tor Servers

This method is going to take some work on your part.

Since, as I have already said, bots often use proxies, you might want to check whether a user's IP is a known proxy IP.

There are three types of proxies with different detection methods you have to look out for:

  

1. Public Proxies

Public Proxies are relatively easy to detect: you either check if popular ports like :8080, :80 or :3128 are open, or build yourself a Database of known Proxies by using a Proxy Hunter like YAPH or a custom-built scraper that collects Proxies from the most popular Proxy List Websites.
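
The port check could be as simple as this Node.js sketch - which ports you probe and the timeout are assumptions you would tune yourself:

const net = require('net');

// Try to open a TCP connection to a given port on the visitor's IP
function isPortOpen(ip, port, timeoutMs = 2000) {
    return new Promise((resolve) => {
        const socket = new net.Socket();
        socket.setTimeout(timeoutMs);
        socket.once('connect', () => { socket.destroy(); resolve(true); });
        socket.once('timeout', () => { socket.destroy(); resolve(false); });
        socket.once('error', () => { resolve(false); });
        socket.connect(port, ip);
    });
}

// Typical proxy ports - an open one is a hint (not proof) that the IP is a proxy
async function looksLikePublicProxy(ip) {
    for (const port of [80, 3128, 8080]) {
        if (await isPortOpen(ip, port)) return true;
    }
    return false;
}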

  

2. Public Tor Proxies (Exit Nodes)

Checking for Tor Exit Nodes is even easier. The Tor Project provides you with a tool called ExoneraTor where you just have to enter an IP Address and Date, and then it will tell you if this IP was used as a Tor Node.

So just write a small script in whatever server side language you use that sends a request to this URL and reads the contents:

https://metrics.torproject.org/exonerator.html?ip=<USERIP_HERE>&timestamp=<CURRENT_TIMESTAMP>&lang=en
                        

   

PHP Example:

file_get_contents("https://metrics.torproject.org/exonerator.html?ip=46.166.139.111&timestamp=2021-02-19&lang=en");
                        

  

3. Private Proxies

Private Proxies are more difficult or almost impossible to detect.

With private proxies, you could at most check if popular ports like :8080, :80 or :3128 are open, but that's pretty much it - there's a reason good Private Proxies cost so much money.

   

3. Setting Traps

Yep, I already talked about traps, but only about client side ones - now I'm going to talk about server side traps:

What several crawlers and scrapers struggle to do is to back off once 403/503 errors begin to be served. By simply ignoring those errors and requesting more pages anyway, it becomes fairly obvious that they are really a bot and not a person - no human will try to reload a page 1000 times after getting an error telling them to stop.

So if you suspect a bot, just serve them an error and tell them to reload or wait a bit, and if they keep doing what they did, you can be pretty sure you've just caught a bot.
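
A rough sketch of what this could look like on a Node.js server - the counters, thresholds and blockUser() are placeholders for your own logic:

// How many error responses a suspected bot keeps ignoring
const errorsServed = new Map();

function serveBackOffError(userId, res) {
    const count = (errorsServed.get(userId) || 0) + 1;
    errorsServed.set(userId, count);

    // A human gives up after a few errors - a bot hammering through 50 of them is caught
    if (count > 50) {
        blockUser(userId); // placeholder for your own blocking logic
    }

    res.statusCode = 503;
    res.setHeader('Retry-After', '120'); // politely tell clients to wait two minutes
    res.end('Something went wrong, please try again later.');
}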

But keep in mind that this method can come at the expense of putting off real users. Some people just don't want to visit your site anymore if they get an error, so only use this method if other detection methods point to a bot.

   

4. Checking for fake / Temp Emails

Again, this is a pretty straightforward method of catching bots trying to create accounts on your website.

Just don't allow users to use 10-minute emails when registering.

To check for temp emails you could write your own scraper to collect domains from the most popular temp email sites, or you could use a service that does this for you.
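
If you go the do-it-yourself route, the actual check is just a domain lookup against whatever blocklist you collected - the domains below are only examples, the real list would come from your scraper or a service:

// Domains collected from popular temp email sites - examples only
const tempEmailDomains = new Set([
    "10minutemail.com",
    "guerrillamail.com",
    "mailinator.com"
]);

function isTempEmail(email) {
    const domain = email.split("@").pop().toLowerCase();
    return tempEmailDomains.has(domain);
}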

   

This "only" comes with the cost of potentially pissing of users who do not trust you and just want to test your service, so a good alternative is also just allowing users to register via Google, Facebook, GitHub or any other account.