
CSP Causes Slow Death for Embedded Browser Bots

Discussion in 'General Affiliate Marketing Forum' started by webDOMinator, Dec 25, 2018.

  1. webDOMinator

    webDOMinator Service Manager affiliate

    The only real security that a man can have in this world is a reserve of knowledge, experience and ability.
    -Henry Ford


    The Death of Embedded Browser Bots?

    Top sites have started using browser Content Security Policies (CSP). As the title of this article suggests, this is contributing to a slow decline in bot traffic to large sites like Facebook and the sites in the Microsoft network. Even Google is getting in on the fun. As with any new working piece of web machinery, the trend will eventually spread to more sites.

    What is CSP?

    Mozilla defines it as such:
    "Content Security Policy (CSP) is an added layer of security that helps to detect and mitigate certain types of attacks, including Cross Site Scripting (XSS) and data injection attacks. These attacks are used for everything from data theft to site defacement to distribution of malware."

    So the browsers are now working with top sites and web developers in order to spot unauthorized activity within the browser environment. The Mozilla document goes on to explain how to implement CSP on a website, and how to use it to control a number of different actions bots might try on you. Here is some more documentation from Chrome about using the CSP to block inline scripts.
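
    As a rough sketch of what this looks like on the server side, the snippet below builds a Content-Security-Policy header that only allows same-origin scripts plus inline scripts carrying a per-response nonce. The helper names newNonce and buildCspHeader are made up for illustration, and the directive set is just one plausible policy, not what any particular site actually ships.

```javascript
// Sketch only: newNonce and buildCspHeader are hypothetical helper names.
const { randomBytes } = require("crypto");

// A fresh, unguessable nonce should be generated for every response.
function newNonce() {
  return randomBytes(16).toString("base64");
}

// Build a policy that allows same-origin scripts, plus inline scripts
// whose nonce attribute matches the value in the header.
function buildCspHeader(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "object-src 'none'",
  ].join("; ");
}

const nonce = newNonce();
const header = buildCspHeader(nonce);

// The value would be sent as:  Content-Security-Policy: <header>
// and the page's own inline scripts would carry:  <script nonce="<nonce>">
console.log(header);
```

    With a policy like this, an inline script that does not carry the matching nonce attribute is refused by the browser.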

    As you may be able to tell, the days of IP/cookie/user agent based bot detection, which graced the internet for two decades, are now long gone.

    What Else?

    Besides CSP, there are a myriad of tools at a webmaster's disposal nowadays in order to control and detect bots. Among them are:
    • JavaScript Environment Testing - if JavaScript is not running at all, then you're not even dealing with a browser bot but a socket based bot, and those are easy to filter out. This is an old one, but it stops bots still basic enough to run in a non-JavaScript environment. Blind testing, however, may block screen readers and the like.
    • Authenticated Tokens - the server issues a one-time or signed token and verifies it on each server call; this includes things like the "nonce" attribute in HTML. Here is full documentation on nonce from the MDN.
    • HTML5 Canvas Data - this data can be analyzed in order to provide more details on properly identifying the real browser user agent. The method is called Canvas Fingerprinting and there are a number of libraries available for it.
    • JavaScript Events - Using JavaScript events to monitor things like keystrokes in a text box, or exactly where in an element the mouse clicked, gives websites one extra layer of anti-bot-ness. If an element is clicked from script with no mouse data attached to the event, for example: document.getElementById('x').click(); then whatever needs to happen to complete the action may not happen at all. It is so easy that it literally comes down to one little variable: event.isTrusted. There is full documentation on isTrusted from Mozilla's MDN.
    • NoCaptcha Type Behavior Analysis - Google, in one of their articles about NoCaptcha (sometimes called reCAPTCHA v2), mentioned that they use mouse and keyboard data from their entire network of sites that use Google tracking to run an AI based analysis of your interaction with the NoCaptcha widget they provide ("I am not a robot"). This type of analysis leads to a guess on whether or not a person is actually using the page. They are hinting that they can analyze each little curve of the mouse and how fast you typed in text boxes on the page. Right now they are using it only for their widget (to decide whether to show you images or just give you the green check mark), but they and others will start using this method as a pretty accurate metric for whether the visitor is a bot or not.
    • Action based Behavior Analysis - This comes in when a person uses a social network or any type of site which allows multiple actions and is usually run server-side. Each action is recorded with a timestamp and checked statistically against the other recorded user action patterns, or a limit based behavior rule-set. The site can automatically red-flag an account or an IP based on this, making a moderator/admin's job much easier when reviewing security for the day.
    • Browser Capability Verification - If a user agent says that it is Firefox 61, then specific JavaScript variables are checked to make sure that the Firefox version is at least higher than, let's say, 50. If the variables don't exist or are different from what they should be, then the lie is caught. This works cross-browser as well. If you are using an embedded XULRunner (Mozilla's embedded browser [now discontinued]) but you are claiming to be Chrome, this little trick will leave you high and dry.
    • Headless Browser Detection - Ever heard of PhantomJS? Yeah, it is a headless browser used for "testing purposes" which runs a full JavaScript environment in a real browser engine, but alas... it is still detectable. Here is a blog on headless browser detection in Chrome, but you can also just search for that phrase on Google.
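
    To make the event.isTrusted point from the list concrete, here is a minimal sketch of a click check. The function name classifyClick and the labels it returns are hypothetical; the only real API relied on is the isTrusted property of DOM events, which is false for events created by script, including element.click().

```javascript
// Hypothetical helper: decide whether a click event looks user-generated.
// It accepts any object shaped like a DOM event.
function classifyClick(event) {
  // Events dispatched by script (dispatchEvent, element.click()) report false.
  return event.isTrusted ? "trusted" : "synthetic";
}

// In a page this would be wired up roughly like:
//   document.getElementById('x').addEventListener('click', (e) => {
//     if (classifyClick(e) === "synthetic") return; // ignore scripted clicks
//     /* ...continue with the real action... */
//   });

// Plain objects stand in for real events so the logic runs outside a browser.
const scriptedClick = classifyClick({ isTrusted: false });
const userClick = classifyClick({ isTrusted: true });
console.log(scriptedClick, userClick);
```

    Note that this only catches events dispatched from page script. Automation that drives the browser's real input pipeline produces trusted events, which is why sites stack this check with the other methods in the list.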
    There are quite a few other tricks of the trade for sure, but this covers most of the contemporary ones that are only now starting to become popular along with CSP. In short, it means that besides being able to detect unauthorized scripts, sites are now also able to fully confirm browser capabilities, detect user agent falsification, and validate user actions. If you pair those things with some of the more old-fashioned techniques like IP blacklisting, then you have got yourself quite the anti-bot arsenal, and your site moderators are probably readying their CVs to look for other jobs.
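
    The action based behavior analysis from the list can be sketched in a few lines as well. This is only an illustration under stated assumptions: the function name flagAccount, both thresholds, and the two rules (a plain rate limit and a check for suspiciously even timing) are invented here, and a real site would tune them against its own recorded user patterns.

```javascript
// Hypothetical thresholds, chosen only for illustration.
const MAX_ACTIONS_PER_MINUTE = 30;
const MIN_INTERVAL_CV = 0.1; // people rarely act at perfectly even intervals

// Takes the timestamps (in ms) of one account's recorded actions and
// returns a list of reasons to red-flag it (empty list = nothing suspicious).
function flagAccount(timestampsMs) {
  const reasons = [];
  if (timestampsMs.length < 3) return reasons; // not enough data to judge

  const sorted = [...timestampsMs].sort((a, b) => a - b);

  // Rule 1: limit based rule, actions per minute over the observed window.
  const spanMs = sorted[sorted.length - 1] - sorted[0];
  const spanMinutes = Math.max(spanMs / 60000, 1 / 60); // avoid dividing by 0
  if (sorted.length / spanMinutes > MAX_ACTIONS_PER_MINUTE) {
    reasons.push("rate");
  }

  // Rule 2: statistical rule, coefficient of variation of the gaps
  // between consecutive actions (std deviation divided by mean).
  const gaps = sorted.slice(1).map((t, i) => t - sorted[i]);
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance =
    gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  const cv = mean === 0 ? 0 : Math.sqrt(variance) / mean;
  if (cv < MIN_INTERVAL_CV) {
    reasons.push("regular-timing");
  }

  return reasons;
}

// Ten actions exactly one second apart versus five unevenly spaced ones.
const botLike = flagAccount([0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]);
const humanLike = flagAccount([0, 40000, 95000, 230000, 300000]);
console.log(botLike, humanLike);
```

    A flagged account would then land in a review queue for a moderator rather than being blocked outright, which matches the red-flag workflow described in the list.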

    So then What is left for Bots?

    Quite obviously not all sites know about this. If they did, my own webDOM bot would be K.O.'d and out of the game. But slowly other sites are catching on, since the trends of the big websites tend to trickle down and become popular. There are still quite a number of sites which do not use any of these methods, but I have a feeling that 2019 is going to bring a lot of new challenges to bot makers and automatic marketing as a whole.

    It's still safe to say that using a bot for some scraping, etc. is possible and may cost you only the price of a tool, as long as you're not targeting one of the large sites. Also, with Twitter, Facebook and MS owned sites, it's not that your tool will get you blocked for good. All of them unblock accounts through SMS codes, so for each account you have you'll need a valid non-blacklisted phone number, and of course this makes things much slower and more costly.

    You may be wondering why I would take the time to write this article if I'm a bot maker myself. Isn't it self-defeating? Well, I have never been black or white hat when it comes to internet marketing. I like to stay in the nice warm grey area. I am, however, interested in the opinions of other bot makers as well as the general marketing public. Besides that, it was about time someone mentioned some new reasons why your accounts are getting deleted left and right ;)

    Security and anti-security is a two-way street. 100% secure does not exist. More accurately put: security is an ever-escalating arms race between those who seek to take an action and those who wish to block that action. There are surely interesting times ahead for all of us watching this never-ending battle unfold.
     
    Last edited: Dec 25, 2018
  3. Graybeard

    Graybeard Well-Known Member affiliate

    Funny, I get right past Cloudflare using header spoofs NP today :D
    I can get right past Google with headless binaries too.

    antoinevastel/fp-collect
    haha put that script on a production server -- that's a real bottleneck!

    Fingerprinting is made for abuse situations (cyber-stalking and PCI-DSS transactions) as well as user/customer identification-- IMO

    Have you started using HTTP/2.0 yet and read your logs ;)
     
  4. webDOMinator

    webDOMinator Service Manager affiliate

    Which headless browsers? Can you make a FB account?
     
  5. Graybeard

    Graybeard Well-Known Member affiliate
