Master List of Known Bots

Discussion in 'Programming and Scripts' started by Graybeard, Sep 10, 2018.

  1. Graybeard

    Graybeard Well-Known Member affiliate

    4,998
    2,392
    113
    netcafes, Pritom Roy, Matriex and 3 others like this.
  2. CPA Evolution
  3. affmarketer101

    affmarketer101 Affiliate affiliate

    694
    265
    63
    Nice share. Thanks so much. We'll live in the world of no bots. :)
     
    tyoussef likes this.
  4. Pavel Yudkevich

    Pavel Yudkevich Affiliate affiliate

    86
    53
    18
    Thank you very much for this share :)
     
  5. Graybeard

    Graybeard Well-Known Member affiliate

    There is a downside to this:

    What if there is some new start-up running a webmaster-friendly search indexing bot?
    I would hate to deny the next viable challenger to elGooG ...
     
  6. webDOMinator

    webDOMinator Service Manager Service Manager affiliate

    You'd just have to deny specific pages to that bot, or to most bots... but you could still serve up the content that you wanted indexed, say the public content. The only reason to block bots is to stop them from scraping content that your site generates or uses, like private user profiles, etc.
     
  7. Graybeard

    Graybeard Well-Known Member affiliate

    Bad bots and page scrapers get a 444 from my servers -- no reply. They neither request nor adhere to any robots.txt.
    SEO bots are an instant ban when found.

    This is a bit of overkill; however, there is a lot of good info there:
    mitchellkrogza/nginx-ultimate-bad-bot-blocker
     
  8. Graybeard

    Graybeard Well-Known Member affiliate

    Told ya.
    This code kiddie was too dumb to forge the User-Agent header, HA +1
    It was PhantomJS; had he forged the header right, I might not have noticed.
    PhantomJS/2.0.1-development Safari/538.1
    Code:
    67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET /img/dog-affiliate_700.jpg HTTP/1.1" 200 166691 "https://domain.com/" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1"
    67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET /js/wyd.js HTTP/1.1" 200 24 "https://domain.com/" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1"
    67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET / HTTP/1.1" 200 444 "-" "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1"
    
    $ ./ipinfo.sh
    Pls enter your ip:
    67.217.35.167
    {
      "ip": "67.217.35.167",
      "city": "",
      "region": "",
      "country": "US",
      "loc": "37.7510,-97.8220",
      "org": "AS22458 NetSource Communications, Inc."
    }
    $
    Another data center eliminated ...
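
    The kind of check described in this post can be roughed out in a few lines. This is only a sketch in Python (the thread itself uses shell and nginx), assuming the access log is in the combined format shown above; the names `LOG_LINE`, `HEADLESS_TOKENS`, `parse_line` and `is_headless` are made up for the illustration, and the token list is an assumption, not anyone's real ban list.

    ```python
    import re

    # Combined log format: ip, identity, user, [time], "request", status, size,
    # "referer", "user agent". search() tolerates leading junk such as line numbers.
    LOG_LINE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
        r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    )

    # Assumption: a hand-picked list of tokens that automation tools leave in the UA.
    HEADLESS_TOKENS = ("PhantomJS", "HeadlessChrome")


    def parse_line(line):
        """Return the named fields of one access-log line, or None if it does not match."""
        match = LOG_LINE.search(line)
        return match.groupdict() if match else None


    def is_headless(agent):
        """True when the User-Agent openly names a headless browser."""
        lowered = agent.lower()
        return any(token.lower() in lowered for token in HEADLESS_TOKENS)


    sample = (
        '67.217.35.167 - - [31/Dec/2018:15:58:32 +0000] "GET / HTTP/1.1" 200 444 "-" '
        '"Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) '
        'PhantomJS/2.0.1-development Safari/538.1"'
    )
    entry = parse_line(sample)
    if entry and is_headless(entry["agent"]):
        print(entry["ip"], "announces a headless browser")
    ```

    This only catches bots that do not bother to forge the header, which is exactly the case in the log above.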
     
  9. webDOMinator

    webDOMinator Service Manager Service Manager affiliate

    Just recently found on one of my sites, some DMCA scanning bots.

    Identifiable mainly by their user agent:
    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/71.0.3578...

    The other one is:
    Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0

    And last but not least:
    Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100721 Firefox/3.6.8

    The funniest ones were those that specified HeadlessChrome, like that's some sort of legitimate browser.

    I think it's safe to say any version of FF lower than 50 or Chrome lower than 60 probably hasn't updated because it's a bot. Interesting note: lower versions of Firefox, like 38, are embedded browsers based on XULRunner, which is now obsolete/discontinued according to Mozilla, and the licensing they do with companies that own programming languages, like Oracle, is shaky at best, so people embedding XUL are going to be way behind in browser version. That version 3.6.8, lol, that's because they are using GeckoFX capabilities, which were discontinued even before FF version 15, I think.

    I'm not going to specify the copyright watchdog companies that are using these, it's bad for business considering SEO on this forum is pretty decent ;) but heads up, some more trash to add to the list.

    If it's not the black hats, it's the white knights, but all of their bots are not welcome in me site, yar.
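
    The version rule of thumb from this post could look roughly like this. It is a sketch in Python, not a tested filter: the thresholds (Firefox 50, Chrome 60) are just the numbers suggested above, and `MIN_MAJOR`, `VERSION` and `looks_outdated` are names invented for the example.

    ```python
    import re

    # Assumption: thresholds taken from the rule of thumb in the post.
    MIN_MAJOR = {"Firefox": 50, "Chrome": 60}
    VERSION = re.compile(r"(Firefox|Chrome)/(\d+)\.")


    def looks_outdated(agent):
        """True when the UA claims a Firefox or Chrome major version below the threshold."""
        match = VERSION.search(agent)
        if match is None:
            return False  # neither browser is named, so this rule has no opinion
        family, major = match.group(1), int(match.group(2))
        return major < MIN_MAJOR[family]


    for agent in (
        "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100721 Firefox/3.6.8",
    ):
        print(looks_outdated(agent), agent)
    ```

    A rule like this is blunt: a crawler you want to keep may also announce an old Chrome build, so it probably belongs behind an allow list rather than in front of one.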
     
    Graybeard and Zaapz Cash like this.
  10. Graybeard

    Graybeard Well-Known Member affiliate

    "^^I think it's safe to say any version of FF lower than 50 or Chrome lower than 60 probably hasn't updated because it's a bot. Interesting note:"
    ya think so :D

    Code:
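        # Chrome 1-40: a character class like [1-40] only ever matches one digit
        "~*Chrome/([1-9]|[1-3][0-9]|40)\."    1;
        ~*Opera/([1-29]\.30\.[0-9])    1;
        ~*=Mozilla    1;
        ~*Mozilla/([4]\.[0-9])    1;
        ~*Firefox/x\.x    1;
        # Firefox 1-59, anchored on the dot so three-digit versions do not match
        "~*Firefox/([1-9]|[1-5][0-9])\."    1;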
    
    That's what FF 3.x is about, 'dumb-runner'? Lots of out-of-date c0d3k1dd13s.

    Google is still using Chrome 49 to check for older mobile (I see it occasionally); the IP is a Google IP (I checked).
     
    Last edited: Jan 2, 2019
    webDOMinator likes this.
  11. Graybeard

    Graybeard Well-Known Member affiliate

    Actually
    Chrome/41.0.2272.96
    ==============
    Code:
    66.249.66.28 - - [06/Jan/2019:02:13:20 -0500] "GET / HTTP/1.1" 200 6016 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.66.29 - - [05/Jan/2019:22:31:54 -0500] "GET / HTTP/1.1" 200 6010 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.66.30 - - [05/Jan/2019:17:55:22 -0500] "GET / HTTP/1.1" 200 6046 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.66.29 - - [05/Jan/2019:17:04:59 -0500] "GET /x.html HTTP/1.1" 200 2227 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.66.28 - - [05/Jan/2019:09:22:19 -0500] "GET /? HTTP/1.1" 200 5318 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.66.28 - - [05/Jan/2019:09:02:39 -0500] "GET / HTTP/1.1" 200 5304 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.74 - - [05/Jan/2019:08:12:22 -0500] "GET /x.html HTTP/1.1" 200 2213 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.70 - - [05/Jan/2019:07:52:29 -0500] "GET / HTTP/1.1" 200 5293 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.74 - - [04/Jan/2019:23:50:46 -0500] "GET /?x HTTP/1.1" 200 5318 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.72 - - [04/Jan/2019:23:30:32 -0500] "GET /x.html HTTP/1.1" 200 2233 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.70 - - [04/Jan/2019:22:29:30 -0500] "GET / HTTP/1.1" 200 5309 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.64.74 - - [04/Jan/2019:14:57:46 -0500] "GET /x.html HTTP/1.1" 200 2212 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
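
    Since anyone can paste "Googlebot" into a User-Agent, the IP is what has to be checked. A minimal sketch in Python, assuming the usual reverse-then-forward DNS check: resolve the IP to a hostname, require a Google crawler domain, then resolve that hostname back and confirm it returns the same IP. The function name, the injectable lookup parameters and the stub data are all made up for the example so it can run without a network.

    ```python
    import socket

    # Assumption: hostnames under these domains are treated as Google crawler hosts.
    GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


    def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
        """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
        reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
        forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
        try:
            host = reverse_lookup(ip)
            if not host.lower().endswith(GOOGLE_SUFFIXES):
                return False
            return ip in forward_lookup(host)
        except OSError:
            return False  # no PTR record, or the lookup failed


    # Offline demonstration with stub resolvers; the hostname is illustrative only.
    stub_ptr = {"66.249.66.28": "crawl-66-249-66-28.googlebot.com"}
    stub_a = {"crawl-66-249-66-28.googlebot.com": ["66.249.66.28"]}
    print(is_verified_googlebot("66.249.66.28", stub_ptr.__getitem__, stub_a.__getitem__))
    ```

    Called with only the IP, the function falls back to real DNS lookups through the `socket` module.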
    
    
     
    webDOMinator likes this.
  12. Certified
    tyoussef

    tyoussef Moderator moderator Certified Vendor Service Manager affiliate

    4,228
    3,857
    113
  13. Xero

    Xero Guest

    Any noob/very noob guide on how to implement all the blocks on WordPress?
     
  14. Graybeard

    Graybeard Well-Known Member affiliate

    Only plugins that limit /wp-admin log-in attempts, AFAIK.
     
    Xero likes this.
  15. Xero

    Xero Guest

    Thanks. Even for the seo bots there is no option?
     
  16. Graybeard

    Graybeard Well-Known Member affiliate

    Apache2? Use .htaccess.
    If you have a VPS or dedicated server (with root) ...
    In Nginx you can deny single IPs or an IP CIDR range such as x.x.0.0/16.
    Example: block a whole network:

    Code:
    $ ipcalc 90.80.50.0/16
    Address:   90.80.50.0           01011010.01010000. 00110010.00000000
    Netmask:   255.255.0.0 = 16     11111111.11111111. 00000000.00000000
    Wildcard:  0.0.255.255          00000000.00000000. 11111111.11111111
    =>
    Network:   90.80.0.0/16         01011010.01010000. 00000000.00000000
    HostMin:   90.80.0.1            01011010.01010000. 00000000.00000001
    HostMax:   90.80.255.254        01011010.01010000. 11111111.11111110
    Broadcast: 90.80.255.255        01011010.01010000. 11111111.11111111
    Hosts/Net: 65534                 Class A
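
    The same arithmetic can be done from a script. A small sketch using Python's standard `ipaddress` module; `is_blocked` is a helper name made up for the example, and `strict=False` is needed because 90.80.50.0 has host bits set for a /16.

    ```python
    import ipaddress

    # strict=False accepts an address with host bits set, as in the ipcalc call above.
    network = ipaddress.ip_network("90.80.50.0/16", strict=False)

    print(network)                    # 90.80.0.0/16
    print(network.netmask)            # 255.255.0.0
    print(network.broadcast_address)  # 90.80.255.255
    print(network.num_addresses - 2)  # 65534 usable hosts


    def is_blocked(ip, blocked_networks):
        """True when the address falls inside any of the given networks."""
        address = ipaddress.ip_address(ip)
        return any(address in net for net in blocked_networks)


    print(is_blocked("90.80.200.7", [network]))  # True
    print(is_blocked("90.81.0.1", [network]))    # False
    ```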
    
    
    Then there are firewalls (iptables, ufw ...): that IP range gets no response -- lights out -- {stealth}


    Ouch, don't block that one, though :p

    % Information related to '90.80.0.0/16AS3215'

    route: 90.80.0.0/16
    descr: France Telecom SCE
    descr: FT-SCE
    origin: AS3215
    remarks: -------------------------------------------
    remarks: For Hacking, Spamming or Security problems
    remarks: send mail ONLY to abuse -at --orange business.com
    remarks: -------------------------------------------
    mnt-by: RAIN-TRANSPAC
    org: ORG-OBS3-RIPE
    created: 2007-07-05T14:41:55Z
    last-modified: 2010-11-10T17:03:45Z
    source: RIPE
     
    Last edited: Jan 6, 2019
  17. webDOMinator

    webDOMinator Service Manager Service Manager affiliate

    Oh shit, @Graybeard, do I hear a WP plugin collab coming on?
     
  18. Graybeard

    Graybeard Well-Known Member affiliate

    Not really. My ban lists are proprietary and updated at my leisure, but manually :(
    There is no realistic and reliable program logic to this; there are too many variables. HTTP/1.0 is obvious. Many legit browsers cannot accept HTTP/2.0 yet. Server farms (data centers) are obvious.
    I can curl past Cloudflare with one of my "residential IPs".
    Most of this is intuitive; that has always been the problem.

    "Free Proxy / VPN / TOR / Bad IP Detection Service via API and Web Interface | IP Intelligence": this is interesting and useful.
     
    webDOMinator likes this.
  19. webDOMinator

    webDOMinator Service Manager Service Manager affiliate

    Yeah, for sure it would remain proprietary. Your hard work would be protected. A WP plugin like this would be paid as well: not giving away your (manual, ouch!) work, just providing access to an API endpoint. You wouldn't even have to collab with me. I know there are plenty of WP sites that would pay for something like that to help protect their own data. Anyway, just an idea I had while reading through this thread ;) By all means take it or leave it.
     
  20. affseeker

    affseeker Affiliate affiliate

    6
    1
    1
    Thanks for sharing, we need it.
     
  21. Graybeard

    Graybeard Well-Known Member affiliate

    That's an interesting idea. However, if you need to validate the user in an API, you would have to limit the requests per month: SaaS tier pricing.
    Otherwise, if it's sold on a one-time, unlimited-use basis, the plugin will be used on more than one blog.

    I use one or more of 4 databases (on my servers locally -- synchronized )
    ip_block
    asn_block
    tor_block
    geo_block

    1-4 on a domain depending on the domain's level of security
    only tor_block is a cron update (30 min)
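
    The "one to four lists per domain" setup might be wired together along these lines. This is a guess at the shape, sketched in Python with made-up sample data; it is not Graybeard's actual schema. The domain names, the `Visitor` fields and the list contents are all hypothetical, and the ASN and country are assumed to be resolved somewhere else.

    ```python
    import ipaddress
    from dataclasses import dataclass


    @dataclass
    class Visitor:
        ip: str
        asn: str      # e.g. "AS22458", assumed to be looked up elsewhere
        country: str  # ISO country code, assumed to be looked up elsewhere


    # Hypothetical sample data; documentation-range addresses stand in for real entries.
    IP_BLOCK = [ipaddress.ip_network("203.0.113.0/24")]
    ASN_BLOCK = {"AS22458"}
    TOR_BLOCK = {"198.51.100.23"}
    GEO_BLOCK = {"ZZ"}

    CHECKS = {
        "ip_block": lambda v: any(ipaddress.ip_address(v.ip) in net for net in IP_BLOCK),
        "asn_block": lambda v: v.asn in ASN_BLOCK,
        "tor_block": lambda v: v.ip in TOR_BLOCK,
        "geo_block": lambda v: v.country in GEO_BLOCK,
    }

    # Hypothetical per-domain policy: which of the four lists each site consults.
    DOMAIN_POLICY = {
        "blog.example": ["ip_block"],
        "shop.example": ["ip_block", "asn_block", "tor_block", "geo_block"],
    }


    def first_match(domain, visitor):
        """Return the name of the first list that blocks the visitor, or None."""
        for name in DOMAIN_POLICY.get(domain, []):
            if CHECKS[name](visitor):
                return name
        return None


    visitor = Visitor(ip="67.217.35.167", asn="AS22458", country="US")
    print(first_match("shop.example", visitor))  # asn_block
    print(first_match("blog.example", visitor))  # None
    ```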

    The other thing is that checking an IP against over a million addresses (maybe 30K rows/lines) takes time: locally maybe +-50ms; remote? IDK, >400ms?
    It would slow down the initial page load, but the plugin could set a browser cookie (only for that website), so each subsequent page would not make the request.
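
    On the lookup cost: if the rows are CIDR ranges, one way to keep a local check fast is to sort them once and binary-search per request. A sketch in Python under that assumption (IPv4 only); `build_index` and `is_listed` are names made up for the example, and no timings are claimed.

    ```python
    import bisect
    import ipaddress


    def build_index(cidrs):
        """Turn CIDR strings into sorted, non-overlapping (start, end) integer ranges."""
        networks = ipaddress.collapse_addresses(
            ipaddress.ip_network(c, strict=False) for c in cidrs
        )
        ranges = [(int(n.network_address), int(n.broadcast_address)) for n in networks]
        starts = [start for start, _ in ranges]
        return starts, ranges


    def is_listed(ip, index):
        """Binary-search the range whose start is closest below the address."""
        starts, ranges = index
        value = int(ipaddress.ip_address(ip))
        pos = bisect.bisect_right(starts, value) - 1
        return pos >= 0 and value <= ranges[pos][1]


    index = build_index(["90.80.0.0/16", "203.0.113.0/24", "90.80.50.0/24"])
    print(is_listed("90.80.255.254", index))  # True
    print(is_listed("90.81.0.0", index))      # False
    ```

    Each lookup is then a handful of comparisons regardless of how many rows the list has; the index is rebuilt only when the list changes.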

    That link (Free Proxy / VPN / TOR / Bad IP Detection Service via API and Web Interface | IP Intelligence) is pretty good at detection and has an API. He might be the right guy to talk with: he is actively maintaining his database and getting a lot of user input through his API. I use it for screening from a shell script, but after the fact, from my access log.

    Code:
    #!/bin/bash
    # ipintel.sh: score one IP address with the getipintel.net API

    echo "Pls enter your ip:"
    read -r ip
    # Alternative lookups, left disabled:
    #whois -h whois.cymru.com "$ip"
    #curl "https://ipinfo.io/$ip"
    curl "http://check.getipintel.net/check.php?ip=$ip&[email protected]&flags=b"
    echo "=a[1] is real bad!"
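
    For screening a whole access log rather than one IP at a time, the same request can be built in a script. A sketch in Python with several assumptions that should be checked against the service's own documentation: that the obfuscated parameter in the URL above is named `contact`, that the reply is a plain number where values near 1 mean "bad" and negative values signal an error, and that 0.99 is a sensible cut-off (it is just an illustrative number). `build_query` and `interpret` are invented names, and nothing here sends a request.

    ```python
    from urllib.parse import urlencode

    API_URL = "http://check.getipintel.net/check.php"


    def build_query(ip, contact, flags="b"):
        """Build the lookup URL; 'contact' is assumed to be the e-mail parameter name."""
        return API_URL + "?" + urlencode({"ip": ip, "contact": contact, "flags": flags})


    def interpret(body, threshold=0.99):
        """Classify the plain-text reply; the threshold is an arbitrary illustrative cut-off."""
        score = float(body.strip())
        if score < 0:
            return "error"  # assumption: negative numbers are error codes
        return "block" if score >= threshold else "allow"


    # The contact address is a placeholder, not a real mailbox.
    print(build_query("67.217.35.167", "admin@example.com"))
    print(interpret("1"))    # block
    print(interpret("0.2"))  # allow
    ```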
    
    
     
    webDOMinator likes this.