Hide your robots.txt from visitors and show it only to validated robots

Bagi Zoltán

Guest
After some hours of searching and hacking, I have finally found everything needed to build a solution that hides the content of your robots.txt file from visitors and displays it ONLY to validated user agents such as Googlebot, Yahoo Slurp and msnbot.

You may find the whole thing very strange: why would somebody hide that content? My answer to this question is the following:
That content (the folder structure of the core script files) is private information, and I don't want to share it with every script kiddie and make it possible for them to hurt my site.

How to execute this hack? I will guide you through it.

1. As the first step, add these lines to your .htaccess file. If you don't have one, create it and upload it to the root folder of your domain.


Code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(googlebot|Msnbot|Slurp) [NC]
RewriteRule ^robots\.txt$ http://seo.i-connector.com/ [R,NE,L]
AddHandler application/x-httpd-php .txt

I don't think I have to explain the first row. The second and third say that if you are not one of the three big search engines and you request the robots.txt file, you will be redirected to the main domain. This is very handy, since a lot of people set their homepage as the landing page for 404 errors, so the cloaking won't be recognised. (I will talk about the cloaking a bit later as well.)
The fourth row makes your robots.txt file behave as a PHP script.
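
To make the condition in the second row concrete, here is a minimal PHP sketch of the test it performs. The function name `passes_user_agent_filter` is made up for illustration and is not part of the setup; Apache does this matching itself. It shows that the check is only a case-insensitive substring match on a header the client controls, which is why the DNS validation in the next step is needed.

```php
<?php
// Illustrative only: mirrors the test in
// RewriteCond %{HTTP_USER_AGENT} !(googlebot|Msnbot|Slurp) [NC]
// The function name is hypothetical.
function passes_user_agent_filter(string $userAgent): bool
{
    // The "i" modifier plays the role of the [NC] (no case) flag.
    return preg_match('/(googlebot|msnbot|slurp)/i', $userAgent) === 1;
}

// A browser is redirected by the rule; anything claiming to be Googlebot gets through.
var_dump(passes_user_agent_filter('Mozilla/5.0 (Windows NT 10.0) Firefox/120.0')); // bool(false)
var_dump(passes_user_agent_filter('Mozilla/5.0 (compatible; Googlebot/2.1)'));      // bool(true)
```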

Now you are done with the first step. Let's see what else you need to do.

2. Open a text editor or your favourite web editor application, insert the code below into a new file, save it as reversedns.php and upload it to your root folder.

PHP:
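<?php
// Only clients that claim to be one of the three crawlers are checked.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stristr($ua, 'msnbot') || stristr($ua, 'Googlebot') || stristr($ua, 'Yahoo Slurp')) {
    $ip = $_SERVER['REMOTE_ADDR'];
    // Reverse lookup: IP address to hostname.
    $hostname = gethostbyaddr($ip);
    if (!preg_match("/\.googlebot\.com$/", $hostname) && !preg_match("/search\.live\.com$/", $hostname) && !preg_match("/crawl\.yahoo\.net$/", $hostname)) {
        // The hostname is not in a known crawler domain: redirect to the homepage.
        $block = TRUE;
        $URL = "/";
        header("Location: $URL");
        exit;
    } else {
        // Forward lookup: the hostname must resolve back to the same IP.
        $real_ip = gethostbyname($hostname);
        if ($ip != $real_ip) {
            $block = TRUE;
            $URL = "/";
            header("Location: $URL");
            exit;
        } else {
            $block = FALSE;
        }
    }
}
?>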

This script may be familiar to many of you. It is a hacked version of the reversedns.php file that was presented some months ago. With this hack, if the robot cannot be validated, the script redirects it to your main domain. So let me return for a minute to the cloaking-or-not-cloaking issue. I had to recognise that Google is not capable of protecting my rankings from exploits, so I have to defend myself; hence I believe this is not bad cloaking, only a protection measure. If somebody masks him/herself as Googlebot, he/she will fail this robot validation and will be redirected to the main domain via PHP. No way to recognise the cloaking!
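
The validation above boils down to three checks: reverse lookup, domain match, forward lookup. Here is a minimal sketch of the same logic as a function. The function name and the two optional resolver parameters are assumptions added so the logic can be tried without touching real DNS; by default they fall back to PHP's `gethostbyaddr()` and `gethostbyname()`.

```php
<?php
// Sketch of the reverse + forward DNS check as a reusable function.
// is_verified_crawler() and its resolver parameters are hypothetical names.
function is_verified_crawler(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    // Step 1: reverse lookup. gethostbyaddr() returns the IP itself on failure.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }

    // Step 2: the hostname must end in one of the crawler domains used in the script above.
    if (!preg_match('/(\.googlebot\.com|search\.live\.com|crawl\.yahoo\.net)$/i', $hostname)) {
        return false;
    }

    // Step 3: the forward lookup must lead back to the same IP.
    return $forward($hostname) === $ip;
}

// Stub resolvers standing in for DNS; 192.0.2.10 is a documentation-only address.
$rev = function ($ip) { return 'crawl-192-0-2-10.googlebot.com'; };
$fwd = function ($host) { return '192.0.2.10'; };
var_dump(is_verified_crawler('192.0.2.10', $rev, $fwd)); // bool(true)
```

Returning a boolean instead of redirecting inside the check keeps the decision (redirect, 403, serve the file) in the calling script.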

3. And as the last step, open the robots.txt file you would like to protect and insert the code below on the first line.
PHP:
<?php include("reversedns.php"); ?>

You are done, and your robots.txt file is safe!
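
One detail worth checking after this step: because robots.txt is now executed by PHP, the response is normally sent with PHP's default `text/html` content type unless the script sets one. Below is a minimal sketch of sending the rules as `text/plain`. The helper name `render_robots_txt` and the example paths are made up for illustration, and in the real file the reversedns.php include would still come first.

```php
<?php
// Hypothetical helper, not from the original post: builds the robots.txt
// body from a list of paths. The paths used below are placeholders.
function render_robots_txt(array $disallowedPaths): string
{
    $lines = ['User-agent: *'];
    foreach ($disallowedPaths as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    return implode("\n", $lines) . "\n";
}

// header() only has an effect before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}
echo render_robots_txt(['/includes/', '/admin/']);
```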

Thanks!
 
This is brilliant Bagi, thanks for sharing, I know what you created this for originally :)

Fellow UK WW members, please digg this post :)
 
Bagi,

Do you mind if we refer to your article in UKWW blog?

Digged, stumbled and rep added
 
No, Skinner, that is absolutely no problem. :) Thanks for the digg, the rep and the stumble :)
 
Hi Bagi!! Awesome post. This is really helpful!

Can you add a few more search bots (host addresses) to this line?

if(!preg_match("/\.googlebot\.com$/", $hostname) &&!preg_match("/search\.live\.com$/", $hostname) &&!preg_match("/crawl\.yahoo\.net$/", $hostname)) {
}

I would like to check for 'aolbuild|baidu|bingbot|bingpreview|duckduckgo|adsbot-google|mediapartners-google|teoma|yandex'. Can you give me the match strings for these bots so I can add them to the 'if' condition in your code? (A !preg_match for each of the bots; I can't find the strings anywhere.)

Please advise.
 
How to Use an "if X or X" in a preg_match Statement

PHP:
if(!preg_match("/(bot1|bot2|bot3)/", $hostname)) {....}
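
Building on the alternation above, here is a hedged sketch that keeps the allowed hostname suffixes in one list and generates the pattern from it. Note that the pattern is tested against the reverse-DNS hostname, not the User-Agent string, so names like 'bingbot' or 'yandex' would first have to be mapped to hostname suffixes. Only the three suffixes already used in this thread are listed; further entries should come from each search engine's own documentation. The function names are made up for illustration.

```php
<?php
// Allowed hostname suffixes. These three come from the script in this thread;
// add more only after checking the crawler operator's documentation.
$allowedSuffixes = ['.googlebot.com', 'search.live.com', 'crawl.yahoo.net'];

// Hypothetical helper: builds one end-anchored, case-insensitive pattern.
function build_host_pattern(array $suffixes): string
{
    $escaped = array_map(function ($s) { return preg_quote($s, '/'); }, $suffixes);
    return '/(' . implode('|', $escaped) . ')$/i';
}

// Hypothetical helper: true when the hostname ends in an allowed suffix.
function host_is_allowed(string $hostname, array $suffixes): bool
{
    return preg_match(build_host_pattern($suffixes), $hostname) === 1;
}

var_dump(host_is_allowed('crawl-66-249-66-1.googlebot.com', $allowedSuffixes)); // bool(true)
var_dump(host_is_allowed('googlebot.com.evil.example', $allowedSuffixes));      // bool(false)
```

The `$` anchor matters: without it, a hostname that merely contains an allowed domain somewhere in the middle would pass.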

This thread is 10+ years old :D
It's also a bad approach: to secure something you always want to match what is allowed, without the ! (NOT). There is no 'not bot' list lol, there are too many, and then there are fake bots that come from the wrong IP (really the wrong AS block).



| means "or", so "/( | | | )/" is the set of what could match.

Really, I think it's a waste of time and resource intensive: reverse DNS is slow, and Googlebot, for example, may request the robots.txt 2 or 3 times in a row. Then if the script messes up, or Google comes in with a cloaked User-Agent (they do that BTW), you could mess up your indexing and create a mess that is hard to recover from.
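
If someone keeps the check anyway, one way to soften the cost of repeated lookups is to cache the verdict per IP for a while. This is only a sketch under assumptions that are not from this thread: the function name, the JSON cache file and the one-day TTL are all illustrative, and `$verify` stands for whatever DNS check is in use.

```php
<?php
// Sketch of a small file-backed cache around a slow verification callback.
// $verify is any callable(string $ip): bool. Names, file format and TTL are assumptions.
function cached_verify(string $ip, callable $verify, string $cacheFile, int $ttl = 86400): bool
{
    $cache = [];
    if (is_file($cacheFile)) {
        $decoded = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($decoded)) {
            $cache = $decoded;
        }
    }

    // Reuse a fresh entry instead of hitting DNS again.
    if (isset($cache[$ip]) && (time() - $cache[$ip]['time']) < $ttl) {
        return (bool) $cache[$ip]['ok'];
    }

    // No fresh entry: run the slow check once and remember the result.
    $ok = (bool) $verify($ip);
    $cache[$ip] = ['ok' => $ok, 'time' => time()];
    file_put_contents($cacheFile, json_encode($cache), LOCK_EX);
    return $ok;
}
```

With this in place, a crawler that requests robots.txt several times in a row triggers the DNS lookups only on the first request within the TTL.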

This is silly. Hostile User-Agents do not even request the robots.txt, the same way burglars do not ring the doorbell or knock first; they just want to sneak in, of course.
 