Searching for robots.txt files

The idea behind a robots.txt file is simple – place a static text file at the front door of your website with a listing of files and directories that you do not want search engine robots to index.
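For example, a bare-bones robots.txt (the paths here are made up) looks like this:

    User-agent: *
    Disallow: /admin/
    Disallow: /drafts/secret-plans.html

The User-agent line says which robots the rules apply to (* means all of them), and each Disallow line names a file or directory you'd rather they skipped.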
There are two problems with this concept:

1) Participation is at the robot programmer’s discretion. If you can program a search robot (or search spider, as they’re sometimes called), you can program it to ignore the robots.txt standard; the sketch after this list shows just how voluntary the check really is.

2) By publishing a list of URLs you don’t want indexed, you’re also publishing a list of top-secret cool stuff on your site. Take it one step further and start developing a robots.txt search tool, and you can start to see the bigger problem.
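To see how voluntary the arrangement is, here’s a rough sketch of what a well-behaved spider does, written in Python (the bot name and URLs are made up). A rogue spider simply never makes the check.

    from urllib.robotparser import RobotFileParser

    # A polite spider asks robots.txt for permission before fetching a page.
    rp = RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")  # hypothetical site
    rp.read()

    page = "http://www.example.com/private/report.html"
    if rp.can_fetch("ExampleBot", page):
        print("Allowed - go ahead and crawl", page)
    else:
        print("Disallowed - a polite spider stops here")
    # Nothing enforces this: a rogue spider just fetches the page anyway.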

My suggestion: if you have cool stuff on your site that you want protected, consider using the noindex and nofollow robots meta tags instead. They keep you from having to publish a road map to your secrets while still working within the robots exclusion conventions.
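For instance, dropping this tag into the head of a page tells compliant robots not to index it or follow its links, and nothing about it shows up in any public file:

    <meta name="robots" content="noindex, nofollow">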

Here’s a quick search just for sites with a published robots.txt file.

More helpful hints:

Search spiders invariably look for a robots.txt file when they visit your site. Create a blank one to allow full access and reduce the number of “file not found” errors in your error log.
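An empty file does the job, or you can spell out “allow everything” explicitly:

    User-agent: *
    Disallow:

An empty Disallow line means nothing is off limits.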

Use your robots.txt file to block whole directories rather than single files. Unscrupulous surfers may know which directories you’re protecting, but they’ll still have to guess at the file names. Be sure to create a blank index.html file and turn off directory indexing on your server so you don’t display your protected directory’s contents to the world.
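A directory-level rule (the directory name here is made up) is as simple as:

    User-agent: *
    Disallow: /private/

Robots that honor the file will stay out of everything under /private/, and nosy humans still have to guess at what’s inside.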

