If you have a lot of 404s in your access logs from search engines looking for robots.txt, then this post is for you.
A robots.txt file in your web root can help search engines index your site’s pages more efficiently by telling either all search engines, or just specific ones, which areas of your site to index and which areas to stay away from. Why is that important?
- Spiders can really throw off your metrics. I found that an aggregator I recently released was getting a massive number of clicks in a particular category. A little investigation revealed that it was an MSN bot following links to my click handler, completely ruining any chance of gathering meaningful data.
- Although you probably want most of your site indexed, there is no sense in drawing people to your site through search engine results that aren’t really relevant. robots.txt files can help improve the quality of search results, which makes the web a better experience for everyone.
robots.txt files are quick and simple to write and put in place. Find out more at robotstxt.org.
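As a rough sketch, a robots.txt that keeps all crawlers away from a click-tracking handler (the `/clicks/` path here is hypothetical) while singling out one bot might look like this:

```
# Rules for all crawlers
User-agent: *
# Keep bots out of the click-tracking handler (hypothetical path)
Disallow: /clicks/

# Rules for MSN's crawler specifically (msnbot is its user-agent)
User-agent: msnbot
Disallow: /
```

Each `User-agent` line starts a group of rules, and `Disallow` lists the path prefixes that group of crawlers should avoid; an empty `Disallow:` means everything is allowed.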
(Note that robots.txt files are not a replacement for security. If you really don’t want anyone to see a particular portion of your site, protect it with a password!)