06-12-2008, 05:15 PM
Ingenue

A robots.txt file can be very useful in some cases. For example, you might have a website with a user control panel in the admin folder, and you should disallow crawling of that area. You might also have a public file in that folder that you do want indexed; in that case you can allow just that one page (I don't know why you would, but this is just an example).
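Here is a rough sketch of what that admin-folder setup could look like, checked with Python's standard urllib.robotparser. The folder name /admin/ and the page public-info.html are just placeholders I made up for the example. I put the Allow line before the Disallow line because Python's parser applies the first rule that matches; crawlers that go by the most specific rule should reach the same result with this ordering.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the admin folder, but allow one public page in it.
# The paths are placeholders for illustration, not from any real site.
ADMIN_ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public-info.html
Disallow: /admin/
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


admin_parser = build_parser(ADMIN_ROBOTS_TXT)

# Everything in /admin/ is off limits except the one page that was allowed.
print(admin_parser.can_fetch("*", "/admin/login.php"))
print(admin_parser.can_fetch("*", "/admin/public-info.html"))
print(admin_parser.can_fetch("*", "/index.html"))
```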

On many of the servers I code on, I keep PHP test code I am working on in a folder called testcode. I just disallow that folder in robots.txt.

I have known programmers who keep .pdf files, generated on the fly by a contact form to be sent as attachments, in a public temp folder that is never cleaned out (shoddy programming, but I guess we don't all think alike). Those .pdf files can easily be indexed and their contents made public. Google, if you've noticed, offers a "read as html" link for .pdf documents. You wouldn't want that to happen, especially with potential customer correspondence. So just disallow that temp folder.
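A minimal sketch covering both the testcode folder and the temp folder, again checked with urllib.robotparser. The folder name /temp/ and the file names are my own guesses for the example; use whatever your folders are actually called.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking the two folders discussed above.
# "/temp/" and the file names below are assumed for illustration.
FOLDER_ROBOTS_TXT = """\
User-agent: *
Disallow: /testcode/
Disallow: /temp/
"""

folder_parser = RobotFileParser()
folder_parser.parse(FOLDER_ROBOTS_TXT.splitlines())


def is_blocked(path, agent="*"):
    """Return True if a crawler that obeys robots.txt should skip this path."""
    return not folder_parser.can_fetch(agent, path)


for path in ("/testcode/scratch.php", "/temp/contact-form.pdf", "/contact.php"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```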

(You also need to be careful about what you put on your server. Not all spiders/crawlers respect the robots.txt file; some will index everything regardless.)
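To show why that warning matters, here is a small sketch of the difference between a crawler that checks robots.txt and one that doesn't. The bot name ExampleBot and the paths are made up. The point is that the check happens on the crawler's side; nothing on your server enforces it.

```python
from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt, agent, paths):
    """Return only the paths a rule-following crawler would request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(agent, p)]


# Hypothetical rules and paths, for illustration only.
rules = "User-agent: *\nDisallow: /temp/\n"
paths = ["/index.html", "/temp/quote-1234.pdf"]

# A polite crawler filters its list through robots.txt first.
polite = allowed_paths(rules, "ExampleBot", paths)

# A crawler that never reads robots.txt simply requests everything.
rude = list(paths)

print("polite crawler fetches:", polite)
print("rude crawler fetches:", rude)
```

So robots.txt only keeps out the crawlers that choose to follow it. Anything that must stay private should not be sitting in a public folder at all.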