Robots.txt and realcent archived forever

Posted: Sat Sep 11, 2010 7:19 pm
by didou
Google, Bing, Yahoo, Ask, AOL, and other search engines store a copy of the forum on their servers.
So does the Internet Archive's Wayback Machine: http://www.archive.org/web/web.php

They can keep a copy of the forum and its posts for decades after you have erased them, and that copy stays publicly available.

This can be avoided, at least for crawlers that honor it, by putting a robots.txt file in the site's root directory.
See http://www.robotstxt.org/ for the full protocol and details.

To prevent the Internet Wayback Machine from storing the content of the website forever,
open a text file and put this in it:
Code:
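# Block the Internet Archive's crawler from the whole site
User-agent: ia_archiver
Disallow: /

# Leave every other crawler unrestricted
User-agent: *
Disallow: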

Save it as 'robots.txt' and put it in the root directory, like this: http://realcent.org/robots.txt
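
Before uploading the file, it can be worth checking that the rules do what you expect, since an empty "Disallow:" allows everything while "Disallow: /" blocks everything. Here is a minimal sketch using Python's standard urllib.robotparser. The rule text, the is_allowed helper and the /forum/index.php path are only assumptions for illustration, so paste in your own file contents and URLs.

```python
from urllib.robotparser import RobotFileParser

# Example rules (an assumption for this sketch): block only ia_archiver,
# leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as an iterable of lines, no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page URL, used only to exercise the rules.
    page = "http://realcent.org/forum/index.php"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

With these rules the check should report ia_archiver as not allowed and any other agent as allowed. Keep in mind that this only tells you what the file says; a crawler that ignores robots.txt is not stopped by it.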

To prevent robots (Google, Yahoo, Bing, ...) from caching your pages (and keeping them available to the public long after the originals are gone), you need to add this tag to the <head> of every HTML page the site generates:

Code:
<META NAME="ROBOTS" CONTENT="NOARCHIVE" />
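
Since the tag has to be on every generated page, a quick check of the HTML the site actually sends can help. The following is a rough sketch with Python's standard html.parser. The RobotsMetaParser class and has_noarchive function are made-up names for this example, and how you fetch the page source is left to you.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noarchive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE" /></head></html>'
    print("noarchive present:", has_noarchive(page))
```

This only confirms the tag is in the page source. Whether a given search engine honors it is up to that search engine.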


I'm not sure most users want that, but some people may have an issue with their posts staying available after they have deleted them. I know this website isn't secret at all, but still.
Anyway, the info is there and the choice is yours.