Robots.txt and realcent archived forever

All posts by members, moderators, and the administration of http://realcent.org are for your edification and amusement only. It is NOT the intent of realcent.org or its host to provide investment, medical, matrimonial, legal, tax advice or any other advice or counsel and nothing posted here should be considered to be so.

Post by didou » Sat Sep 11, 2010 7:19 pm

Google, Bing, Yahoo, Ask, AOL, and other search engines store a copy of the forum on their servers.
So does the Internet Archive's Wayback Machine: http://www.archive.org/web/web.php

They can keep a copy of the forum and its posts, publicly available, for decades after you have erased them.

This can be avoided, at least for crawlers that honor it, by putting a robots.txt file in the site's root directory.
See http://www.robotstxt.org/ for the full protocol and details.

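To prevent the Internet Wayback Machine from storing the content of the website forever, open a text file and put this in it:
Code:
# Keep the Internet Archive's crawler out of the whole site
User-agent: ia_archiver
Disallow: /

# An empty Disallow lets every other crawler in
User-agent: *
Disallow: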

Save it as 'robots.txt' and put it in the root directory, like this: http://realcent.org/robots.txt
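
If you want to check how a crawler would read rules like these before uploading the file, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the page URL http://realcent.org/viewtopic.php are only assumptions for illustration, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules under test: block ia_archiver everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text.

    is_allowed is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://realcent.org/viewtopic.php"  # example URL, assumed
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Remember that "Disallow: /" blocks everything for that user agent, while an empty "Disallow:" blocks nothing, so swapping the two reverses the effect.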

To prevent search engine robots (Google, Yahoo, Bing, ...) from caching your pages (and making them available to the public long after the originals are gone), you need to add this tag to every HTML page the site generates:

Code:
<META NAME="ROBOTS" CONTENT="NOARCHIVE" />
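
Since the tag has to be on every generated page, it may help to check a page's HTML for it. Below is a rough sketch using Python's standard html.parser module. The names RobotsMetaFinder and has_noarchive are made up for this example, and it only looks at a generic robots meta tag, not at crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html):
    """Return True if the HTML text carries a robots meta tag with noarchive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noarchive" in finder.directives


if __name__ == "__main__":
    sample = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE" /></head></html>'
    print("noarchive present:", has_noarchive(sample))
```

You could feed it the saved source of a few forum pages to confirm the template change took effect everywhere.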


I'm not sure most users want this, but some people may take issue with their posts remaining available after they have deleted them. I know this website isn't secret at all, but still.
Anyway, the info is there; the choice is yours.
An individual has rights only as long as he can defend them.
