Archives – July 26, 2007

Warning: Keep out!

Thank you everyone for your words and your concerns. It is always good to hear and be reminded that not everyone out there is a scary, threatening person. Thank you!

Now: 

Some of you have emailed me asking about what the hubby is doing to the site to help keep undesirables away from the blog.  Here's Ryan describing what he did, in case anyone else feels the need to follow suit…

Howdy, all.  Here's a quick crash course in how to (at least partially) keep the sickos away from your site.

Angela found in her site statistics that visitors were coming from Google (and possibly other search engines) having searched for disturbing phrases. One way to keep the weirdos out, we decided, was to keep them from being able to find the site through the search engines. There are a few ways to do this.
(Here comes Ryan's technical description, which may scramble your brain. If so, move on to Option 1, which even I understood how to do.)

A search engine (SE) scours the internet by using computer programs called robots (sometimes spiders) that jump from web links to pages, add the pages to their database, then follow all of the links on those pages to other pages, which they add to the database, then follow those pages' links on to still more pages, ad infinitum. This process is called indexing. During an index, robots will add billions of pages to a database that the SE's users then search. Because indexing billions of pages uses a huge amount of computing power, SEs are not interested in indexing sites that have no desire to be indexed. For this reason, web developers came up with a standard called robots.txt to allow web site owners to keep parts of their sites from being visited by the robot programs.
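For the curious, the link-following described above can be sketched in a few lines of Python. This is only a toy: the "web" here is a made-up dictionary of hypothetical pages (the `example` addresses are invented for illustration), and a real robot is far more involved.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}

def crawl(web, start):
    """Visit pages one by one, following links, and return the pages indexed."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        indexed.append(page)           # "add the page to the database"
        for link in web.get(page, []):
            if link not in seen:       # skip pages we already know about
                seen.add(link)
                queue.append(link)
    return indexed

print(crawl(TOY_WEB, "a.example/"))
# ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

Starting from one page, the robot ends up with every page reachable by links, which is why a single link to your blog is enough for it to be found.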

To keep SEs from indexing your site, you need to let them know that you do not want their robot indexing your pages. There are two ways to do this. One involves adding a line to the HTML of your web pages. The other involves uploading a file to the root directory of your website.  Either one tells a visiting robot that you don't want them there, and away they go without adding you to their database.

These methods can be used selectively, i.e., you can keep out ONLY Google's robot, or turn robots away from only certain files. We're more interested in getting the entire site out of every SE index, so that's what these directions are for.
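As a rough illustration of the selective approach (not what we did), here is a sketch that uses Python's built-in `urllib.robotparser` to read a set of rules the way a well-behaved robot would. The rules are an assumed example: they turn Google's robot (which identifies itself as Googlebot) away from everything, and turn everyone else away from a hypothetical /photos/ folder only.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules; the /photos/ folder is hypothetical.
SELECTIVE_RULES = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /photos/
"""

def allowed(rules_text, agent, path):
    """Return True if the named robot may visit the path under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, path)

print(allowed(SELECTIVE_RULES, "Googlebot", "/index.html"))         # False
print(allowed(SELECTIVE_RULES, "SomeOtherBot", "/index.html"))      # True
print(allowed(SELECTIVE_RULES, "SomeOtherBot", "/photos/pic.jpg"))  # False
```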

OPTION 1:
If you don't have the ability to upload files to the main directory, but you do use a blogging program with templates (such as Blogger or WordPress), you can add a line to your site template that will then be reproduced on every page of your website and will bounce robots away when they try to visit.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW" />

This line is a "meta tag," something that is read by browsers and robots, but does not appear on the visitor's screen. When a robot sees this, it leaves without indexing. Using whatever method you have to alter your template, add the above line inside of the <head> tag of your template.

The <head> tag is near the top of the web page code. If you look for the line in your template that says…

<head>

…then paste in the meta tag just below that, it will be in the right place. Be careful not to put it after the line that says…

</head>

… This is the head closing tag. You want the opening one without the forward slash.
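If you want to double-check that the tag landed in the right spot, here is a small sketch using Python's built-in `html.parser`. It is only an assumption about how you might test your own page: paste your page's source into a string and see whether the robots tag is found inside the head section. The `PAGE` sample and the names `RobotsMetaFinder` and `robots_directives` are made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the robots meta tag's instructions, but only inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "robots":
                content = attrs.get("content") or ""
                self.directives += [d.strip().lower() for d in content.split(",")]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def robots_directives(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives

# Hypothetical sample page with the tag pasted just below <head>.
PAGE = """<html>
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW" />
<title>My blog</title>
</head>
<body>Hello</body>
</html>"""

print(robots_directives(PAGE))  # ['noindex', 'nofollow']
```

If the list comes back empty, the tag is either missing or sitting outside the head section.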

OPTION 2:
If you are able to upload files to the main directory of your web site, using a robots.txt file is the best choice. This is a text file that well-behaved search engine robots look for when first coming to a site. If the robots.txt file contains instructions for robots to go away, they do. To turn them all away, all the time, the robots.txt file should contain this text, exactly like this:

User-agent: *
Disallow: /

You have to upload this to the main directory of your site or it won't block robots from everything. Here's a downloadable copy of the file that you can use.
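Before uploading, you can sanity-check the file's text with a short Python sketch like the one below, which reads the two lines the way a rule-following robot would. The robot names and page paths in the check are just examples I picked, and `blocks_everything` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# The exact two lines from the robots.txt file above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

def blocks_everything(rules_text, agents, paths):
    """Return True if none of the robots may visit any of the paths."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return not any(parser.can_fetch(agent, path)
                   for agent in agents for path in paths)

# Example robots and paths, chosen only for illustration.
print(blocks_everything(BLOCK_ALL,
                        ["Googlebot", "ExampleBot"],
                        ["/", "/2007/07/post.html"]))  # True
```

This only tells you what the rules say. Robots that ignore robots.txt will not be stopped by it.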

NEXT STEP
According to search engine folks, you now have to wait. Eventually, SE robots will visit your site as they crawl the internet, see that you are rejecting them, and move on. As your site continues to remain inaccessible, the pages already in their index will drop out. Eventually, your site will be gone from the search engine's database. It will still be available to any visitors that know the address, as well as through any links that other sites or bloggers have made to your site.  The people searching on Google, however, will not find it.

Google allows web site owners to take an additional step and request the removal of their site from the index. You have to set up one or both of the above options first, then you can request removal. We did this yesterday and the request is still pending.  I have not been able to find an option like this with other search engines, and with Google's you have to register with them as a webmaster. Theoretically, this step is unnecessary. The robots.txt thing is supposed to (eventually) ensure your disappearance from searchable web space. We're just not sure how long it takes.

ADDED:  In order to request removal, you need to sign up with Google's webmaster tools.  The sign-up page is here.  Once you've signed up, you can click on Webmaster Tools, then your site name, then URL Removals.  The request form is there.  Note, however, that this is not strictly necessary.  Blocking the robot is enough to get you out of the index.  In Angela's case, the site was dropped before Google even got around to noticing the request for removal.

NOTE: It took about three days from doing all this for Google to drop Colorfool from the index.

If you feel the need to safeguard your blog and family like this and you need help or have any questions, email Angela and we'll walk you through it.

If you want detailed information, see http://www.robotstxt.org/wc/robots.html.

Did ya get all that? You can't find me through Google searches anymore, which is wonderful for me. However, if you are a site that sells something, I'm sure you want people to find you. Since we made this change, traffic to my site has been halved, so just keep that in mind if you are trying to make money through your blog. Everyone who currently reads your blog will still be able to read it using the same method they always have, unless they always Google search you. Just give everyone a heads up in a post and tell them to bookmark you or add your web address to a Bloglines type thingie.

Also, if you use Flickr, you can safeguard your photos by making them viewable only by friends and family. If you go to "your account" and then the "privacy and permissions" section, you can make your photos unsearchable there, and a few other things too, if you feel the need. If you are titling and tagging your photos, be safe. I'd stay away from photo titles/labels like "little girl" or "small children" and, well, just don't identify that you have photos of your babes if you are not removing yourself from searches.

I hope this is helpful and not too confusing. If you are confused, try reading it again. I had to! I'm dense when it comes to tech stuff.  Good luck, all.

11 Comments July 26, 2007

