
How to make a blog not get indexed by Google



abostonboy
03-21-2008, 07:35 PM
I asked a while ago somewhere. So now I ask yet again! ty in advance.

Gaystoryman
03-21-2008, 07:48 PM
Use the 'no follow' setting in your WordPress options (under Privacy, I think). Setting the front page to static can also help.

Use your robots.txt to disallow that folder or directory.

Requiring a username/password to view is another, less ideal way.

Set 'robots' to none in your meta tag. (BTW, you can add that to the 'header.php' file in your templates.)

MrJD
03-21-2008, 07:58 PM
<meta name="robots" content="noindex, nofollow">
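
For anyone who wants to confirm the tag actually made it into the rendered page, here is a minimal sketch using Python's standard html.parser. The class and function names (RobotsMetaParser, robots_directives) are made up for illustration, and the sample HTML is an assumption, not a real page from this thread.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page source; paste in your own blog's HTML instead.
    sample = '<html><head><meta Name="robots" Content="noindex, Nofollow"></head></html>'
    print(robots_directives(sample))
```

Attribute names and directive values are compared case-insensitively here, so mixed-case markup is read the same way as the lowercase form.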

rawTOP
03-21-2008, 10:44 PM
Whether you use the robots.txt file or the robots meta tag really depends on whether you want the spiders to read the page in the first place. If you want the page to pass PageRank to other pages but not be indexed, then you need to use the meta tag. If there's no good reason a spider would ever load the page, then you use robots.txt.

abostonboy
03-21-2008, 11:55 PM
I don't want Google to spider any of it at all. Don't want PageRank at all. But I don't want to make visitors register. It's a blog for affiliates, not something I want to be on the search engines.

So what would I use in robots.txt if I may ask? Kinda confused.

Gaystoryman
03-22-2008, 01:05 AM
User-agent: *
Disallow: /tmp/

Change /tmp/ to your folder.

Use the meta tag too: noindex, nofollow.
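
If you want to sanity-check the rules before uploading them, a rough sketch with Python's standard urllib.robotparser can help. The folder name /affiliate-blog/ and the helper is_allowed are assumptions made up for this example; swap in your own directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folder name; replace with the directory that holds the blog.
ROBOTS_TXT = """\
User-agent: *
Disallow: /affiliate-blog/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A well-behaved crawler should stay out of the disallowed folder...
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/affiliate-blog/post-1"))
    # ...but may still fetch everything else on the site.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "/index.html"))
```

This only shows how a compliant parser reads the file; it says nothing about bots that ignore robots.txt.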

GTP
03-22-2008, 07:03 AM
You can also use a strange directory name and not link to it from any other page. It's an extra option (to be used along with all the others listed here) to be more sure Google can't find the blog.

Gaystoryman
03-22-2008, 09:29 AM
That is a good one. Oh, and a common one that can be forgotten: DON'T list it in your sitemap. Remove pings too, because pings 'announce' your blog.

rawTOP
03-24-2008, 01:16 PM
The only way to keep the spiders out is with robots.txt... Especially if affiliates are linking to it...

Putting "noindex" in a robots meta tag is essentially the same. It will get crawled, but then dropped from the index. Should be safe enough for what you say you'll be doing. The good part is if you have links on the page to other pages on your site, those affiliate blog pages will pass PageRank even though they're not in the SE index.

GTP
03-24-2008, 03:24 PM
Just to be 100% clear: robots.txt doesn't tell spiders not to scan and surf the pages, only not to index them :)

BTW, some minor search engines don't follow the robots.txt spec.

rawTOP
03-24-2008, 07:07 PM
Actually it's a huge offense for a spider to crawl a page that's excluded by robots.txt.

Realize that the robots.txt file is case sensitive, so if you say not to crawl /foo.html and there's a link to /Foo.html they're not in violation if they crawl /Foo.html. In other words if you have capitalization in some of the links to your site, you should set up 301 redirects to the lowercase, canonical URLs ('cause it can cause duplicate content issues as well).
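
The case sensitivity is easy to see with Python's standard urllib.robotparser. This is only a sketch of how one compliant parser behaves; the helper name is_blocked is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Same paths as in the post above: only the lowercase URL is excluded.
RULES = """\
User-agent: *
Disallow: /foo.html
"""


def is_blocked(robots_txt, path, user_agent="*"):
    """Return True if robots_txt tells user_agent not to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_blocked(RULES, "/foo.html"))  # the excluded, lowercase URL
    print(is_blocked(RULES, "/Foo.html"))  # the capitalized variant
```

The capitalized variant is not covered by the rule, which is why redirecting to one lowercase URL keeps the exclusion meaningful.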

But seriously, no good bot will crawl a page excluded by robots.txt and all the major search engines are good bots. Check webmasterworld.com if you need Apache rules for the bad bots. People get pretty elaborate in how they deal with them... But many of the more prevalent bad bots can be blocked pretty easily with Apache. For example you could do something like this:

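RewriteEngine On
# Match any of the listed fragments in the user agent, ignoring case
RewriteCond %{HTTP_USER_AGENT} (badbot1|badbot2|badbot3) [NC]
# Leave the URL untouched (-) and answer every matching request with 403 Forbidden
RewriteRule ^ - [F,L]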

Where 'badbot1', 'badbot2', etc. are identifying bits from their user agent strings. That will cause all of their requests to fail.
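
If you'd like to try out a pattern before putting it in Apache, the same matching logic can be roughed out in Python. This is a sketch, not a replacement for the rule: status_for is a made-up helper, and 'badbot1' etc. are still placeholders, not real bot names.

```python
import re

# Placeholder fragments, as in the Apache rule above.
# re.IGNORECASE plays the role of the [NC] flag.
BAD_BOT_PATTERN = re.compile(r"(badbot1|badbot2|badbot3)", re.IGNORECASE)


def status_for(user_agent):
    """Return 403 if the user agent contains a listed fragment, else 200."""
    # search() matches anywhere in the string, like an unanchored RewriteCond.
    return 403 if BAD_BOT_PATTERN.search(user_agent) else 200


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; BadBot2/1.0)"))
    print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```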

You can go even further and set up what are called "honey pots" where you put links in your documents to a URL that no user would go to. Then you exclude that URL via robots.txt and any request of that document results in a block of all future requests (usually by blocking the IP address).
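
Here is a bare-bones sketch of the honey pot idea in Python, just to show the logic. The HoneyPot class, the trap path, and the in-memory set of blocked IPs are all assumptions for illustration; a real setup would usually block at the server or firewall level.

```python
class HoneyPot:
    """Blocks every IP address that has ever requested the trap URL."""

    def __init__(self, trap_path):
        # The trap URL should also be disallowed in robots.txt so that
        # well-behaved spiders never request it.
        self.trap_path = trap_path
        self.blocked_ips = set()

    def handle(self, ip, path):
        """Return the HTTP status this request would get."""
        if path == self.trap_path:
            # Only a bot ignoring robots.txt should end up here.
            self.blocked_ips.add(ip)
        if ip in self.blocked_ips:
            return 403
        return 200


if __name__ == "__main__":
    # Hypothetical trap path and documentation-range IP addresses.
    pot = HoneyPot("/trap/do-not-follow.html")
    print(pot.handle("203.0.113.7", "/index.html"))               # before the trap
    print(pot.handle("203.0.113.7", "/trap/do-not-follow.html"))  # hits the trap
    print(pot.handle("203.0.113.7", "/index.html"))               # now blocked
    print(pot.handle("198.51.100.2", "/index.html"))              # unaffected visitor
```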