Using "Robots" Meta Tags
The "robots" meta tag, when used properly, tells the search engine spiders whether or not to index a particular page and follow its links. Some examples of usage are as follows:

<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">

Let us first examine what these terms mean before we explain the usage for each one:

"index" - This directive tells the search engine robots (or spiders) that it is okay to index the page. In other words, you are allowing the search engine to include your page within its search index.

"noindex" - Using this directive, you are letting the robots know that this page should not be indexed. Simply put, this page will not appear in the search index.

"follow" - When you use this directive, you are telling the search engines that you want their robot to follow any links that are found on that page.

"nofollow" - The opposite of the above definition, this directive tells the robots not to follow any links on your page.

Putting it all together:

With the robots tags explained, let's examine the usage for each one.

1. <meta name="robots" content="index,follow">

This tag is used when you want the search engine spiders to index the page and follow the links to other pages. Most search engines use this setting as a "default" setting, so you may not even need the tag if you want the search engines to follow and index the page. However, an article at Search Engine World (searchengineworld.com/metatag/robots.htm) suggests that Inktomi does not use this as its default setting. Instead, it uses "index, nofollow". Better safe than sorry! There has been much debate over whether or not it is necessary to use this tag. If there is even a slight possibility that some search engines do not use this as the default setting, then it only makes sense to include this tag if you want your page included in their search index AND your links to be followed. Do the research and decide for yourself.

2. <meta name="robots" content="noindex,follow">

This tag can be used to tell the search engines that you do not want the page included in their index, but you DO want them to follow the links that lead to other pages. A good example of its usage would be your disclaimer or privacy policy pages. You may not want these pages to show up in the search engines if they are only important to your actual visitors. However, if the links on these pages point to other pages that you want the search engines to find, then you would still want the spiders to "follow" those links.

3. <meta name="robots" content="index,nofollow">

This tag will allow your page to be indexed in the search engines, but any links on that page will not be followed.

4. <meta name="robots" content="noindex,nofollow">

When using this tag, the search engine spiders will not include this page in their index and will not follow any links on the page either.

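The four combinations above boil down to two independent yes/no decisions. As a rough sketch of how a spider might read the content value, the Python below splits the string on commas and falls back to index and follow when a directive is missing, which is the default the article attributes to most engines. The function name parse_robots_content and the dictionary keys are hypothetical, chosen only for illustration; real spiders may behave differently (Inktomi, as noted above, is reported to default to nofollow).

```python
def parse_robots_content(content):
    """Turn a robots meta 'content' value into index/follow decisions.

    Assumption: anything not stated falls back to index + follow, which the
    article describes as the default for most (not all) search engines.
    """
    # Directives are comma separated; ignore case and stray whitespace.
    directives = {part.strip().lower() for part in content.split(",")}

    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


# Show the decision for each of the four tags discussed in the article.
for value in ("index,follow", "noindex,follow", "index,nofollow", "noindex,nofollow"):
    print(value, "->", parse_robots_content(value))
```

Because each directive is checked on its own, a value such as "noindex" by itself still leaves link following switched on under the assumed default.
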
Where does the "robots" tag belong?

The "robots" meta tag should be used within the <head> and </head> tags of your page. These tags are located at the top of the HTML code. It will look something like this:

<html>
<head>
<title>Title of your page goes here</title>
<meta name="keywords" content="word1,word2,word3,word4">
<meta name="description" content="A brief description of the content of this page.">
<meta name="robots" content="index,follow">
</head>
<body>
Your webpage information here.
</body>
</html>

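To check that a page really carries the tag in the right place, something like the following could be used. It is a minimal sketch built on Python's standard html.parser module; the class name RobotsMetaFinder, the helper find_robots_meta, and the sample page are made up for this illustration. It only collects "robots" meta tags that appear between <head> and </head>, and says nothing about how any particular search engine treats a misplaced tag.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            # Attribute values can be None when written without a value.
            name = (attributes.get("name") or "").lower()
            if name == "robots":
                self.found.append(attributes.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots_meta(html_text):
    """Return the content values of robots meta tags located in the head."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.found


# Hypothetical page: one tag in the head, one misplaced in the body.
page = """<html> <head> <title>Title of your page goes here</title>
<meta name="robots" content="index,follow">
</head> <body> Your webpage information here.
<meta name="robots" content="noindex,nofollow">
</body> </html>"""

print(find_robots_meta(page))
```

In this sample the second tag sits inside the body, so the sketch skips it and reports only the one in the head.
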
More Robots Tags
Google automatically archives a page as it crawls it. This is called a "cached" version of the page. Visitors can retrieve the archived version of the page by clicking on the "cached" link within Google's search results. If you do not want your content to be archived, you can use the following tag:

<meta name="robots" content="noarchive">

*This will only prevent your page from being "cached". If you do not want your page to be indexed at all, you will still need to include the "noindex" tag.

An alternative to the above tag is one that addresses Google only. If you want other search engine robots to archive your site, but you would like to prevent Google from doing so, then you can use the following tag:

<meta name="googlebot" content="noarchive">

The Misuse of Robots Tags

Something that
has been popping up on websites everywhere is the Google indexing tag. This is a silly little tag that is not necessary. Some people think this tag helps Google to spider your site, but this simply isn't true. The tag looks like this:

<meta name="googlebot" content="index,follow">

Some website owners believe that by specifying "googlebot" their site gains the advantage of being spidered faster and listed by Google. According to Google's web crawler information at http://www.google.com/bot.html, you only need to use the noindex, nofollow, or noarchive tags when you don't want Google to index, follow, or cache that page. Google's default setting is to index the page and follow the links on it, so this so-called googlebot index/follow tag is completely unnecessary.

Another silly little tag: The "Revisit-After" Tag

<meta name="revisit-after" content="90 days">
<meta name="revisit-after" content="15 days">

I'm not sure where this myth started. Today,
you will find this tag all over the Internet. Webmasters have even promoted it, claiming that it actually works. Are we so naive as to believe the search engine spiders need to be told when to come back? I have never used this tag, and my site has no problem with being crawled on a regular basis. Even some SEO (search engine optimization) sites claim it has value. This comes back to the importance of always doing your research, like this:

http://www.webmasterworld.com/forum5/4924.htm

It is important to examine the correct usage of the "robots" tag before applying it to your website. Incorrect usage of tags could result in errors on your page that cause robots to ignore your page altogether.

You can find more information about web robots here:
http://www.robotstxt.org/wc/robots.html