Page Cloaking - To Cloak or Not to Cloak
Page cloaking can broadly be defined as a technique used to
deliver different web pages under different circumstances. There
are two primary reasons that people use page cloaking:
i) It allows them to create a separate optimized page for each
search engine and another page which is aesthetically pleasing
and designed for their human visitors. When a search engine
spider visits a site, the page which has been optimized for that
search engine is delivered to it. When a human visits a site,
the page which was designed for human visitors is shown. The
primary benefit of doing this is that human visitors never need
to see the pages which have been optimized for the search
engines, since those pages may not be aesthetically pleasing
and may contain heavy repetition of keywords.
ii) It allows them to hide the source code of the optimized
pages that they have created, and hence prevents their
competitors from being able to copy the source code.
Page cloaking is implemented using specialized cloaking
scripts. A cloaking script is installed on the server, where it
detects whether a page is being requested by a search engine or
by a human being. If a search engine is requesting the page, the
cloaking script delivers the page which has been optimized for
that search engine. If a human being is requesting the page, the
cloaking script delivers the page which has been designed for
humans.
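To make the mechanics concrete, here is a minimal sketch in
Python of how such a script might choose which page to serve. It
assumes a hypothetical WSGI application; the file names and the
is_search_engine() helper are placeholders, not part of any real
cloaking product:

    # Minimal cloaking dispatch sketch (hypothetical WSGI app).
    # is_search_engine() is a placeholder for the detection logic
    # discussed below (User-Agent based or I.P. based).

    def is_search_engine(environ):
        # Placeholder; a real script would inspect the User-Agent
        # header and/or the visitor's I.P. address (see below).
        return False

    def application(environ, start_response):
        if is_search_engine(environ):
            # Visitor looks like a spider: serve the optimized page.
            body = open("optimized_page.html", "rb").read()
        else:
            # Visitor looks like a human: serve the normal page.
            body = open("human_page.html", "rb").read()
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body]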
There are two primary ways by which the cloaking script can
detect whether a search engine or a human being is visiting a
site (both checks are sketched in code after the list):
i) The first and simplest way is by checking the User-Agent
variable. Each time anyone (be it a search engine spider or a
browser being operated by a human) requests a page from a site,
it reports a User-Agent name to the site. Generally, if a
search engine spider requests a page, the User-Agent variable
contains the name of the search engine. Hence, if the cloaking
script detects the name of a search engine in the User-Agent
variable, it delivers the page which has been optimized for
that search engine. If the cloaking script does not detect the
name of a search engine in the User-Agent variable, it assumes
that the request has been made by a human being and delivers the
page which was designed for human beings.
However, while this is the simplest way to implement a cloaking
script, it is also the least safe. It is quite easy to fake the
User-Agent variable, and hence anyone who wants to see the
optimized pages being delivered to the different search engines
can easily do so.
ii) The second and more complicated way is to use I.P. (Internet
Protocol) based cloaking. This involves the use of an I.P.
database which contains a list of the I.P. addresses of all
known search engine spiders. When a visitor (a search engine or
a human) requests a page, the cloaking script checks the I.P.
address of the visitor. If the I.P. address is present in the
I.P. database, the cloaking script knows that the visitor is a
search engine and delivers the page optimized for that search
engine. If the I.P. address is not present in the I.P. database,
the cloaking script assumes that a human has requested the page,
and delivers the page which is meant for human visitors.
Although more complicated than User-Agent based cloaking, I.P.
based cloaking is more reliable and safer, because it is very
difficult to fake an I.P. address.
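Purely as an illustration, the two checks could be written
roughly as follows, filling in the is_search_engine()
placeholder from the earlier sketch. The spider names and I.P.
addresses here are made-up examples, not a real spider database:

    # Sketch of the two detection checks described above.
    # SPIDER_NAMES and SPIDER_IPS are illustrative values only.
    SPIDER_NAMES = ("googlebot", "slurp", "bingbot")
    SPIDER_IPS = {"192.0.2.1", "198.51.100.2"}   # example addresses

    def is_spider_by_user_agent(user_agent):
        # User-Agent based check: look for a known spider name.
        ua = (user_agent or "").lower()
        return any(name in ua for name in SPIDER_NAMES)

    def is_spider_by_ip(remote_addr):
        # I.P. based check: look the address up in the database.
        return remote_addr in SPIDER_IPS

    def is_search_engine(environ):
        # A WSGI environ carries both pieces of information.
        return (is_spider_by_ip(environ.get("REMOTE_ADDR", "")) or
                is_spider_by_user_agent(environ.get("HTTP_USER_AGENT", "")))

Note how little protection the User-Agent check offers by
itself: a command such as curl -A "Googlebot/2.1"
http://www.example.com/ reports a spider-like User-Agent and
would be served the optimized page by a purely User-Agent based
script.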
Now that you have an idea of what cloaking is all about and how
it is implemented, the question arises as to whether you should
use page cloaking. The one-word answer is "NO". The reason is
simple: the search engines don't like it, and will probably ban
your site from their index if they find out that your site uses
cloaking. The reason that the search engines don't like page
cloaking is that it prevents them from being able to spider the
same page that their visitors are going to see. And if the
search engines are prevented from doing so, they cannot be
confident of delivering relevant results to their users. In the
past, many people have created optimized pages for some highly
popular keywords and then used page cloaking to take people to
their real sites which had nothing to do with those keywords. If
the search engines allowed this to happen, they would suffer
because their users would abandon them and go to another search
engine which produced more relevant results.
Of course, a question arises as to how a search engine can
detect whether or not a site uses page cloaking. There are three
ways by which it can do so:
i) If the site uses User-Agent cloaking, the search engines can
simply send a spider which does not report the name of the
search engine in the User-Agent variable. If the search engine
sees that the page delivered to this spider is different from
the page delivered to a spider which does report the name of
the search engine, it knows that the site has used page cloaking
(this comparison is sketched in code after the list).
ii) If the site uses I.P. based cloaking, the search engines can
send a spider from an I.P. address which they have never used
before. Since this is a new address, the I.P. database used for
cloaking will not contain it. If the search engine detects that
the page delivered to the spider with the new I.P. address is
different from the page delivered to a spider with a known I.P.
address, it knows that the site has used page cloaking.
iii) A human representative from a search engine may visit a
site to see whether it uses cloaking. If she sees that the page
which is delivered to her is different from the one being
delivered to the search engine spider, she knows that the site
uses cloaking.
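To illustrate the first of these checks, the comparison amounts
to fetching the same page twice with different User-Agent values
and comparing the results. Here is a rough sketch using Python's
standard library; the User-Agent strings are just examples, and
the equality test is deliberately naive:

    # Fetch the same URL with a browser-like and a spider-like
    # User-Agent and compare what comes back.
    from urllib.request import Request, urlopen

    def fetch(url, user_agent):
        req = Request(url, headers={"User-Agent": user_agent})
        return urlopen(req).read()

    def looks_cloaked(url):
        browser_page = fetch(url, "Mozilla/5.0 (Windows NT 10.0)")
        spider_page = fetch(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
        # A byte-for-byte comparison; a real check would allow for
        # ads, timestamps and other harmless differences.
        return browser_page != spider_page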
Hence, when it comes to page cloaking, my advice is simple:
don't even think about using it.