Decloaking Hazards - Why You Should Shun Caching Search Engines
While all search engines use one form of caching or
another to build their indices, some of them make a
point of displaying cached web pages to their users.
The commonly quoted pretext is that this offers
searchers fast access to a page's content, making it
easier to check whether it is really what they are
looking for in the first place. Of course, what this
actually does is keep visitors on the search engine's
site, making them more susceptible to banner ads and
other means of promotion.
However, the drawbacks this practice entails are numerous.
- Depending on the search engine's index cycle, the
content presented may be quite outdated.
- More often than not, the presented pages will not be
fully functional:
= relative (internal) links tend to get broken (see
the example after this list)
= JavaScript and external Java applets won't work
anymore
= site design and layout may be massacred by incorrect
or non-existent display of external Cascading Style
Sheets (CSS)
= banner ads may not be displayed properly, thus
depriving webmasters of revenue
= dynamic content may not be rendered the way it was
originally set up.
- Displaying content within an alien context (e.g.
under the search engine's header, encased in a frame,
etc.), beyond the control of said content's
generators/authors, arguably constitutes a blatant
infringement of intellectual property rights and
copyright.
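Why do relative links break? Here is a minimal
illustration, using a hypothetical page address of our
own invention (http://www.example.com/dir/index.html):
  <!-- original page: http://www.example.com/dir/index.html -->
  <A HREF="page2.html">Next page</A>
  <!-- normally resolves to http://www.example.com/dir/page2.html -->
Served from a search engine's cache, the same link will
typically resolve against the cache server's address
instead, pointing at a document that does not exist
there.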
Moreover, for a web site employing IP delivery, this
practice constitutes a prime Decloaking Hazard: since
cloaking works by feeding search engine spiders an
optimized (or, at least, different) page that is not
intended for human perusal, caching such pages and
displaying them for the asking will reveal your
cloaking effort, thus rendering it useless - any
unscrupulous competitor could easily steal your cloaked
code to optimize their own pages with it and achieve
better rankings to your detriment.
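By way of illustration, here is a minimal sketch of
what IP delivery can look like on an Apache server
using mod_rewrite; the spider address range and file
paths are purely illustrative assumptions, not taken
from any actual spider listing:
  # .htaccess sketch (illustrative): IP delivery via mod_rewrite
  RewriteEngine On
  # requests from the (assumed) spider address range...
  RewriteCond %{REMOTE_ADDR} ^10\.0\.0\.
  # ...receive the optimized spider page instead of the human page:
  RewriteRule ^index\.html$ /spider-pages/index.html [L]
The moment a caching engine stores /spider-pages/index.html
and serves it from its own cache, anyone requesting the
cached copy - competitors included - gets to see the
very page that was meant for spiders' eyes only.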
The most prominent search engine displaying cached web
pages not of their own making is, of course, Google. In
the past, Google staff would promptly comply with any
request by webmasters not to display cached pages.
Then, a little over a year ago, Google introduced a
proprietary meta tag (META NAME="GOOGLEBOT"
CONTENT="NOARCHIVE") for webmasters to include in the
HEAD section of those pages they want excluded from
this feature.
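In practice, the tag goes into the HEAD section of
every page you want kept out of the cache - a minimal
example:
  <HTML>
  <HEAD>
  <TITLE>Sample Page</TITLE>
  <!-- tells Google's spider not to archive (cache) this page -->
  <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
  </HEAD>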
The Google meta tag actually works. While there was
some indication immediately after its introduction
that sites opting for this exclusion might be penalized
ranking-wise, this seems to have abated. Obviously,
should Google really start a witch hunt on cloaking
sites, as their public announcements are fond of
stating every other month or so, it only stands to
reason that web sites making use of this special meta
tag might constitute prime targets. For this reason we
do not recommend cloaking for Google unless you do it
exclusively from a dedicated shadow domain.
Another company, Germany-based brainbot technologies
AG, offers search engine technology for portals:
< http://brainbot.com/ >
Brainbot's robots also spider international domains:
#UA gigabaz/3.14 (baz@gigabaz.com; http://gigabaz.com/gigabaz/)
mail.brainbot.com
134.93.7.97
#UA gigaBazV11.3 (baz@brainbot.com; http://brainbot.com/gigabaz/)
151.189.96.99
One licensee making use of their cached results is geekbot:
< http://www.geekbot.org >
On its result pages you will find a "scan" function
which displays cached pages, albeit in a different
format.
French search engine AntiSearch offers display of
cached web pages, too:
< http://www.antisearch.net/ >
AntiSearch operates the following spiders:
#UA antibot-V1.1/i586-linux-2.2
62.210.155.49
#UA antibot-V1.1/i586-linux-2.2
62.210.155.50
#UA antibot-V1.1/i586-linux-2.2
62.210.155.56
#UA antibot-V1.1/i586-linux-2.2
62.210.155.58
#UA antibot-V1.1/i586-linux-2.2
62.210.155.59
Finally, let's not forget German search engine
Speedfind:
< http://www.speedfind.de >
Speedfind, too, offers display of cached pages.
Due to the peculiar legal situation in Germany, which
makes webmasters fully liable for links to third-party
pages unless they post an explicit disclaimer
prominently on their site, Speedfind disclaims all
liability for the pages thus displayed:
"SPEEDFIND DOCUMENT FROM CACHE VIEWER
SPEEDFIND is in no way liable for content displayed
below.
All rights belong to the respective page's author.
We are only displaying a copy of said page."
(Translated from German)
So while they do acknowledge authors' full rights,
those authors' permission to display their copyrighted
content is never requested - nor do their terms of
submission give any indication of how to prevent page
caching.
Speedfind operates the following spiders:
#UA visual ramBot xtreme 7.0
proxy-gate.oberland.net
192.109.251.26
#UA speedfind ramBot xtreme 8.1
new.speedfind.de
194.97.8.162
#UA speedfind ramBot xtreme 8.1
eins.speedfind.de
194.97.8.163
#UA visual ramBot xtreme 7.0
c2.oberland.net
194.221.132.56
#UA visual ramBot xtreme 7.0
io.oberland.net
194.221.132.139
Rather than bother with minor players like Speedfind,
AntiSearch and brainbot by excluding them from your
submission process, you may want to consider blocking
their spiders from accessing your web site altogether
(lest your competitors submit your site behind your
back!).
In this case, we would recommend using our
fantomas multiBlocker(TM) for a professional blocker
solution:
< http://fantomaster.com/famultiblocker0.html >
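Should you prefer to experiment with a hand-rolled
solution first, a minimal sketch for an Apache server
(assuming mod_setenvif is available; the patterns and
addresses are taken from the spider listings above)
might look like this:
  # .htaccess sketch: lock out the caching spiders listed above
  SetEnvIfNoCase User-Agent "gigabaz" blocked_bot
  SetEnvIfNoCase User-Agent "antibot" blocked_bot
  SetEnvIfNoCase User-Agent "ramBot"  blocked_bot
  <Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  # deny by user agent match...
  Deny from env=blocked_bot
  # ...and by the addresses/netblocks documented above:
  Deny from 134.93.7.97 151.189.96.99 62.210.155
  Deny from 192.109.251.26 194.97.8 194.221.132
  </Limit>
Bear in mind that user agents and IP addresses change
over time, so a static list like this requires constant
maintenance - which is precisely the chore a dedicated
blocker solution automates.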
About the Author
Ralph Tegtmeier and Dirk Brockhausen are the co-founders
and principals of fantomaster.com Ltd. (UK) and
fantomaster.com GmbH (Belgium), < http://fantomaster.com/ >
a company specializing in webmaster software development,
industrial-strength cloaking and search engine positioning
services. You can contact them at
mailto:fneditors@fantomaster.com