Deliver Your Web Site From Evil (Part 2)
Here are some tips to protect your content from being ripped
off. Other sites can easily outrank you in SERPs for your own
content. This can happen if they have higher PageRank than you,
or get spidered by Google first.
You should:
1. Set up a bad-bot-banning script on your site.
There's one offered in this forum:
http://www.webmasterworld.com/forum88/10425.htm
This is to bar leechers. First, use 'robots.txt' to disallow a
subdirectory. Two weeks later, set up the bad-bot script; the
delay should give legitimate spiders time to pick up the new
rule. The idea is that a bot that accesses a subdirectory which
you have disallowed is a bad bot. It's just hoovering up your
data. It's not from a legit search engine, so it's probably a
competitor or a leech.
The script rewrites your .htaccess file to forbid the bot access
to your site altogether. Many webmasters also ban such bots to
save bandwidth.
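
The trap logic described above can be sketched in a few lines of
Python. This is not the script from the forum thread, only a
minimal illustration. It assumes an Apache common/combined format
access log and a hypothetical trap directory called /bot-trap/;
the names TRAP_DIR, find_bad_ips and deny_lines are made up for
this example.

```python
import re

# Hypothetical trap directory, assumed to be disallowed in robots.txt.
TRAP_DIR = "/bot-trap/"

# Start of an Apache common/combined log line:
# client IP, identd, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def find_bad_ips(log_lines, trap_dir=TRAP_DIR):
    """Return the sorted client IPs that requested anything under trap_dir."""
    bad = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_dir):
            bad.add(match.group(1))
    return sorted(bad)


def deny_lines(ips):
    """Format the IPs as Apache 2.2-style 'Deny from' lines for .htaccess."""
    return ["Deny from " + ip for ip in ips]


if __name__ == "__main__":
    # Made-up sample log lines using documentation IP addresses.
    sample = [
        '192.0.2.10 - - [01/Jan/2006:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
        '192.0.2.66 - - [01/Jan/2006:10:00:05 +0000] "GET /bot-trap/page.html HTTP/1.1" 200 128',
    ]
    for directive in deny_lines(find_bad_ips(sample)):
        print(directive)
```

The 'Deny from' form is the older Apache 2.2 syntax; Apache 2.4
servers normally use 'Require' rules instead, so adapt the output
to whatever your host runs.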
Furthermore, you can ban individual download programs by their
HTTP_USER_AGENT environment variable:
http://www.ttfreeware.com/download/htaccess-forbid-bad-bots-etc.txt
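
Matching on the user agent boils down to a substring test. The
Python sketch below shows the idea only; the substrings listed
are assumptions chosen for illustration, not the contents of the
linked file, and is_banned_agent is a hypothetical name.

```python
# Illustrative list only; these substrings are assumptions,
# not the contents of the linked .htaccess file.
BANNED_AGENT_SUBSTRINGS = ("httrack", "webzip", "wget")


def is_banned_agent(user_agent, banned=BANNED_AGENT_SUBSTRINGS):
    """Return True if the User-Agent contains a banned substring.

    The comparison is case-insensitive. A missing header is not banned.
    """
    ua = (user_agent or "").lower()
    return any(fragment in ua for fragment in banned)


if __name__ == "__main__":
    print(is_banned_agent("Wget/1.10.2"))
    print(is_banned_agent("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Bear in mind that a user agent string is trivial to fake, so this
only stops downloaders that identify themselves honestly.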
2. Forbid visitors from Russia, China, Romania etc. using
.htaccess.
This is to bar countries that are more likely to try leeching,
or other jiggery-pokery:
http://www.ttfreeware.com/download/htaccess-denial-code.txt
Add or delete countries according to your needs by downloading
this file:
http://www.location.com.my/Countries.zip
You can use the database within to sort countries by continent,
and build up a list.
Some Third-World countries are noted for hacking and fraud. You
may think you want the whole world to come to your site, but you
don't. Most of the world is poor, and $30.00 USD is a lot of
money. It's not that they wouldn't _like_ to buy your goods;
they just can't afford to. They are more likely to be
tyre-kickers than customers.
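
Blocking by country comes down to testing each visitor's IP
address against a list of network ranges. Here is a minimal
Python sketch of that test. The ranges shown are placeholder
documentation blocks, not real country allocations; real ranges
would have to come from an IP-to-country list. The names
BLOCKED_RANGES and is_blocked are hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real
# country allocations. Substitute ranges from an IP-to-country list.
BLOCKED_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(client_ip, ranges=BLOCKED_RANGES):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ranges)


if __name__ == "__main__":
    print(is_blocked("203.0.113.7"))
    print(is_blocked("192.0.2.1"))
```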
3. Use absolute URLs in your internal links.
This is to have lots of links to your main site in any HTML
copied from you. Plaster your absolute URL, phone number, email
address, AND absolute IMG URLs all over your pages. Then bar
offsite image hot-linking, using .htaccess. You can use
.htaccess mod_rewrite to replace hot-linked images with an image
of your choice.
This makes it less worthwhile to copy your content. The copyist
ends up with broken images and big ads for your site.
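
If your pages currently use relative links, a small script can
convert them. The Python sketch below rewrites double-quoted
href and src values as absolute URLs. It is a rough illustration
using a regular expression rather than a full HTML parser, and
BASE_URL is a hypothetical page location.

```python
import re
from urllib.parse import urljoin

# Hypothetical location of the page being processed.
BASE_URL = "http://www.yoursite.com/articles/"

# Matches double-quoted href="..." and src="..." attributes only.
ATTR = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def absolutize(html, base_url=BASE_URL):
    """Rewrite href/src values as absolute URLs relative to base_url."""
    def fix(match):
        absolute = urljoin(base_url, match.group(2))
        return '{}="{}"'.format(match.group(1), absolute)
    return ATTR.sub(fix, html)


if __name__ == "__main__":
    print(absolutize('<a href="page2.htm">next</a> <img src="/img/logo.gif">'))
```

Links that are already absolute pass through unchanged, so the
script can be run over a page more than once.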
4. Bar offsite image hotlinking by using .htaccess mod_rewrite:
http://www.ttfreeware.com/download/htaccess-image-protection-code.txt
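
The decision behind hotlink protection is a check on the Referer
header of each image request. The Python sketch below shows that
check in isolation. It is not the linked .htaccess code, and
ALLOWED_HOSTS and is_hotlink are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical: the hostnames your own pages are served from.
ALLOWED_HOSTS = {"www.yoursite.com", "yoursite.com"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True when the Referer names a host other than your own.

    An empty or missing Referer is allowed, because many browsers
    and proxies strip the header.
    """
    if not referer:
        return False
    host = (urlparse(referer).hostname or "").lower()
    return host not in allowed_hosts


if __name__ == "__main__":
    print(is_hotlink("http://blahdeblah.com/copypage.htm"))
    print(is_hotlink("http://www.yoursite.com/page.htm"))
```

When the check returns True, the server would send the
replacement image (or a 403) instead of the requested file.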
5. Have a script generate a small amount of random content in
the HTML of each page.
Find a random quote script at http://www.hotscripts.com.
Get one that can be called using Server Side Includes (for
static HTML pages), or a PHP one that you can insert in the
footer file of your PHP site. Replace the included quotes with
some original ones of your own.
This will make your page seem to be the latest, updated version
of the content _if_ you have a higher Google PageRank than your
plagiarist, _or_ Google indexes your content before theirs. Or
both!
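
If you would rather write the quote script yourself, it can be
very short. Here is a minimal Python sketch, assuming your host
lets you run Python as CGI and call it from a Server Side
Include. The quotes are placeholders and random_quote_html is a
hypothetical name.

```python
import random
from html import escape

# Placeholder quotes; replace these with original ones of your own.
QUOTES = [
    "First original quote goes here.",
    "Second original quote goes here.",
    "Third original quote goes here.",
]


def random_quote_html(quotes=QUOTES, rng=random):
    """Return one randomly chosen quote, HTML-escaped, in a paragraph tag."""
    return '<p class="quote">{}</p>'.format(escape(rng.choice(quotes)))


if __name__ == "__main__":
    # A CGI script must send a Content-Type header and a blank
    # line before the body.
    print("Content-Type: text/html")
    print()
    print(random_quote_html())
```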
6. If copyists use AdSense, report them to Google via the Google
link in each ad unit.
This is to rob them of the reason to rip off your content.
Google will (hopefully) terminate their account.
7. Insert a frame-breaking JavaScript in each page.
This is to break a stolen page out of any page framing it:
http://www.ttfreeware.com/download/break-frames-javascript-code.txt
8. Use Google Alerts to email you when your domain name turns up
on a site they spider.
Google Alerts: http://www.google.com/alerts
Once I got an email about a site containing a URL of mine. Went
to the site. Nothing but Google AdSense; no other text at all.
The Google Alert email showed text I recognised as being mine.
No sign of it in the HTML of the 'linking' site. Looked suspect.
Ratted them out to Google AdSense.
9. Type this into the Google search box:
inurl:www.yoursite.com -site:www.yoursite.com
... to find possible plagiarists.
This command looks for sites with 'www.yoursite.com' in the URL,
excluding your own. It will show sites using redirect scripts.
Many are harmless, some are not.
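
If you look after several sites, you can generate the query for
each one instead of typing it. A small Python sketch follows;
the function names are hypothetical, and the search URL is
simply the ordinary Google search address with a 'q' parameter.

```python
from urllib.parse import urlencode


def plagiarism_query(hostname):
    """Build the query: pages with hostname in the URL, excluding the site itself."""
    return "inurl:{0} -site:{0}".format(hostname)


def search_url(hostname):
    """Return a Google search URL for the plagiarism query."""
    return "http://www.google.com/search?" + urlencode({"q": plagiarism_query(hostname)})


if __name__ == "__main__":
    print(plagiarism_query("www.yoursite.com"))
    print(search_url("www.yoursite.com"))
```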
Some sites use an HTTP 302 (temporary) redirect to usurp your
search engine results ranking. Google sometimes sees their site
as the originator. It helps the plagiarists if they have a
higher PageRank than you. Because Google is searching through
its own cache of the competitor site, you can't block these
redirects from your own site.
If the results from the query above contain URLs with script
names like 'nph-proxy.cgi' in them, you may have a problem.
However, Google seems to be getting better at pushing these
sites into its supplemental listings.
The trick seems to be who gets spidered first, and who has
higher PR. So get working on getting more links!
10. Put "noarchive" meta tags in all your pages:
http://www.ttfreeware.com/download/meta-tag-code.txt
That won't stop people grabbing your pages, but it's one less
resource for people who want to poke around your site.
11. Use a 301 (permanent) redirect in your .htaccess file to
redirect from yourdomain.com to www.yourdomain.com (or the
reverse, whichever is the more common way you write your URLs):
http://www.ttfreeware.com/download/301-domain-redirect-code.txt
This stops search engines finding 'duplicate' copies of your
site, and hammering you with a duplicate content penalty.
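
The rule behind this redirect is simple: any request whose Host
header is not your preferred hostname gets a 301 to the same
path on the preferred one. Here is that rule as a Python sketch.
It is not the linked .htaccess code, and CANONICAL_HOST and
canonical_redirect are hypothetical names.

```python
# Hypothetical preferred hostname for the site.
CANONICAL_HOST = "www.yourdomain.com"


def canonical_redirect(host, path, canonical_host=CANONICAL_HOST):
    """Return (301, location) if host is not canonical, else None.

    Any port suffix on the Host header (e.g. ':80') is ignored.
    """
    hostname = host.split(":")[0].lower()
    if hostname == canonical_host:
        return None
    return (301, "http://{}{}".format(canonical_host, path))


if __name__ == "__main__":
    print(canonical_redirect("yourdomain.com", "/page.htm"))
    print(canonical_redirect("www.yourdomain.com", "/page.htm"))
```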
12. http://www.CopyScape.com can help find content thieves.
I wouldn't debate the matter with plagiarists. Give them 48
hours to remove your content, then inform their ISP. Accept no
excuses, like "My web developer did it". That's the oldest
get-out whine in the book: "Someone else did it, it wasn't me!".
Save your email correspondence as templates for future use.
13. You could also contact the plagiarists' advertisers.
For example: "Your ad is displayed on this page here
blahdeblah.com/copypage.htm, check out my page here
mysite.com/original.htm."
14. Slap them with a DMCA takedown notice.
A working example: http://www.google.com/dmca.html. DMCA means
Digital Millennium Copyright Act. Relax, this is relatively
easy. It's a formal way of telling web sites and search engines
who the real owner of the content is, and getting them to remove
plagiarism.
Individually these tricks wouldn't do much. Together, however,
they will 'harden' your site.