Assertivenet Spider is Gigablast Spider

Introduction and Purpose

The purpose of this article is to provide evidence and information to counter the suggestion that the Assertivenet spider may be used for malicious purposes.

Initial Research

On Saturday, March 11, 2006, I received a somewhat urgent telephone call from a client of mine, Hibiscus Florals (www.hibiscusflorals.com). The owner, Mark Morkowski, was concerned because he had been reviewing his website traffic statistics and had noticed that at numerous points throughout the day, a user or spider from "ASSERTIVENET" (IP 66.154.103.125) had visited the Hibiscus website.

Since this was rather unusual, Mark elected to investigate further by searching for more information on "Assertivenet" via the Google search engine. The first three results that he found appear below:

From this information, Mark and I gathered that the owner of the spider in question appeared to be a company called Assertive Networks, and that the spider was hosted through a company called "BC Hosting." More information was not immediately available.

It is this lack of information that likely led some of the members of the PowerBASIC forums to block the IP range 66.154.* from accessing their various websites, and justifiably so. But this same lack of information led to additional questions:

  1. What files was the Assertivenet spider accessing or trying to access? Was the spider crawling pages or, like some bots, was it looking for specific files that could be used for malicious purposes (e.g. files and scripts that could be manipulated for website attacks)?
  2. Why is the apparent owner of the Assertivenet spider a web hosting company (BC Hosting)?
  3. What is the intended purpose of the Assertivenet spider?

Additional Research - All Is Not As It Appears

At this point, I decided to look beyond what the website traffic statistics and Mark's initial search had revealed. I needed to start by answering the questions I posed earlier, and in order to do so, I needed to access the raw log files for the Hibiscus website.

I opened up the log files, searched for the particular IPs in question, and found a series of entries such as these:

2006-03-11 03:47:34 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 03:47:34 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=215 200 0 0 299 125 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 03:50:37 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=195 200 0 0 299 31 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 07:47:05 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 07:47:05 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=219 200 0 0 299 109 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -

The user agent field in each entry shows that the spider in this case actually belongs to a search engine called Gigablast, and is appropriately named the Gigabot. The Gigabot only crawled pages and files, as other search engine spiders do, and made no attempt whatsoever to access files or scripts of a known malicious nature.
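For readers who want to repeat this check on their own logs, the following Python sketch shows one way to do it. It assumes the whitespace-separated field layout seen in the entries above (client IP in the third field, requested path in the eighth, user agent in the seventeenth). The names parse_line and summarize are illustrative only, and other servers or log configurations will order their fields differently.

```python
from collections import Counter

# The five log entries quoted in this article.
SAMPLE_LOG = """\
2006-03-11 03:47:34 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 03:47:34 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=215 200 0 0 299 125 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 03:50:37 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=195 200 0 0 299 31 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 07:47:05 66.154.103.125 - 216.89.218.168 80 GET /robots.txt - 200 0 400 285 78 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
2006-03-11 07:47:05 66.154.103.119 - 216.89.218.168 80 GET /larger_image.asp PID=219 200 0 0 299 109 HTTP/1.0 www.hibiscusflorals.com Gigabot/2.0/gigablast.com/spider.html -
"""


def parse_line(line):
    """Split one log entry into the fields of interest.

    Field positions are an assumption based on the sample entries;
    adjust the indexes to match your own server's log layout.
    """
    if line.startswith("#"):  # skip header/comment lines
        return None
    parts = line.split()
    if len(parts) < 18:  # skip blank or truncated lines
        return None
    return {
        "timestamp": parts[0] + " " + parts[1],
        "client_ip": parts[2],
        "method": parts[6],
        "path": parts[7],
        "query": parts[8],
        "status": int(parts[9]),
        "user_agent": parts[16],
    }


def summarize(log_text, ip_prefix="66.154."):
    """Count user agents and requested paths for one IP prefix."""
    agents = Counter()
    paths = Counter()
    for line in log_text.splitlines():
        entry = parse_line(line)
        if entry is None or not entry["client_ip"].startswith(ip_prefix):
            continue
        agents[entry["user_agent"]] += 1
        paths[entry["path"]] += 1
    return agents, paths


if __name__ == "__main__":
    agents, paths = summarize(SAMPLE_LOG)
    print("User agents:", dict(agents))
    print("Requested paths:", dict(paths))
```

Run against the five entries above, this reports a single user agent (the Gigabot) and only two requested paths, /robots.txt and /larger_image.asp, which is the pattern one would expect from an ordinary crawler rather than a probe for vulnerable scripts.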

Gigablast is a "Tier 2" search engine that has over 1,000,000,000 pages indexed as of the date of this article (March 13, 2006). While it is not on the same level in terms of popularity as the Big 3 of Yahoo!, MSN, and Google, it has indexed a significant portion of the web, and can be useful for some searches. In particular, Gigablast has implemented a "Giga bits" feature whereby alternate searches are suggested based on the user's original query in order to help narrow the query down and provide greater relevancy.

I conducted additional research and discovered that some IP addresses from the 66.154.* IP block do resolve to gigablast.com e.g.:
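A reverse DNS check of this kind can be sketched in Python as follows. This is only an illustration, not the exact method used for this article: reverse_lookup and belongs_to are hypothetical helper names, and the result depends entirely on the DNS records published at the time the lookup is run, which may differ from what was observed in March 2006.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname that an IP address resolves to, or None.

    The resolver argument is injectable (an assumption made here so the
    function can be exercised without network access).
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


def belongs_to(hostname, domain="gigablast.com"):
    """True if hostname is the domain itself or one of its subdomains."""
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    return hostname == domain or hostname.endswith("." + domain)


# Example usage (performs a live DNS query, so it is left commented out):
# for ip in ("66.154.103.125", "66.154.103.119"):
#     name = reverse_lookup(ip)
#     print(ip, name, belongs_to(name))
```

Matching on the full domain suffix (".gigablast.com") rather than a bare substring avoids being fooled by an unrelated hostname that merely contains the word "gigablast".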

Conclusion - The Gigabot is Safe

As you may well have gathered by now, the Gigabot is a perfectly safe spider that operates in the same manner as other search engine spiders. There is no reason at this time to block the 66.154.* IP range that the bot uses; if anything, webmasters stand to gain from the free traffic that Gigablast could generate for their websites as a result of the Gigabot's efforts.

Adam Senour is a freelance web designer based out of the Greater Toronto Area. His latest project is Search Engine Friendly Layouts, a series of tableless layouts using CSS that load a website's content area first and foremost.