Search Engine 101
What's a Search Engine?
It is a program designed to do basically three things:
1. Gather copies of web pages from the World Wide Web.
2. Store those copies in its index (database).
3. Look up its huge database of web pages to find matches to a
search query, and rank them in order of relevance.
For example, if you go to Google, type in the words 'search
engine', and hit Enter, Google immediately queries its huge
database, finds web pages that match the search term 'search
engine', and ranks them according to relevance.
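The three steps above can be sketched in a few lines of Python. The pages and file names here are made-up examples, and a real engine stores billions of pages on disk rather than in memory, but the shape of the index and the lookup is the same:

```python
# Step 1 (stand-in): a tiny hard-coded "web" of gathered pages.
pages = {
    "page1.html": "a search engine gathers copies of web pages",
    "page2.html": "a directory depends on human editors",
    "page3.html": "the search engine ranks pages by relevance",
}

# Step 2: build an index mapping each word to the pages containing it.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

# Step 3: look up the pages that contain every word in the query.
def lookup(query):
    results = [index.get(word, set()) for word in query.split()]
    return set.intersection(*results) if results else set()

print(sorted(lookup("search engine")))  # ['page1.html', 'page3.html']
```

This word-to-pages mapping is called an inverted index; it lets the engine answer a query without re-reading every stored page.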
Ranking according to relevance is determined by the Search
Engine's algorithm (its specific requirements). Every Search
Engine has its own algorithm.
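To make "ranking by relevance" concrete, here is a toy algorithm that simply scores each page by how many times the query terms appear in it. The pages are made-up examples; real algorithms weigh many more signals (links, titles, freshness), and every engine weighs them differently:

```python
pages = {
    "page1.html": "search engine basics: how a search engine works",
    "page2.html": "a search engine gathers web pages",
    "page3.html": "directories are compiled by human editors",
}

def rank(query):
    terms = query.split()
    scores = {}
    for url, text in pages.items():
        words = text.split()
        # Toy relevance score: total occurrences of the query terms.
        score = sum(words.count(t) for t in terms)
        if score:
            scores[url] = score
    # Highest-scoring pages come first in the results.
    return sorted(scores, key=scores.get, reverse=True)

print(rank("search engine"))  # ['page1.html', 'page2.html']
```

page1.html mentions both terms twice and so outranks page2.html, which mentions them once each; page3.html does not match at all.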
Basically, there are two types of Search Engines: human-edited
directories and crawler-based Search Engines. These two types
operate differently.
What is a human-edited directory?
As the term suggests, a human-edited directory depends on humans
to review, approve, and compile its listings of web pages. The
Yahoo Directory and the Open Directory Project (DMOZ) are
examples of human-edited directories. You submit a short
description of your website to the directory, and search results
only show matches found in the submitted descriptions. Changing
your web pages has no effect on your listing in a directory.
What is a crawler-based Search Engine?
First and foremost, what is a crawler? It is the component of a
Search Engine that automatically 'crawls the web' to gather
listings. The crawler, also known as a 'spider' or 'robot',
visits a web page, reads it, and follows its links to other web
pages. It gathers copies of web pages and stores them in the
Search Engine's index (database). When people type in a search
term, the Search Engine looks up its database of web pages and
uses its specific algorithm to determine which pages most
closely match the search term.
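The visit-read-follow behaviour can be sketched as a simple traversal. The pages and links here are a made-up miniature web; a real crawler fetches pages over HTTP, respects robots.txt, and schedules revisits, none of which is shown:

```python
# A tiny hard-coded "web": each page maps to the links it contains.
links = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["widget.html"],
    "widget.html": [],
}

def crawl(start):
    seen = set()
    queue = [start]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue          # don't read the same page twice
        seen.add(url)         # a real crawler would index the page here
        queue.extend(links[url])  # follow its links to new pages
    return seen

print(sorted(crawl("home.html")))
```

Starting from home.html, the crawler reaches every page that is linked, directly or indirectly, which is why a new page generally gets found once something already indexed links to it.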
Crawler-based Search Engines gather and create listings
automatically. New web pages are automatically crawled and
indexed, and changes made to existing web pages are also
automatically picked up. If you make changes to your web pages,
the crawlers will find those changes, which in turn may affect
the ranking of your web pages.
For a crawler-based Search Engine, it is the crawler that does
the work of gathering and compiling the listings. Google is one
example of the many crawler-based Search Engines.
Practically speaking, you don't have to submit your web pages to
Google. Google's crawler, Googlebot, will find your web pages on
its own. In fact, Google's crawler does a more efficient job of
finding and indexing web pages than manual submission. Just for
peace of mind, however, you can still submit your web pages to
Google to be indexed and ranked.
For smaller crawler-based Search Engines, you need to submit
your web pages for listing. Upon confirmation of your
submission, crawlers are sent out to index your web pages.
A human-edited directory like DMOZ, which offers free listings,
depends on humans to review, approve, and compile its listings
of web pages. DMOZ is staffed by volunteer editors, and because
thousands of new web sites are submitted every day, there is a
massive backlog. It used to take anywhere between two and six
months to get a web site listed, and it may take longer now.
Paid commercial directories have smaller backlogs than DMOZ.
Recommended reading:
Who powers whom? Search providers chart
http://searchenginewatch.com/reports/article.php/2156401