Search Engine 101
What's a Search Engine?
It is a program designed to do basically three things:
1. Gather copies of web pages from the World Wide Web.
2. Store those copies in its index (database).
3. Look up its huge database of web pages to find matches to a
search query, and rank them in order of relevance.
For example, if you go to Google, type in the words 'search
engine', and hit Enter, Google immediately queries its huge
database, finds web pages that match the search term 'search
engine', and ranks them according to relevance.
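The three steps above can be sketched in a few lines of Python. The pages and file names here are made-up examples, and a real engine stores billions of pages on disk rather than in memory, but the shape of the index and the lookup is the same:

```python
# Step 1 (stand-in): a tiny hard-coded "web" of gathered pages.
pages = {
    "page1.html": "a search engine gathers copies of web pages",
    "page2.html": "a directory depends on human editors",
    "page3.html": "the search engine ranks pages by relevance",
}

# Step 2: build an index mapping each word to the pages containing it.
index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

# Step 3: look up the pages that contain every word in the query.
def lookup(query):
    results = [index.get(word, set()) for word in query.split()]
    return set.intersection(*results) if results else set()

print(sorted(lookup("search engine")))  # ['page1.html', 'page3.html']
```

This word-to-pages mapping is called an inverted index; it lets the engine answer a query without re-reading every stored page.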
Ranking according to relevance is determined by the Search
Engine's algorithm (its specific requirements). Every Search
Engine has its own algorithm.
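To make "ranking by relevance" concrete, here is a toy algorithm that simply scores each page by how many times the query terms appear in it. The pages are made-up examples; real algorithms weigh many more signals (links, titles, freshness), and every engine weighs them differently:

```python
pages = {
    "page1.html": "search engine basics: how a search engine works",
    "page2.html": "a search engine gathers web pages",
    "page3.html": "directories are compiled by human editors",
}

def rank(query):
    terms = query.split()
    scores = {}
    for url, text in pages.items():
        words = text.split()
        # Toy relevance score: total occurrences of the query terms.
        score = sum(words.count(t) for t in terms)
        if score:
            scores[url] = score
    # Highest-scoring pages come first in the results.
    return sorted(scores, key=scores.get, reverse=True)

print(rank("search engine"))  # ['page1.html', 'page2.html']
```

page1.html mentions both terms twice and so outranks page2.html, which mentions them once each; page3.html does not match at all.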
Basically, there are two types of Search Engines: human-edited
directories and crawler-based Search Engines. These two types
operate differently.
What is a human-edited directory?
As the term suggests, a human-edited directory depends on humans
to review, approve, and compile its listings of web pages. The
Yahoo Directory and the Open Directory Project (DMOZ) are
examples of human-edited directories. You submit a short
description of your website to the directory, and search results
only show matches found in the submitted descriptions. Changing
your web pages has no effect on your listing in a directory.
What is a crawler-based Search Engine?
First and foremost, what is a crawler? It is the component of a
Search Engine that automatically 'crawls the web' to gather
listings. The crawler, also known as a 'spider' or 'robot',
visits a web page, reads it, and follows its links to other web
pages. It gathers copies of web pages and stores them in the
Search Engine's index (database). When people type in a search
term, the Search Engine looks up its database of web pages and
uses its specific algorithm to determine which pages most
closely match the search term.
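The visit-read-follow behaviour can be sketched as a simple traversal. The pages and links here are a made-up miniature web; a real crawler fetches pages over HTTP, respects robots.txt, and schedules revisits, none of which is shown:

```python
# A tiny hard-coded "web": each page maps to the links it contains.
links = {
    "home.html": ["about.html", "products.html"],
    "about.html": ["home.html"],
    "products.html": ["widget.html"],
    "widget.html": [],
}

def crawl(start):
    seen = set()
    queue = [start]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue          # don't read the same page twice
        seen.add(url)         # a real crawler would index the page here
        queue.extend(links[url])  # follow its links to new pages
    return seen

print(sorted(crawl("home.html")))
```

Starting from home.html, the crawler reaches every page that is linked, directly or indirectly, which is why a new page generally gets found once something already indexed links to it.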
Crawler-based Search Engines gather and create listings
automatically. New web pages are automatically crawled and
indexed, and changes made to existing web pages are also
automatically picked up. If you make changes to your web pages,
the crawlers will find those changes, which in turn may affect
the ranking of your web pages.
For a crawler-based Search Engine, it is the crawler that does
the work of gathering and compiling the listings. Google is one
example of the many crawler-based Search Engines.
Practically speaking, you don't have to submit your web pages to
Google. Google's crawler, Googlebot, will find your web pages on
its own. In fact, Google's crawler does a more efficient job of
finding and indexing web pages than manual submission. Just for
peace of mind, however, you can still submit your web pages to
Google to be indexed and ranked.
For smaller crawler-based Search Engines, you need to submit
your web pages for listing. Upon confirmation of your
submission, crawlers are sent out to index your web pages.
A human-edited directory like DMOZ, which offers free listings,
depends on humans to review, approve, and compile its listings
of web pages. DMOZ is staffed by volunteer editors, and because
thousands of new web sites are submitted every day, there is a
massive backlog. It used to take anywhere between two and six
months to get a web site listed, and it may take longer now.
Paid commercial directories have smaller backlogs than DMOZ.
Recommended reading:
Who powers whom? Search providers chart
http://searchenginewatch.com/reports/article.php/2156401