Search Engine Basics
Just about every major search engine has three basic parts. The
first is the spider, also called a robot or crawler. The spider
visits a web page, reads it, and then follows links to other pages
within the site. This is what people mean when they say a site has
been "spidered" or "crawled." The spider returns to the site on a
regular basis, such as every month or two, to look for changes and
updates. (If a site is updated often and is well marketed, this can
happen much more frequently, sometimes even daily.)
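To make the crawling step concrete, here is a minimal sketch in Python using only the standard library. The names (LinkExtractor, crawl, start_url, max_pages) are illustrative, not any engine's real code, and a production spider would also honor robots.txt, throttle its requests, and schedule the return visits described above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    """Visit pages breadth-first, staying within the starting site."""
    site = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), set(), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue  # skip pages that fail to load
        pages[url] = html  # keep a copy for the index to consume
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == site:  # stay within the site
                queue.append(absolute)
    return pages
```

Calling something like crawl("https://example.com") would return the stored page copies that the indexing step, described next, consumes.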
Everything the spider finds goes into the second part of a
search engine, the index. The index, sometimes called the
database, is like a giant library containing a copy of every web
page the spider finds. If a page has changed, or appears to have
changed, since the spider's last visit, it is re-indexed and this
"book" is updated with the new information.
Sometimes it takes a while for the new pages or changes the
spider finds to be added to the index. A web page may therefore
have been "spidered" but not yet "indexed." Until the new
information is indexed, it is not available to people searching
with the search engine.
The third, and most sophisticated, part of a search engine is
the ranking software (sometimes referred to as the algo or
algorithm). This is the program that sifts through the millions
of pages recorded in the index to find matches for a search and
rank them in the order it believes is most relevant. All search
engines have the basic parts described above, but they differ in
how those parts are tuned. That is why the same search on
different search engines often produces different results.
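As a rough illustration of that sifting and ranking, here is a sketch that scores candidate pages by plain term frequency over the pages and index built in the earlier sketches. The rank function and its scoring are stand-ins; real ranking software weighs many signals (links, freshness, where the terms appear on the page), and tuning those weights differently is exactly why engines disagree.

```python
import re
from collections import Counter

def tokenize(text):
    """Same illustrative tokenizer used to build the index above."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, pages, index):
    """Return URLs that match every query term, best match first."""
    terms = tokenize(query)
    # candidates: pages the index lists for every term in the query
    candidates = set(pages)
    for term in terms:
        candidates &= index.get(term, set())
    # crude relevance: total occurrences of the query terms per page
    scores = {url: sum(Counter(tokenize(pages[url]))[t] for t in terms)
              for url in candidates}
    return sorted(candidates, key=scores.get, reverse=True)
```

Swap in a different scoring rule and the same index returns the same matches in a different order, which is the small-scale version of why identical searches rank differently from one engine to the next.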