Crawler-based search engines are made up of three major elements:
- The spider/crawler
- Not a real spider! A search engine spider is the automated program that visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled". The spider returns to the site on a regular basis to look for changes.
- The index
- This is like a giant catalogue or inventory of websites containing a copy of every web page that the spider finds. If a web page changes, then this catalogue is updated with new information. Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
- Search engine software
- This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
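The three components above can be sketched in miniature: a "spider" that follows links through a tiny invented in-memory site, an "index" that records which pages contain which words, and a search function that looks up matches. This is a deliberately simplified illustration; the page names and text are made up, and real engines work at a vastly larger scale.

```python
# Hypothetical mini-site: page -> (text, outgoing links)
SITE = {
    "/home":     ("welcome to our shop", ["/products", "/about"]),
    "/products": ("shoes and hats for sale", ["/home"]),
    "/about":    ("a family shop since 1990", ["/home"]),
}

def spider(start):
    """Visit a page, read it, then follow its links to other pages."""
    seen, queue = set(), [start]
    while queue:
        page = queue.pop(0)
        if page in seen:
            continue
        seen.add(page)
        text, links = SITE[page]
        yield page, text          # the "copy" handed to the index
        queue.extend(links)

def build_index(pages):
    """Inverted index: each word maps to the set of pages containing it."""
    index = {}
    for page, text in pages:
        for word in text.split():
            index.setdefault(word, set()).add(page)
    return index

def search(index, query):
    """Return pages containing every word in the query."""
    results = None
    for word in query.split():
        pages = index.get(word, set())
        results = pages if results is None else results & pages
    return sorted(results or [])

index = build_index(spider("/home"))
print(search(index, "shop"))  # -> ['/about', '/home']
```

Note how a page only becomes findable once `build_index` has processed it: exactly the "spidered but not yet indexed" gap described above.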
So, how do crawler-based search engines go about determining relevancy when confronted with hundreds of millions of web pages to sort through? They follow a set of rules, known as an algorithm. After applying this algorithm to its index, a search engine comes up with a list of the results it believes are most relevant to the search conducted. This algorithm differs between engines, which is why different search engines may produce different results for the same query.
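As a toy illustration of such an algorithm, the sketch below scores each page simply by how often the query terms appear in it, then sorts by score. Real ranking algorithms weigh many more signals, and the documents here are invented; the point is only that changing the scoring rule changes the ordering, which is why different engines rank the same pages differently.

```python
# Invented example documents
DOCS = {
    "page-a": "cheap shoes cheap hats",
    "page-b": "shoes for running",
    "page-c": "gardening tips",
}

def rank(query, docs):
    """Rank documents by a naive term-frequency score, highest first."""
    terms = query.lower().split()
    scores = {}
    for name, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score:                 # ignore pages with no matches at all
            scores[name] = score
    # Most relevant (highest-scoring) first
    return sorted(scores, key=scores.get, reverse=True)

print(rank("cheap shoes", DOCS))  # -> ['page-a', 'page-b']
```

Swap in a different scoring rule (say, weighting words in titles more heavily) and the same index can yield a different ordering for the same query.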
Exactly how a particular search engine's algorithm works is a closely guarded secret (even we don't know it!), but some general rules are clear, and these form the basis of website optimisation, which is something Quirk can start helping you with today. Contact us to find out how we do it!