Google Search Algorithm – Google Dance – Crawling/Indexing and more
Google has a comprehensive and highly developed technology, a straightforward interface and a wide-ranging array of search tools that enable users to easily access a variety of information online. Google users can browse the web and find information in various languages, retrieve maps and stock quotes, read news, search for a long-lost friend using the phonebook listings available on Google for all US cities, and basically surf the 3 billion-odd web pages on the internet!
Google boasts the world’s largest archive of Usenet messages, dating all the way back to 1981. Google’s technology can be accessed from any conventional desktop PC as well as from various wireless platforms such as WAP and i-mode phones, handheld devices and other Internet-equipped gadgets.
The web search technology offered by Google is often the technology of choice for the world’s leading portals and websites. It has also benefited advertisers through its unique advertising program, which does not hamper the web surfing experience of its users but still brings revenue to the advertisers.
Google’s Web Search Technology
When you search for a particular keyword or phrase, most search engines return a list of pages ordered by the number of times the keyword or phrase appears on the website. Google’s web search technology instead uses its in-house PageRank technology and hypertext-matching analysis, which perform several instantaneous calculations without any human intervention. Google’s structural design also scales as the internet expands.
PageRank technology
PageRank technology produces an objective measurement of the significance of web pages, calculated by solving an equation of 500 million variables and more than 3 billion terms. Unlike some other search engines, Google does not simply count links but uses the extensive link structure of the web as an organizational tool. When Page A links to Page B, that link is treated as a vote for Page B on behalf of Page A.
Essentially, Google calculates the importance of a page from the number of such ‘votes’ it receives. Not only that, Google also assesses the importance of the pages that cast the votes. Consequently, pages that are themselves highly ranked help to make other pages important. One thing to note here is that Google’s technology does not involve human intervention in any way and uses the inherent intelligence of the internet and its resources to determine the ranking and importance of any page.
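The voting idea above can be illustrated with a small sketch. This is not Google's production code: it assumes the classic published form of PageRank with a damping factor of 0.85, and the four-page `toy_web` graph and the function name `pagerank` are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=40):
    """Approximate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumption taken from the published
    PageRank formula, not a figure from this article.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from an important page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph, page C collects votes from three pages and ends up ranked highest, while page D, which nobody links to, keeps only the baseline share.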
Hypertext-Matching Analysis
Unlike its conventional counterparts, Google is a hypertext-based search engine. This means that it analyzes all the content on each web page and factors in fonts, subdivisions, and the exact positions of all terms on the page. Not only that, Google also evaluates the content of neighboring web pages. This policy of not disregarding any subject matter pays off in the end and enables Google to return the results that are closest to user queries.
Query-handling – The Google Way
Google has a very simple 3-step procedure for handling a query submitted in its search box.
1. When the query is submitted and the enter key is pressed, the web server sends the query to the index servers. An index server is exactly what its name suggests: it holds an index much like the index of a book, which shows where the pages containing the queried term are located in the entire book.
2. After this, the query proceeds to the doc servers, which actually retrieve the stored documents. Page descriptions or “snippets” are then generated to suitably describe each search result.
3. These results are then returned to the user in less than a second!
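The three steps above can be sketched in miniature. This is only an illustration of the general technique (an inverted index plus a document store), not a description of Google's servers; the `DOCUMENTS` collection and the function names `index_server`, `doc_server` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stored documents, keyed by a page identifier.
DOCUMENTS = {
    "page1": "Google recalculates PageRank roughly once a month.",
    "page2": "The Google Dance is the period during the monthly update.",
    "page3": "Usenet archives date back to 1981.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build a book-style index: term -> set of pages containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def index_server(index, query):
    """Step 1: look up which pages contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


def doc_server(documents, doc_ids, query, width=30):
    """Step 2: fetch the stored documents and cut a snippet around the first term."""
    first_term = tokenize(query)[0]
    results = []
    for doc_id in doc_ids:
        text = documents[doc_id]
        pos = max(text.lower().find(first_term), 0)
        start = max(pos - width, 0)
        snippet = text[start:pos + len(first_term) + width].strip()
        results.append((doc_id, snippet))
    return results


def search(query, documents=DOCUMENTS):
    """Step 3: return (page, snippet) pairs to the user."""
    index = build_index(documents)
    doc_ids = index_server(index, query)
    if not doc_ids:
        return []
    return doc_server(documents, doc_ids, query)


for doc_id, snippet in search("google dance"):
    print(doc_id, "->", snippet)
```

A real engine would build the index once ahead of time and rank the matches; here the index is rebuilt on every call purely to keep the sketch short.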
The Google Dance
Approximately once a month, Google updates its index by recalculating the PageRank of each of the web pages that it has crawled. The period during the update is known as the Google Dance. Because of the nature of PageRank, the calculations need to be performed about 40 times and, because the index is so large, the calculations take several days to complete. During this period, the search results fluctuate, sometimes minute by minute. It is because of these fluctuations that the term Google Dance was coined. The dance usually takes place sometime during the last third of each month.
For the rest of the month, fluctuations sometimes occur in the search results, but they should not be confused with the actual dance. They are due to Google’s fresh crawl and to what is known as “Everflux”.
Google has two other searchable servers apart from www.google.com. They are www2.google.com and www3.google.com. Most of the time, the results on all 3 servers are the same, but during the dance, they are different. The search results on www2 and www3 also change during the monthly update, and they are part of the Google Dance.
For most of the dance, the rankings that can be seen on www2 and www3 are the new rankings that will transfer to www when the dance is over. Even though the calculations are done about 40 times, the final rankings can be seen from very early on. This is because, during the first few iterations, the calculated figures converge toward their final values.
You can see this with the PageRank Calculator by checking the Data box and performing some calculations. After the first few iterations, the search results on www2 and www3 may still change, but only slightly.
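The early convergence described above can also be reproduced with a few lines of code. The sketch below is an illustration only: it assumes the classic published PageRank update with a damping factor of 0.85, runs it 40 times on a made-up four-page graph (`TOY_WEB`), and records the largest change in any page's score at each iteration.

```python
def pagerank_step(links, rank, damping=0.85):
    """Run one PageRank iteration; damping=0.85 is an assumed value."""
    n = len(rank)
    new_rank = {page: (1.0 - damping) / n for page in rank}
    for page, targets in links.items():
        # Each page splits its current score evenly among its outgoing links.
        share = damping * rank[page] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank


# Hypothetical graph in which every page has at least one outgoing link.
TOY_WEB = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}


def iteration_deltas(links, iterations=40):
    """Return the largest per-page score change seen at each iteration."""
    rank = {page: 1.0 / len(links) for page in links}
    deltas = []
    for _ in range(iterations):
        new_rank = pagerank_step(links, rank)
        deltas.append(max(abs(new_rank[page] - rank[page]) for page in rank))
        rank = new_rank
    return deltas


deltas = iteration_deltas(TOY_WEB)
for i in (1, 5, 10, 20, 40):
    print(f"iteration {i:2d}: largest change = {deltas[i - 1]:.6f}")
```

The printed changes shrink quickly, which is why the order of the pages settles long before the last iteration even though the scores keep being refined.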
During the dance, the results from www2 and www3 will sometimes show on the www server, but only briefly. Also, new results on www2 and www3 can disappear for short periods. At the end of the dance, the results on www will match those on www2 and www3. The Google Dance Tool allows you to check your rankings on www, www2 and www3 and on all 9 datacenters simultaneously.