Discussing SEO with clients is no easy task, but educating them about what it takes to rank is extremely important for setting the right expectations. People working in SEO must understand how Google and other search engines crawl, index and rank websites in order to educate clients accordingly.
It is hard to predict how difficult it will be to achieve rankings on a new search engine optimization project, because so many factors play into it and Google is always adjusting its algorithm. By understanding search engines, you can help clients understand how difficult an SEO’s job can be.
So how does Google’s algorithm work?
The Anatomy of a Search Engine:
This is a basic breakdown of a much more complex process.
- A URL Server sends over lists of URLs to be crawled.
- The crawlers download the web pages then send them to the store server. The store server compresses and stores the web pages.
- Every web page is given an associated ID number called a docID then sent to the indexer.
- The indexing function is performed by the indexer and the sorter.
- All of the documents (webpages) are converted into a set of word occurrences called hits. Each hit records the word, position in document and other variables.
- The indexer distributes these hits into a set of "barrels", creating a partially sorted index.
- The indexer separates out all of the links in every web page and keeps important information about them in another file. This file contains information about where each link points from and to, and the text of the link.
- The links database is used to compute PageRanks for all the documents.
- The sorter takes the barrels, which are sorted by docID, and re-sorts them by wordID to generate the inverted index.
- The searcher is run by a web server and uses the inverted index and the PageRanks to answer queries.
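To make the indexing steps above a little more concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions, not Google's actual implementation: the three-page corpus, the function names and the choice to record only a word's position in each hit are all made up for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: docID -> page text (not real crawl data).
pages = {
    1: "blu ray players on sale",
    2: "blue ray bans sunglasses",
    3: "best blu ray players reviewed",
}

def build_forward_index(pages):
    """Turn each document into hits, (wordID, position) pairs, grouped by docID."""
    lexicon = {}   # word -> wordID
    forward = {}   # docID -> list of (wordID, position)
    for doc_id in sorted(pages):
        hits = []
        for position, word in enumerate(pages[doc_id].lower().split()):
            # Assign the next free wordID the first time a word is seen.
            word_id = lexicon.setdefault(word, len(lexicon))
            hits.append((word_id, position))
        forward[doc_id] = hits
    return lexicon, forward

def invert(forward):
    """Re-sort the hits by wordID: wordID -> sorted list of (docID, position)."""
    inverted = defaultdict(list)
    for doc_id, hits in forward.items():
        for word_id, position in hits:
            inverted[word_id].append((doc_id, position))
    return {word_id: sorted(postings) for word_id, postings in sorted(inverted.items())}

lexicon, forward = build_forward_index(pages)
inverted = invert(forward)

# Every place the word "ray" occurs, as (docID, position) pairs.
print(inverted[lexicon["ray"]])
```

The forward index answers "which words are in this document?", while the inverted index answers "which documents contain this word?", which is the question a search query actually asks.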
Search engines use very complex algorithms that few people truly understand but it is still important to know how they crawl and index web pages. The basics of crawling and indexing websites are shown in the diagram below:
A much more in-depth description of this process can be found at http://infolab.stanford.edu/~backrub/google.html
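The PageRank step mentioned in the list above can also be sketched in a few lines. This is a simplified, normalized version of the idea (ranks sum to 1), not the production algorithm; the four-page link graph is hypothetical, and the damping factor of 0.85 and the handling of pages without outgoing links are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated passes; links maps page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: "c" is linked to by three of the four pages.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest, which is the intuition clients usually need when link building comes up.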
In layman’s terms, this is what happens:
Say someone is looking for information about Blu-ray players. Google takes the billions of documents on the web and converts the words in them into wordIDs. It then looks for occurrences or patterns of those wordIDs across all of the documents and ranks the documents based on how often the wordIDs appear. For example,
Blu ray player
|Blu ray players|23|134|561|765|876|1023|1348|1762|
Based on the table above, you can see that document 134 shows up for every word in the query, making it the most relevant and thus a candidate to rank very high in the search results. Google has to sort through millions of documents on the web that could be talking about Ray-Ban glasses that are blue, blue ray jackets, Tampa Bay Devil Rays jerseys that are blue, Devil Ray players, Blue Apple players and so on. In addition to looking for wordIDs, links and other on-page factors are taken into consideration when trying to rank a site. This gets very complex very quickly.
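As a rough illustration of the lookup described above, the sketch below counts how many of the query words each document contains and puts the documents that match the most words first. The posting lists are hypothetical (only the docIDs are borrowed from the example), and real ranking weighs many more signals than this simple count.

```python
from collections import Counter

# Hypothetical posting lists: word -> docIDs of documents containing that word.
postings = {
    "blu": [23, 134, 561, 876],
    "ray": [134, 561, 765, 1023],
    "players": [134, 876, 1348, 1762],
}

def rank_by_term_coverage(query, postings):
    """Order docIDs by how many of the query words they contain, most first."""
    counts = Counter()
    for word in query.lower().split():
        # Count each document once per query word it contains.
        counts.update(set(postings.get(word, [])))
    # Ties are broken by docID so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

results = rank_by_term_coverage("blu ray players", postings)
for doc_id, matched_words in results:
    print(doc_id, matched_words)
```

Document 134 is the only one that appears in all three lists, so it comes out on top, while pages that only mention "ray" (sunglasses, baseball jerseys and so on) fall toward the bottom.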
Key Takeaway: Google has to create an “association”, that is, see that your site is relevant for a given keyword out of billions of documents. That association could take months if not years to create, and it becomes even more difficult as you move closer to the top 10 results. We know Google has around 200 ranking factors, so it is a good idea for a client to implement as many on-page elements as possible to send Google a clear message about what the page is about, since on-page elements tend to be easier to control than link building.
Knowing the basic process is good but how much information does Google have to sort through in order to find the most relevant webpage for a search query?
What Does It Take To Get a List of Results?
- More than 1 million computing hours have gone into preparing Google’s index
- More than 1 billion searches are performed on Google every day
- More than 1,000 man years have been spent on developing Google’s algorithm
- Google’s Caffeine index contains over 100 million gigabytes
- In July of 2008 Google processed 1 trillion (1,000,000,000,000) unique URLs. That is the equivalent of “fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.” Google computes this every day.
- Google’s database of indexed pages is 5 million terabytes. A stack of DVDs holding Google's index would be as tall as 3,192 Empire State Buildings.
This is the task of an SEO: to help Google effectively sort through all of these web pages and to help a client rank for a certain search query. This is why it takes time to rank for a given keyword. As SEOs optimize pages and build links, Google is sorting through tremendous amounts of data every day in order to better understand which pages should rank in the top positions.
As you can see, Google has spent a lot of time and energy on bringing relevant results back to users. It does not stop there. Since new web pages are created daily, Google has to keep up with all of this new information. Because of that, Google and other search engines need to be efficient and effective in how they manage all of this information.
Google is Always Changing
Even if a client has everything in place and is able to get to the first page of the search results, Google is always changing and tweaking its algorithm so it is an uphill battle.
Google made over 500 changes to its algorithm last year, and Matt Cutts has stated:
“Overall, our advice for publishers continues to be to focus on delivering the best possible user experience on your websites... This change is just one of the over 500 improvements we expect to roll out to search this year.”
The news came on January 19th when Google announced another algorithm change.
While the majority of these changes are small, a number of big changes took place last year that affected nearly 50% of all search queries.
For an up-to-date timeline of Google’s changes over the years, check out SEOmoz.org. Since Google is always changing the way it ranks sites, an SEO’s job is that much more difficult. Keeping up to date on the latest changes and industry news is a never-ending task for an SEO.
Obviously there are many more factors to consider when getting into an SEO campaign, but I have found that helping clients recognize how much work it takes to get sites to rank can improve the overall relationship.
Stats above can be found at: