Google’s PageRank Revealed

Mon Dec 11, 2006 12:41:14 pm by Dustin
Filed under General

The American Mathematics Society was nice enough to publish David Austin’s (of Grand Valley State University) article titled How Google Finds Your Needle in the Web’s Haystack. This article explains the math behind how Google determines what it calls the PageRank of a given webpage on the Internet.I have to say that there is an uber amount of math in this article. How much you may say? Well, let me put it in perspective. I have an undergraduate degree in Computer Science. Whoopty-ding, you may say. Well most of Computer Science degrees require, well, uber amounts of math. In my case, I took 12 upper division hours of math to get my degree. What does that mean; I mean who is this guy and why is he talking about his math courses? Well David’s article requires knowledge in all of these courses to start to understand – I cannot image the math involved in coming up with problem from scratch. Here! Here! to Sergey and Larry.

Why am I telling all y’all this? Well, I felt the urge to translate David’s words from the Universal language to English (or at least my version of English).

Just like back in high school, everything is a popularity contest, but this time, you can be ugly (or even average) and people will still like you. Ambiguity is amazing. w00t.

Google’s first step in the art of making people either know your name or making you eat lunch in a phone booth is to ‘index’ all the pages on the Internet and put them in, what we call in the software world as, a graph. This is a fancy way of saying make they a circle for every webpage and an arrow to represent a link (pointing accordingly to the corresponding other circle). Hurrah, we all know how to draw know. This process is a fun way of visualizing how all the pages on the Internet play with each other. Why bother? Well I wouldn’t either, but Google does. Google’s idea is the more important pages linking to this page, the more important this page will be.

This Catch-22 is one of those fun chicken and egg problems. Yep you guessed it; somebody has to be popular to make others popular. The nice part about this party is that we know which came first this time, and it was the egg. So, we all start as pimple faced, four eyed freaks. And then one day, we get a friend and another and another. After a while, the lords of Mountain View say, hey, let’s add ‘em up. Without getting into some heavy Linear Algebra or Graph Theory, they do an iteration of their fanciness to come up with a points system of where each page stand in respect other similarly ranked pages. The high school cliques are back, but this time one can change from clique to clique – but only one clique at a time folks. About three days later, yep that is how long it takes to figure out ~25 billion pages, Google places a 0-10 value to each page. They then put the most important at the top of their searches.

Just to clearify, if a 10 point page links to a one point page, it will help the one point page much more than if a two point page links to the one point page. How much more? Well, if you figure that out, you can write a cute article like David did and become famous. So famous, I might write about you too.

I had a few questions for David about some of his findings, and he was nice enough to actually reply. Since he is classy enough to reply, I have posted them on here too. Here are my two questions.

David

I appreciate the study. It was very informative. I had a question though.

You stated that studies have shown that each page has an average of 10 links coming into it. Do you have a link such a study? It just does not seem to be true. I can look at many pages and know that there are more than 10 links to it, and they are not very highly rated sites. Is the average about 10 because most of them have zero causing the average to come down? Thanks for your time
Dustin Hi Dustin, Thanks for your post. I’m out of town right now but I’ll send you a reference when I return home. There have been a couple of interesting studies of the link structure of the web. Think about how many links a page has to other pages. Of course, there are some pages with lots of links, but there are also many others with no links at all. As you say, these bring the average down to around 10. All the best,
David

And the second:

How does this I value, which is a percentage of popularity, turn into a PageRank value (ie from 0-10) instead of some decimal with a range of [0,1)?That’s a good question, and one whose answer Google doesn’t want you to know. Google zealously guards the actual value. The published PageRank (what shows up at various sites like www.checkpagerank.com or on the Google toolbar) is only an approximation. This means there is some conversion between the actual PageRank and the published value. I’ve read that the conversion function is logarithmic, which means that an increase in one in the published PageRank (say, 5 to 6) is an increase by a factor of, perhaps, two in the real PageRank. If you look at a lot of pages, this starts to feel about right. No one who’s saying is really sure though.Thanks again for your interest.
David

And now the fun part… Tell me what you think! Come on don’t be shy.