PageRank is the algorithm used by the Google search engine,
originally formulated by Sergey Brin and Larry Page in their paper
'The Anatomy of a Large-Scale Hypertextual Web Search Engine'. It is
based on the premise, prevalent in the world of academia, that the
importance of a research paper can be judged by the number of citations
the paper has from other research papers. Brin and Page have simply
transferred this premise to its web equivalent: the importance of a web
page can be judged by the number of hyperlinks pointing to it from other
web pages.
It may look daunting to non-mathematicians, but the PageRank algorithm
is in fact elegantly simple and is calculated as follows:
PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where:
- PR(A) is the PageRank of a page A
- T1 ... Tn are the pages that link to page A
- PR(T1) is the PageRank of a page T1
- C(T1) is the number of outgoing links from the page T1
- d is a damping factor in the range 0 < d < 1, usually set to 0.85
The PageRank of a web page is therefore calculated from the PageRanks
of all the pages linking to it (its incoming links), each divided by the
number of links on that linking page (its outgoing links). The sum of
these shares is scaled by d and added to the constant (1 - d).
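As a minimal sketch of the formula above, the calculation for a single page can be written in a few lines of Python. The function name pagerank_of_page and the example figures are our own illustrations, not part of Brin and Page's paper.

```python
def pagerank_of_page(inbound, d=0.85):
    """Apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for one page A.

    `inbound` is a list of (PR(T), C(T)) pairs, one per page T linking to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)


# Hypothetical example: two pages link to A.
# T1 has PageRank 1.0 and 2 outgoing links; T2 has PageRank 1.0 and 4.
pr_a = pagerank_of_page([(1.0, 2), (1.0, 4)])
print(round(pr_a, 4))  # 0.15 + 0.85 * (0.5 + 0.25) = 0.7875
```

Note that a page with no incoming links at all still receives the constant (1 - d), which is 0.15 with the default damping factor.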
From a search engine marketer's point of view, this means there are
two ways in which PageRank can affect the position of your page on Google:
- The number of incoming links. Obviously the more of these the
better. But there is another thing the algorithm tells us: no incoming
link can have a negative effect on the PageRank of the page it points
at. At worst it can simply have no effect at all.
- The number of outgoing links on the page which points at your
page. The fewer of these the better. This is interesting: it means
given two pages of equal PageRank linking to you, one with 5 outgoing
links and the other with 10, you will get twice the increase in PageRank
from the page with only 5 outgoing links.
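Both points can be checked with a little arithmetic. The sketch below assumes the share of PageRank a single link passes on is d * PR(T) / C(T), as read directly from the formula; the helper name link_contribution is hypothetical.

```python
def link_contribution(pr_source, outgoing_links, d=0.85):
    """PageRank one inbound link adds to its target: d * PR(T) / C(T)."""
    return d * pr_source / outgoing_links


# Two linking pages of equal PageRank, one with 5 outgoing links, one with 10.
few = link_contribution(1.0, 5)
many = link_contribution(1.0, 10)
print(few / many)  # the 5-link page passes on twice as much
```

Since PageRank, the damping factor and the link count are all positive, the contribution of a link can never be negative.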
At this point we take a step back and ask ourselves just how important
PageRank is to the position of your page in the Google search results.
The next thing we can observe about the PageRank algorithm is that it
has nothing whatsoever to do with relevance to the search terms queried.
It is simply one single (admittedly important) part of the entire Google
relevance ranking algorithm.
Perhaps a good way to look at PageRank is as a multiplying factor,
applied to the Google search results after all its other computations have
been completed. The Google algorithm first calculates the relevance of
pages in its index to the search terms, and then multiplies this relevance
by the PageRank to produce a final list. The higher your PageRank,
therefore, the higher up the results you will be, but there are still many
other factors related to the positioning of words on the page which must
be considered first.
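To make the 'multiplying factor' picture concrete, here is a toy sketch in Python. It is only an illustration of the simplified model described above, not of Google's actual ranking code: the page names, relevance scores and PageRanks are all invented.

```python
# Hypothetical (relevance, PageRank) scores for three pages; not real data.
pages = {
    "page-a": (0.9, 1.0),
    "page-b": (0.6, 2.0),
    "page-c": (0.8, 0.5),
}


def final_score(relevance, pagerank):
    """Simplified model: relevance to the query multiplied by PageRank."""
    return relevance * pagerank


# Sort the pages from the highest final score to the lowest.
ranking = sorted(pages, key=lambda name: final_score(*pages[name]), reverse=True)
print(ranking)  # ['page-b', 'page-a', 'page-c']
```

In this toy model page-b is the least relevant of the three, yet its higher PageRank lifts it to the top of the list.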
Well, not entirely. The PageRank algorithm is very cleverly balanced.
Just like the conservation of energy in physics with every reaction,
PageRank is also conserved with every calculation. For instance, if a page
with a starting PageRank of 4 has two outgoing links on it, we know that
the amount of PageRank it passes on is divided equally between all of its
outgoing links. In this case 4 / 2 = 2 units of PageRank is passed on to
each of 2 separate pages, and 2 + 2 = 4 - so the total PageRank is
preserved! (Note: There are scenarios where you may find that
total PageRank is not conserved after a calculation. PageRank itself is
supposed to represent a probability distribution, with the individual
PageRank of a page representing the likelihood of a 'random surfer'
chancing upon it.)
On a much larger scale, supposing Google's index contains a billion
pages, each with a PageRank of 1, the total PageRank across all pages is
equal to a billion. Moreover, each time we recalculate PageRank, no matter
what changes in PageRank may occur between individual pages, the total
PageRank across all one billion pages will still add up to a billion.
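We can check this conservation on a small scale. The sketch below is a simplified model, assuming every page starts with a PageRank of 1 and has at least one outgoing link; the three-page link structure and the function name iterate are invented for illustration.

```python
def iterate(links, ranks, d=0.85):
    """One pass of PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page.

    `links` maps each page to the list of pages it links to.
    """
    new_ranks = {}
    for page in links:
        inbound = sum(ranks[src] / len(targets)
                      for src, targets in links.items() if page in targets)
        new_ranks[page] = (1 - d) + d * inbound
    return new_ranks


# Hypothetical three-page model in which every page links somewhere.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {page: 1.0 for page in links}

for _ in range(20):
    ranks = iterate(links, ranks)

total = sum(ranks.values())
print(round(total, 6))  # 3.0
```

The individual values move around from one iteration to the next, but the total stays at 3. If one of the pages had no outgoing links, it would pass nothing on under this formula and the total would fall, which is one scenario in which PageRank is not conserved.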
Firstly, this means that although we may not be able to change the
total PageRank across all pages, by strategic linking of pages within our
site, we can affect the distribution of PageRank between pages. For
instance, we may want most of our visitors to come into the site through
our home page. We would therefore want our home page to have a higher
PageRank relative to other pages within the site. We should also recall
that all of the PageRank of a page is passed on and divided equally
between each of the outgoing links on a page. We would therefore want to
keep as much combined PageRank as possible within our own site without
passing it on to external sites and losing its benefit. This means we
would want any page with lots of external links (i.e. links to other
people's web sites) to have a lower PageRank relative to other pages
within the site to minimise the amount of PageRank which is 'leaked' to
external sites. Bear in mind also our earlier statement, that PageRank is
simply a multiplying factor applied once Google's other calculations
regarding relevance have been completed. We would therefore want
our more keyword-rich pages to also have a higher relative PageRank.
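The 'leak' can be seen in a small model. The sketch below compares a hypothetical two-page site with and without a link to an external page that does not link back; the page names and the pagerank helper are our own, and the external page is assumed to have no outgoing links.

```python
def pagerank(links, iterations=20, d=0.85):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    `links` maps each page to the pages it links to; all pages start at 1.
    """
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(ranks[src] / len(targets)
                                    for src, targets in links.items()
                                    if page in targets)
            for page in links
        }
    return ranks


# The same site, first closed, then with one link out to an external page.
closed = pagerank({"Home": ["About"], "About": ["Home"]})
leaky = pagerank({"Home": ["About"],
                  "About": ["Home", "External"],
                  "External": []})

site_total_closed = closed["Home"] + closed["About"]
site_total_leaky = leaky["Home"] + leaky["About"]
print(round(site_total_closed, 3), round(site_total_leaky, 3))
```

In this model the closed site keeps its combined PageRank of 2, while the version with the external link ends up with noticeably less, because half of what the 'About' page passes on leaves the site.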
Secondly, if we assume that every new page in Google's index begins its
life with a PageRank of 1, there is a way we can increase the combined
PageRank of pages within our site - by increasing the number of pages! A
site with 10 pages will start life with a combined PageRank of 10 which is
then redistributed through its hyperlinks. A site with 12 pages will
therefore start with a combined PageRank of 12. We can thus improve the
PageRank of our site as a whole by creating new content (i.e. more pages)
and then control the distribution of that combined PageRank through
strategic interlinking between the pages.
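As a rough illustration, the sketch below models a hypothetical site in which the home page links to every other page and each of those pages links back to the home page, then compares a 10-page version with a 12-page version. The helper names are invented, and we use 100 iterations because this particular link structure settles down slowly.

```python
def pagerank(links, iterations=100, d=0.85):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)), starting at 1."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(ranks[src] / len(targets)
                                    for src, targets in links.items()
                                    if page in targets)
            for page in links
        }
    return ranks


def hub_site(n):
    """Hypothetical n-page site: home links to n - 1 subpages, each links back."""
    subpages = [f"page{i}" for i in range(1, n)]
    links = {"home": subpages}
    links.update({page: ["home"] for page in subpages})
    return links


small = pagerank(hub_site(10))
large = pagerank(hub_site(12))
print(round(sum(small.values()), 3), round(sum(large.values()), 3))
print(round(small["home"], 3), round(large["home"], 3))
```

In this model the combined PageRank equals the number of pages, and with this link structure the two extra pages also raise the PageRank of the home page.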
And this is the purpose of the PageRank Calculator - to create a model
of the site on a small scale including the links between pages, and see
what effect the model has on the distribution of PageRank.
It's very simple really. Start by typing in the number of interlinking
pages you wish to analyse and hit 'Submit'. I have limited this number to
just twenty pages to ease the load on the server. Even so, this should
give a reasonable indication of how strategic linking can affect the
PageRank distribution. Next, for ease of reference once the calculation
has been performed, provide a label for each page (e.g. 'Home Page',
'Links Page', 'Contact Us Page', etc.) and again hit 'Submit'.
Finally, use the list boxes to select which pages each page links to.
You can use CTRL and SHIFT to highlight multiple selections.
You can also use this screen to change the initial PageRanks of each
page. For instance, if one of your pages is supposed to represent Yahoo,
you may wish to raise its initial PageRank to, say, 3. However, the
starting PageRank of a page is in fact irrelevant to its final computed
value. In other words, even if one page were to start with a PageRank of
100, after many iterations of the equation (see below), the final computed
PageRank will converge to the same value as it would had it started with a
PageRank of only 1!
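This is easy to check on a small model. The sketch below runs the same hypothetical three-page link structure from two different sets of starting values; the function and page names are our own, and we use 200 iterations so that the two runs have ample time to settle.

```python
def pagerank(links, start, iterations=200, d=0.85):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) from the given start values."""
    ranks = dict(start)
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(ranks[src] / len(targets)
                                    for src, targets in links.items()
                                    if page in targets)
            for page in links
        }
    return ranks


# Hypothetical three-page model; every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

from_ones = pagerank(links, {"A": 1.0, "B": 1.0, "C": 1.0})
from_hundred = pagerank(links, {"A": 100.0, "B": 1.0, "C": 1.0})

largest_gap = max(abs(from_ones[p] - from_hundred[p]) for p in links)
print(largest_gap < 1e-9)  # True
```

With far fewer iterations the page that started at 100 would still show a visible head start, which is why the number of iterations matters.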
Lastly, you can play around with the damping factor d, which defaults
to 0.85 as this is the value quoted in Brin and Page's research paper.
Ever heard of the Google 'Dance'? You can see this demonstrated by
looking at the differing result sets produced on www.google.com,
www2.google.com and www3.google.com. If you study these results closely
you will see that they change very slightly from day to day, and in
particular during the period once a month when Google updates its index.
One of the reasons for this apparent dancing of results is that Google
does not simply calculate the PageRank once for each page. After it has
calculated the PageRank for the first time, it puts the resulting
PageRanks back into the PageRank algorithm and calculates again. Google
goes through this process of iteration many times before the results
settle down to their 'true' values. When the process has been completed,
the results then appear on the 'official' www.google.com domain.
(NB: While Google is updating its index, the updates also occur at
different times across its various data centres.)
The PageRank Calculator defaults to 20 iterations, although you can
increase this number should you choose. For a model of around 20 pages, 20
iterations is sufficient to see the PageRanks homing in on a single 'true'
value. Google almost certainly performs many more.
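The process of iteration can be sketched as a loop that stops once the values no longer change by more than a small tolerance. This is our own simplified illustration, not the code behind the PageRank Calculator or Google; the function name, the tolerance and the three-page model are assumptions.

```python
def pagerank_until_stable(links, d=0.85, tolerance=1e-6, max_iterations=1000):
    """Iterate the PageRank formula until no page changes by more than `tolerance`.

    Returns the final PageRanks and the number of iterations performed.
    """
    ranks = {page: 1.0 for page in links}
    for iteration in range(1, max_iterations + 1):
        new_ranks = {}
        for page in links:
            inbound = sum(ranks[src] / len(targets)
                          for src, targets in links.items() if page in targets)
            new_ranks[page] = (1 - d) + d * inbound
        # Largest change of any single page in this pass.
        change = max(abs(new_ranks[p] - ranks[p]) for p in links)
        ranks = new_ranks
        if change < tolerance:
            return ranks, iteration
    return ranks, max_iterations


# Hypothetical three-page model.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, iterations_used = pagerank_until_stable(links)
print(iterations_used, {page: round(value, 3) for page, value in ranks.items()})
```

Tightening the tolerance increases the number of iterations needed, which gives a feel for why a large index would take many more passes than a 20-page model.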