PageRank-Type Algorithm From the 1940s Discovered 108
KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."
Re:Patent? (Score:5, Informative)
So it could be used as previous art to invalidate Google's patent?
From my read of the linked article it seems that Sergey and Larry cited the previous art in their publications. So it looks like there was no plagiarism, just building a new idea using the tools provided by an earlier idea.
Re:Patent? (Score:4, Informative)
No, since the one from 1941 didn't say "on the internet" or "with a computer".
Re:Good advice for all developers (Score:3, Informative)
They could have used one of the other search engines in existence in 1997-98, like Altavista or Lycos or Magellan or WebCrawler or Yahoo or Excite or Hotbot.
Re:linearity (Score:5, Informative)
The algorithm is not at all linear in the effect of inbound links. Two inbound links don't have the same effect, instead their effects are first weighted by the PageRank of each node of origin.
Now the distribution of PageRank among nodes is approximately power-law distributed on the web. Intuitively, this means that among all inbound links of a particular node, when that number is high, then 99% have practically no effect on the rank of that node, exactly as you probably thought in the first place. More precisely, you can expect a pareto (or similar) distribution for the influence of incoming nodes, which is not unexpected since these sorts of distributions occur a lot in social sciences.
That said, the PageRank algo is actually linear, but only in the sense of being a linear operator on weight distributions. If you normalize the weights after each iteration, the algo is actually affine (on normalized distributions) rather than linear.
Re:linearity (Score:4, Informative)
The reason why PageRank 'has to be' linear is essentially mathematical; treating importance as a linear function of the importance of inbound links means that the core equations that need to be solved to determine importance are linear and the answer can be found with (essentially) one huge matrix inversion. If you make importance nonlinear then the equations being solved become computationally infeasible.
What's interesting to me is how close the connections are between PageRank and the classic light transfer/heat transfer equations that come up in computer graphics' radiosity (see James Kajiya's Rendering equation); I wonder if there's a reasonable equivalent of 'path tracing' (link tracing?) for computing page ranks that avoids the massive matrix inversions of the basic PageRank algorithm.
Re:linearity (Score:3, Informative)
Pagerank isn't just a citation count; it's defined recursively, such that a link from a page with a high pagerank is worth more than a link from a page with low pagerank. Similarly, a link from a page with many outlinks is worth less than a link from a page with the same rank but few outlinks.
It does turn out to be more of a popularity contest than a quality metric, though. I think you're absolutely right about that.
Re:Good advice for all developers (Score:3, Informative)
Re:who says it was rediscovered in 2010? (Score:3, Informative)
Dude; it's just Jacobi iteration applied to a particular, reasonably intuitive model. This isn't to knock it -- just to say that it was probably easier to reinvent it than to find some obscure paper --- especially one which probably isn't in the electronic databases.