PageRank-Type Algorithm From the 1940s Discovered 108

Posted by samzenpus on Wednesday February 17, 2010 @07:55PM from the oldest-google-juice dept.

KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."

PageRank-Type Algorithm From the 1940s Discovered

This discussion has been archived. No new comments can be posted.

Search 108 Comments Log In/Create an Account

Comments Filter:

Re:Patent? (Score:5, Informative)

by Meshach ( 578918 ) writes: on Wednesday February 17, 2010 @08:08PM (#31178512)

So it could be used as previous art to invalidate Google's patent?
From my read of the linked article it seems that Sergey and Larry cited the previous art in their publications. So it looks like there was no plagiarism, just building a new idea using the tools provided by an earlier idea.

Re:Patent? (Score:4, Informative)

by nedlohs ( 1335013 ) writes: on Wednesday February 17, 2010 @08:11PM (#31178542)

No, since the one from 1941 didn't say "on the internet" or "with a computer".

Re:Good advice for all developers (Score:3, Informative)

by commodore64_love ( 1445365 ) writes: on Wednesday February 17, 2010 @08:34PM (#31178712) Journal

They could have used one of the other search engines in existence in 1997-98, like Altavista or Lycos or Magellan or WebCrawler or Yahoo or Excite or Hotbot.

Re:linearity (Score:5, Informative)

by martin-boundary ( 547041 ) writes: on Wednesday February 17, 2010 @08:44PM (#31178770)

What really shocked me when someone first described page rank to me was that it was linear. I felt that this just had to be wrong, because it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link. Maybe this is just the elitist snob in me,

The algorithm is not at all linear in the effect of inbound links. Two inbound links don't have the same effect, instead their effects are first weighted by the PageRank of each node of origin.
Now the distribution of PageRank among nodes is approximately power-law distributed on the web. Intuitively, this means that among all inbound links of a particular node, when that number is high, then 99% have practically no effect on the rank of that node, exactly as you probably thought in the first place. More precisely, you can expect a pareto (or similar) distribution for the influence of incoming nodes, which is not unexpected since these sorts of distributions occur a lot in social sciences.
That said, the PageRank algo is actually linear, but only in the sense of being a linear operator on weight distributions. If you normalize the weights after each iteration, the algo is actually affine (on normalized distributions) rather than linear.

Re:linearity (Score:4, Informative)

by Shaterri ( 253660 ) writes: on Wednesday February 17, 2010 @08:51PM (#31178844)

The reason why PageRank 'has to be' linear is essentially mathematical; treating importance as a linear function of the importance of inbound links means that the core equations that need to be solved to determine importance are linear and the answer can be found with (essentially) one huge matrix inversion. If you make importance nonlinear then the equations being solved become computationally infeasible.
What's interesting to me is how close the connections are between PageRank and the classic light transfer/heat transfer equations that come up in computer graphics' radiosity (see James Kajiya's Rendering equation); I wonder if there's a reasonable equivalent of 'path tracing' (link tracing?) for computing page ranks that avoids the massive matrix inversions of the basic PageRank algorithm.

Re:linearity (Score:3, Informative)

by j1m+5n0w ( 749199 ) writes: on Wednesday February 17, 2010 @09:04PM (#31178916) Homepage Journal

it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link
Pagerank isn't just a citation count; it's defined recursively, such that a link from a page with a high pagerank is worth more than a link from a page with low pagerank. Similarly, a link from a page with many outlinks is worth less than a link from a page with the same rank but few outlinks.
It does turn out to be more of a popularity contest than a quality metric, though. I think you're absolutely right about that.

Re:Good advice for all developers (Score:3, Informative)

by twosat ( 1414337 ) writes: on Wednesday February 17, 2010 @09:46PM (#31179206)

Reminds me of the invention of Turbo Codes in the early Nineties for forward error correction for communication networks. It was later discovered that in the early Sixties, low density parity check (LDPC) coding was developed that performed a similar function but was not used because of the lack of computer power and memory back then. The LDCP patents had expired by then, so now there are two technologies doing the same thing in a different way but one is patent-free. In a similar vein, I read some years ago of a company in the UK who search through expired and current patents looking for inventions that meet their customers' needs. They would often find solutions in a totally different field to the area being researched and a lot of it was stuff that was ahead of its time and its technology had been abandoned.

Re:who says it was rediscovered in 2010? (Score:3, Informative)

by TerranFury ( 726743 ) writes: on Thursday February 18, 2010 @01:27AM (#31180598)

Dude; it's just Jacobi iteration applied to a particular, reasonably intuitive model. This isn't to knock it -- just to say that it was probably easier to reinvent it than to find some obscure paper --- especially one which probably isn't in the electronic databases.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

PageRank-Type Algorithm From the 1940s Discovered 108

PageRank-Type Algorithm From the 1940s Discovered More Login

PageRank-Type Algorithm From the 1940s Discovered

Re:Patent? (Score:5, Informative)

Re:Patent? (Score:4, Informative)

Re:Good advice for all developers (Score:3, Informative)

Re:linearity (Score:5, Informative)

Re:linearity (Score:4, Informative)

Re:linearity (Score:3, Informative)

Re:Good advice for all developers (Score:3, Informative)

Re:who says it was rediscovered in 2010? (Score:3, Informative)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot