BellKor Wins Netflix $1 Million By 20 Minutes
eldavojohn writes "As we discussed at the time, there was a strange development at the end of Netflix's competition in which The Ensemble passed BellKor's Pragmatic Chaos by 0.01% a mere twenty minutes after BellKor had submitted results past the ten percent mark required to win the million dollars. Unfortunately for The Ensemble, BellKor was declared the victor this morning because of that twenty-minute margin. For those of you following the story, The New York Times reports on how teams merged to form BellKor's Pragmatic Chaos and take the lead, which sparked an arms race of teams merging their algorithms to produce better results. Now the Netflix Prize 2 competition has been announced." The Times blog quotes Greg McAlpin, a software consultant and a leader of the Ensemble: "Having these big collaborations may be great for innovation, but it's very, very difficult. Out of thousands, you have only two that succeeded. The big lesson for me was that most of those collaborations don't work."
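The "merging their algorithms" the summary describes was, at its heart, blending: combining several models' rating predictions into one better prediction. Here's a minimal sketch of the idea with made-up data and weights fit by least squares -- an illustration of the general technique, not the teams' actual blending method:

```python
import numpy as np

# Hypothetical predictions from three separate models on five held-out
# ratings, plus the true ratings. Entirely made-up numbers for illustration.
true_ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])
model_preds = np.array([
    [3.8, 3.2, 4.6, 2.5, 3.9],   # e.g. a matrix-factorization model
    [4.2, 2.7, 4.9, 2.2, 4.3],   # e.g. a neighborhood model
    [3.5, 3.4, 4.4, 2.8, 3.6],   # e.g. a simple baseline model
])

# Fit blend weights w by least squares: minimize ||model_preds.T @ w - true||.
weights, *_ = np.linalg.lstsq(model_preds.T, true_ratings, rcond=None)
blend = model_preds.T @ weights

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# On the data the weights were fit to, the blend can never do worse than
# the best single model (each single model is itself a linear combination).
print(rmse(blend, true_ratings))
print(min(rmse(p, true_ratings) for p in model_preds))
```

Because any single model corresponds to a weight vector with one 1 and the rest 0s, the least-squares blend is guaranteed to match or beat the best individual model on the fitting data -- which is why teams had such an incentive to pool models rather than compete separately.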
It was a tie... (Score:3, Interesting)
In football, I can see how a 20-second difference can make the difference between winning and losing the Super Bowl. But in a contest like this, which took thousands of hours from some brilliant people, calling The Ensemble "second place" due to a 20-minute difference just seems wrong. I don't know if there was a better solution, but something about it all feels off.
The Rules are the Rules... (Score:5, Interesting)
I agree that Ensemble "losing" because they posted 20 minutes later is a harsh result. However, those were the rules that Netflix set forth, and Ensemble, intentionally or not, was making a risky gamble by waiting until right before the deadline to submit their project. And perhaps the "tie goes to the earlier poster" rule makes some sense, because it encourages making your submission earlier than you would otherwise and not "sniping" unless you're absolutely sure your project is better than the rest. At least as far as I can understand, the rule set forth the proper tradeoff -- Ensemble got to see the score to beat (BellKor's) before it posted; however, in exchange for that, its score needed to have been better in order to win. Had Ensemble wanted the first-mover's advantage and the win in the event of a tie, it could have posted earlier than BellKor. The fact that BellKor posted only 20 minutes before the end of the competition suggests that Ensemble could have easily posted earlier without compromising its entry. That is, how much significant tinkering could possibly have been done in the last half hour of this multi-year competition?
Re:Funny, I learned a different lesson... (Score:3, Interesting)
I was just about to post the very same comment. By the contest rules, the contest ends once someone comes up with a winning solution. The fact that there were two solutions meeting the requirement so close together, and both resulting from collaborations, would rather suggest the collaborations worked really well. The other collaborations simply stopped once there was a winner. Concluding from this that collaborations don't work would be like concluding that the training athletes go through prior to the Olympic games doesn't work -- after all, of all these entrants training hard, only one wins each event.
Re:Anonymous Coward (Score:3, Interesting)
The contest has been going on for three and a half years, and the winning team of seven will be splitting a cool million, which gives each person just under $143k, minus taxes. Now, I don't know how much time these guys spent on it, but even if they only worked a year's worth of regular work hours over the 3.5 years, ~$143k per year each for seven developers is a pretty damn good bargain from Netflix's perspective for what they got (not just the new algorithm, but a lot of good PR and buzz).
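A quick back-of-the-envelope check on that math (assuming an even seven-way split, which the team's actual agreement may not have used):

```python
# Rough payout arithmetic for the Netflix Prize winning team.
prize = 1_000_000
team_size = 7
contest_years = 3.5

per_person = prize / team_size                 # even split, pre-tax
per_person_per_year = per_person / contest_years  # if effort were spread over all 3.5 years

print(round(per_person))           # 142857 -- "just under $143k" each
print(round(per_person_per_year))  # 40816 -- far less if counted per calendar year
```

So the "good bargain" point holds even more strongly than the comment suggests: spread over the full 3.5 years of the contest, the per-person rate drops to roughly $41k per year.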
I'm not saying the BellKor guys got the shaft; they were certainly compensated (not just monetarily; I'm sure their employability went up as well), and I'm sure a big part of their desire to compete was the challenge itself. But I'd bet that Netflix would've had to pay quite a bit more.
And it's not like the BellKor team did all the work; all the other teams did some of the same work independently. I imagine many (most?) of them didn't stand a chance, but let's throw out a conservative number and say the top 5% of teams managed to improve on Netflix's existing algorithm (even if not by 10%). It's conceivable that an in-house team of paid developers/researchers would end up doing an analogous iterative process, achieving smaller gains and eventually reaching the 10% goal. Depending on Netflix's hiring skills, it's possible they wouldn't reach a 10% improvement without many more man-years of work.
This contest was a very smart move on Netflix's part: the only real downside is that the competition terms -- self-imposed -- allow the contest participants to license their implementations to competing companies as well.
I think it's a gloss on prizes as innovation-spurs (Score:3, Interesting)
I think he's pointing to one of the inefficiencies of prize systems as a way to spur innovation. Thousands of people tried, spending tens or hundreds of thousands of work-hours and other resources, and only a fraction got "winning results" (yes, according to the arbitrary way that winning was defined). But the point is that the prize probably resulted in a very inefficient use of resources. We could hypothesize that the same result might have been achieved with only 25% of the resources spent on the prize - for example, by making the cost of entry non-zero, you could have eliminated teams with no chance of winning from participating.
Basically prize systems benefit from people's inability to accurately assess their real chances of winning - or put another way, prize systems free ride off of people's self-delusion.
Of course there are other factors to be considered, e.g., what would those wasted resources have gone to if they were not being used for the competition, perhaps there are incidental rewards to those resources having been used, perhaps people competed for reasons other than simply winning the prize, etc.
Re:I think it's a gloss on prizes as innovation-sp (Score:4, Interesting)
This doesn't work. If you make the entry cost nonzero, you'll be much less efficient at doing *science*. Remember, the journey is much more important than the result. The benefits to society from disseminating knowledge of data mining technologies and good datasets dwarf the knowledge embodied in the winning entry alone (think Metcalfe's law).
Re:The Objective (Score:3, Interesting)
Well, it might not affect the average prediction as it relates to everybody else. However, from a user's perspective, the whole point of the system is to try to figure out what my taste in movies is based on how I rated those movies, match it up to other people's ratings, and try to predict what other movies I'd like. You can't statistically average out my ratings, as my ratings are the only significant factor on one side of the equation. There are no other users you can use to balance out what my tastes are. It has to go by my ratings, and if my ratings are anomalous, the results are going to suffer.
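The scheme this comment describes -- weight other users by how similarly they rated things, then predict my missing ratings from theirs -- is user-based collaborative filtering. A toy sketch with a made-up ratings matrix and cosine similarity (an illustration of the general idea, not Netflix's actual algorithm):

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are movies; 0 means "not rated".
# Entirely made-up data for illustration.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity computed over the movies both users rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other in range(len(ratings)):
        if other == user or ratings[other, movie] == 0:
            continue
        s = cosine_sim(ratings[user], ratings[other])
        num += s * ratings[other, movie]
        den += abs(s)
    return num / den if den else 0.0

# User 0's ratings resemble user 1's, so the prediction for movie 2 is
# pulled toward user 1's low rating, below the plain average of all raters.
print(round(predict(0, 2), 2))
```

This makes the comment's point concrete: the prediction for user 0 depends entirely on user 0's own rating vector (via the similarity weights), so anomalous ratings on the left side of that equation can't be averaged away by the rest of the population.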
Your lottery analogy is pointless, because it demonstrates a different issue. There, the six actual numbers against which your submissions are rated are a factual matter. They aren't affected by your feelings and interpretations. They are going to be the same six numbers whether you just got a promotion at work or your spouse was just murdered. However, your rating of movie X would probably be different after the promotion than it would be after the murder of your spouse (we're assuming you actually liked your spouse and didn't hire a hitman).