
Turning Data Science Into a Spectator 'Sport'

Posted by Soulskill
from the instant-replay-not-necessary dept.
vu1986 writes "Kaggle has a 'predictive-modeling competition platform that makes public the competitors in invite-only private competitions. Think of it like watching a major tournament in golf or tennis, where you can watch the best in the world shoot it out to see whose algorithms are king. Kaggle's tagline is "We're making data science a sport." Maybe now it can make data science a spectator sport.'"

  • by Okian Warrior (537106) on Wednesday September 12, 2012 @04:33PM (#41316609) Homepage Journal

    I've entered a couple of Kaggle competitions, but I'm kinda put off by the opaque results.

    After the first one ended (predict HIV progression [kaggle.com]), the released full dataset indicated that the data had been sorted before it was separated into train and test sets. IOW, after being sorted by length, all the short sequences were put into the training set, and the longer ones into the test set. This mistake may have invalidated the competition, and I strongly suspect it would have invalidated any paper written about the results.
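
    To make the problem concrete, here is a rough sketch of the difference between cutting a length-sorted list and shuffling first. Everything in it is an assumption for illustration: the sequences are synthetic strings, the 80/20 ratio is made up, and the function names are hypothetical. It is not how the competition data was actually prepared.

```python
import random


def sorted_split(sequences, train_fraction=0.8):
    """Flawed split: sort by length, then cut. Short items land in train."""
    ordered = sorted(sequences, key=len)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def shuffled_split(sequences, train_fraction=0.8, seed=0):
    """Safer split: shuffle a copy with a fixed seed before cutting."""
    rng = random.Random(seed)
    shuffled = list(sequences)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_length(sequences):
    """Average sequence length, used to compare the two sets."""
    return sum(len(s) for s in sequences) / len(sequences)


# Synthetic stand-in data: 1000 strings with lengths between 50 and 500.
data_rng = random.Random(42)
sequences = ["A" * data_rng.randint(50, 500) for _ in range(1000)]

bad_train, bad_test = sorted_split(sequences)
ok_train, ok_test = shuffled_split(sequences)

print("sorted split:   train mean %.1f, test mean %.1f"
      % (mean_length(bad_train), mean_length(bad_test)))
print("shuffled split: train mean %.1f, test mean %.1f"
      % (mean_length(ok_train), mean_length(ok_test)))
```

    With the sorted split, every training sequence is no longer than the shortest test sequence, so a model is scored on a length range it never saw. With the shuffled split the two sets should have similar length distributions.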

    More recently, the organizers of one competition [kaggle.com] stated flatly in the forums that they would release the entire data set once the competition had ended, but then didn't. I inquired about this, and a Kaggle data scientist replied saying "we almost never release the test data".

    I'm not sure that Kaggle [kaggle.com] is all that scientific. If the full dataset can't be examined after the competitions close, there's no way to verify the results.
