
Using Twitter Data To Approximate a Telephone Survey

cremeglace writes "A team led by a computer scientist at Carnegie Mellon University has used text-analysis software to detect tweets pertaining to various issues — such as whether President Barack Obama is doing a good job — and measure the frequency of positive or negative words ranging from 'awesome' to 'sucks.' The results were surprisingly similar to traditional surveys. For example, the ratio of Twitter posts expressing either positive or negative sentiments about President Obama produced a 'job approval rating' that closely tracked the big Gallup daily poll across 2009. The analysis also produced classic economic indicators like consumer confidence." By averaging several days' worth of tweets on presidential job approval, the researchers got results that correlated 79% with daily Gallup polling. Lead researcher Noah Smith said, "The results are noisy, as are the results of polls. Opinion pollsters have learned to compensate for these distortions, while we're still trying to identify and understand the noise in our data. Given that, I'm excited that we get any signal at all from social media that correlates with the polls." Here is CMU's press release.
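The pipeline described in the summary (count sentiment words per day, take the positive-to-negative ratio, average over several days, then correlate with poll numbers) can be sketched roughly as follows. This is a minimal illustration, not the CMU team's code: the word lists, function names, and window size are all hypothetical, and the summary does not say which lexicon or smoothing window the researchers actually used.

```python
import math
from collections import deque

# Hypothetical toy lexicons; the study's real word lists are not given here.
POSITIVE = {"awesome", "good", "great"}
NEGATIVE = {"sucks", "bad", "awful"}


def daily_sentiment_ratio(tweets):
    """Ratio of positive to negative word occurrences in one day's tweets.

    Returns None when no negative words were seen, since the ratio is undefined.
    """
    pos = neg = 0
    for text in tweets:
        for word in text.lower().split():
            word = word.strip(".,!?\"'")  # drop surrounding punctuation
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    return pos / neg if neg else None


def moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start)."""
    recent = deque(maxlen=window)
    smoothed = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Given one ratio per day and a matching series of daily poll numbers, something like `pearson(moving_average(ratios, 7), poll_values)` would yield the kind of correlation figure quoted above; the 7-day window is an assumption for illustration only.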
  • 79% is not fantastic (Score:5, Informative)

    by Ed Peepers ( 1051144 ) on Tuesday May 11, 2010 @09:43PM (#32177932)

    I've collaborated on research using Twitter traffic as a predictor so I applaud their efforts, but a 79% correlation with telephone responses is not as high as it sounds. For example, the minimum acceptable correlation for interrater reliability is typically 80%.

    Put simply, the Twitter data can only account for a bit under two thirds of the variation in phone responses (0.79 squared is about 0.62). That's useful but there's still a lot of unexplained variance -- we have a long way to go.

  • by Anonymous Coward on Tuesday May 11, 2010 @10:32PM (#32178258)

    Inter-rater reliability is sometimes taken as the correlation between two raters' scores. Reliability is a different concept from variance explained, which is equal to the square of the correlation. Twitter can predict 0.79 * 0.79 = 62% of the variance in phone responses.
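    The arithmetic above is easy to check. A minimal sketch, where the function name is just an illustrative choice:

```python
def variance_explained(r):
    """Share of variance in one variable accounted for by a linear fit
    on the other, given their Pearson correlation r (this is r squared)."""
    return r ** 2


r = 0.79  # correlation reported between the Twitter measure and Gallup polling
explained = variance_explained(r)
print(f"explained:   {explained:.0%}")      # explained:   62%
print(f"unexplained: {1 - explained:.0%}")  # unexplained: 38%
```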
