Science

What Does It Mean To Be a Data Scientist? 94

Nerval's Lobster writes What is a data scientist? "To be honest, I often don't tell people I am a data scientist," writes Simon Hughes, chief data scientist of the Dice Data Science Team. "It's not that I don't enjoy my job (I do!) nor that I'm not proud of what we've achieved (I am); it's just that most people don't really understand what you mean when you say you're a data scientist, or they assume it's some fancy jargon for something else." So how do Simon and his team define "data scientist"? In this blog posting, he breaks it down along several lines: solid programming skills, a scientific mindset, and the ability to use tools are just for starters. A data scientist also needs to be a polymath with strong math skills. "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," he writes. "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive." His points are good to keep in mind right now, with everybody throwing around buzzwords like "Big Data" without fully realizing what they mean.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Thursday February 12, 2015 @10:05PM (#49044167)

    Just like how 10 years ago, suddenly everyone was an "Architect" and before that you were a "Developer".

    • Re: (Score:2, Insightful)

      by Anonymous Coward
      And it means you're unemployed within 5 years.
    • I am a data scientist. It says so on my business card, which also bears the name of a large corporation. I have a hard science Ph.D. "Data Scientist" means...that I am a statistician. But that's OK, because most people with "Statistician" on their business cards are anything but.

      • Agree. My card reads, "Data Specialist," but I tell most folks who ask what I do that I'm a "computer yo-yo" and they're good with that.
    • by gweihir ( 88907 ) on Friday February 13, 2015 @02:30AM (#49045213)

      Indeed. The actual name for this is "Computer Scientist" or in some cases "Statistician". The nonsensical "Data Scientist" is just a marketing term, created solely to inflate the perceived worth of a product, that, by all credible accounts, is not very good.

      • by Xest ( 935314 )

        But isn't that the same for many other sciences, especially those relating to medicine, where nearly all of the work is based on statistical analysis of data? I suspect that, given the increased complexity of data sets, we could apply the same logic to many professions. Hell, even the folks at CERN are using wholly statistical methods to determine the likelihood that their findings really were the Higgs; does this mean those physicists are actually just statisticians too?

        I think it's naive to th

        • by gweihir ( 88907 )

          I did not "instantly" jump to any conclusion. I had the first discussion about the name "Data Scientist" with a Statistician about 8 years ago. Also remember that "Statistician" is already specialized, a Statistician is a Mathematician specializing in statistics. Most of them can program these days. On the CS side, Computer Science has not yet started to specialize this strongly. What you would say is "I am a CS specializing in statistical analysis" or something like it.

          • by Xest ( 935314 )

            "On the CS side, Computer Science has not yet started to specialize this strongly."

            So where is your arbitrary line drawn out of interest? What would be required for a data scientist to be a data scientist? That's assuming all data involved even has any relevance to comp. sci. What if they're using data collected non-computationally also?

            The problem is you obviously like incredibly generic names, and that's great for you, but it makes it much harder for people wanting to advertise for specific roles, or to p

            • by gweihir ( 88907 )

              There are no "Data Scientists" because there is no "Data Science" other than Statistics and it already has a name. Marketing BS is not a sound reason to mess with language.

              • by Xest ( 935314 )

                Right, except there's a problem: everywhere and everyone that matters in the world of technology disagrees with you, from IBM to Apple, from Facebook to Google, from Microsoft to Oracle, from MIT to Cambridge, from Harvard to Berkeley, from Tim Berners-Lee to Mark Zuckerberg, from Sandy Pentland to Bill Gates, from Peter Norvig to Larry Page.

                So on one hand we have some random guy on Slashdot claiming it doesn't exist, and on the other we have the who's who of technology companies, universities, technologists,

                • by gweihir ( 88907 )

                  Well, you have insulted me, insinuated that I do not know what Statistics and Big Data are (both wrong), but do you actually have some _arguments_? Because up to now I find none at all.

                  • by Xest ( 935314 )

                    Well, that's one of the unfortunate things about being the sort of person who thinks they know better than just about everyone that matters in the industry: you generally won't find arguments in anything you read because you've already decided that you're right and the whole world is wrong. You can't see what's right in front of your eyes because you don't want to.

                    Instead you now play the victim, and keep deflecting away from the inconvenient fact that you seem unable to expand on why you arbitrarily think so

                    • by gweihir ( 88907 )

                      And more posturing. And dyslexia. Pathetic.

                    • by Xest ( 935314 )

                      So you make a comment, you completely fail to back it up, and you call someone else pathetic?

                      You know it's probably easier to just admit you made a comment you didn't think through and that was wrong rather than to continuously try and avoid what's obvious to anyone reading - that you can't back up your point - by playing the victim and throwing random and seemingly arbitrary insults (do you actually know what dyslexia is? it would appear not).

                      I really pity you.

              • disagree. statistics has not traditionally solved the problems that data scientists are working on these days.

    • by Tablizer ( 95088 ) on Friday February 13, 2015 @04:10AM (#49045457) Journal

      I'm not a "troll", I'm an Agitation Engineer.

    • I translate "data scientist" as "PhD in hard sciences who couldn't get a job in his or her field because we've been massively over-training PhDs for the last couple of decades, so he/she took a course in statistics and learned to write simple Python scripts and use scikit-learn and Hadoop." That seems to cover most of the ones I know, anyway. (Although to be fair, some of them knew Python already.)

  • by Anonymous Coward on Thursday February 12, 2015 @10:06PM (#49044171)

    It means you get no women.

  • Score!!! (Score:5, Funny)

    by DoofusOfDeath ( 636671 ) on Thursday February 12, 2015 @10:08PM (#49044183)

    I can't believe Slashdot managed to land an interview with someone from Dice! Time to make some popcorn, sit back, and enjoy the fireworks!

    • I know, right? I wonder how Slashdot even got anyone from Dice to even notice them, much less do a full, informative, in-depth interview about cutting-edge technology!
  • by Anonymous Coward

    Just think - telecoms are accumulating petabytes of data from call setup and cellular handoffs EVERY FEW MONTHS. And this data can be cross referenced with subscriber data and sliced and diced in almost infinitely many different ways.

    If you're the one reciting stats like that with wide open eyes, you're a Data Scientist.

    If you just shrug and say, "Yeah. So?" like everyone else, you're not.

    • by bouldin ( 828821 )

      Just think - telecoms are accumulating petabytes of data from call setup and cellular handoffs EVERY FEW MONTHS. And this data can be cross referenced with subscriber data and sliced and diced in almost infinitely many different ways. If you're the one reciting stats like that with wide open eyes, you're a Data Scientist. If you just shrug and say, "Yeah. So?" like everyone else, you're not.

      I agree, and playing with that kind of data actually sounds fun.

      The big question is, though, what can you do with

      • I knew a guy who claimed to 'frolic in the database'; weirdo.

        I on the other hand occasionally 'wallow in the data' or 'root around in it'.

  • by Anonymous Coward

    It means you get to play with beakers and such. No self-respecting scientist doesn't have a lot of beakers, test tubes, and strange lab setups with tubes going in all directions.

    • by Anonymous Coward
      God yes. And tesla coils. And one-sneeze-from-exploding alkalines. AND A WIND TUNNEL! And industrial, dangerous, heavy duty [any large machine]s.

      And a lab coat and a clipboard and a particle accelerator and lasers and goggles and oh shit I have a boner.
  • It means... (Score:5, Insightful)

    by thegarbz ( 1787294 ) on Thursday February 12, 2015 @10:51PM (#49044393)

    You can't spell statistician and anyway were too embarrassed to put it on your business card.

    I think we should submit an Ask Slashdot where we ask data scientists precisely how they apply the scientific method in their day-to-day life. Or does having a "scientific mind" now qualify as being a scientist?

    I have a scientific mindset, will I be a pornography scientist later tonight, am I a trolling scientist now?

  • It means you opted for the Blue shirt instead of the Gold. :D

  • What does it mean? (Score:5, Informative)

    by Pete Venkman ( 1659965 ) on Thursday February 12, 2015 @11:02PM (#49044455) Journal
    Absolutely nothing.
  • by michaelmalak ( 91262 ) <michael@michaelmalak.com> on Thursday February 12, 2015 @11:07PM (#49044483) Homepage

    Without sociology skills [datascienceassn.org] (my blog) on a data science team, hypothesis formation and ability to model clients will suffer. It would seem particularly important for a people-focused company like Dice.com.

    • It would seem particularly important for a people-focused company like Dice.com.

      They're not people-focused, they're employer-focused. It costs money to post jobs on Dice, but it's free to look at them. You are the product. Dice has commoditized you. Dice is for employers to buy you, it's not for your use. If it were, then employers would post jobs for free, and you would pay for access (or it would be free.)

      • by Uzuri ( 906298 )

        " If it were, then employers would post jobs for free, and you would pay for access"

        For the love of God, don't give them any ideas!

  • I'm sure there are good reasons to datamine and bad reasons as well. Some goals yield benefits to many while others are more selfish. The question is whether more good or more bad can be done, and whether the benefits outweigh the pitfalls. What are we willing to sacrifice? Are our desires important enough to risk the pitfalls? Do we think we can account for the pitfalls and protect ourselves against them, or are we just being arrogant and blindsiding ourselves?

    Why am I asking you?
  • by Demena ( 966987 ) on Thursday February 12, 2015 @11:36PM (#49044607)

    Errr... You claim to be a scientist and yet you say "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," .

    Circular definition, circular argument. Also, false. Many scientists (like Darwin for example) form a theory and then look for empirical evidence to test that theory. Next time start that sentence with "In my opinion" and you get away with it. You didn't and you don't.

    Reading your article, it says nothing. I would not hire you on the basis of what you have written here.

    Pardon me if that seems rude, but it was, in my opinion, too superficial to ignore.

    Oh! By the way, what you do has had a title for a generation. You are an analyst doing what analysts do. Analyse data.

    • There's a huge difference between having a theory and being convinced about a theory.

      • Dinosaurs are narrow at one end, much much thicker in the middle and narrow again at the other end...

  • "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive."

    I know just how you feel!

    One way around this problem is to round down to the next significance level and reduce it to a yes/no assessment.

    For example, instead of reporting the actual significance, say "p<.05" and instead of citing the correlation as a number, say "we therefore reject the null hypothesis".

    Works a peach, required in most journals, and reduces the workload of the reviewers.

  • by Anonymous Coward

    I guess whatever journalistic ethics Slashdot used to have are out the window. No indication in the OP that Dice owns Slashdot. (I mean, sure most people know that, but when OSDN owned Slashdot at least all relationships were disclosed up front.)

  • What does it mean "to be a data scientist"? It means that you call yourself a data scientist, and that someone pays you to do things that either you, they, or both of you agree are "data scientist" types of things. If you're not getting paid, then I think it makes you an "amateur data scientist", a "data scientist in training", an "intern data scientist" or, my favorite, an "indentured data scientist." There may be other amazing terms to describe this phenomenon (unpaid data scientist) but I believe I am missing

  • by Karmashock ( 2415832 ) on Friday February 13, 2015 @12:35AM (#49044833)

    I'm sure there are some good data scientists but most of the papers I've seen lately that are based on statistics or various data sets are extremely lazy. You have someone that just combs through data and then tries to make a novel association. Nearly always they just show correlation and never causation.

    I think that is one of the bigger problems. Because you're not collecting the data or structuring the experiments that collect the data, you can't isolate anything from the data. All you can do is say "well, this might be happening"... which is often completely useless. A more useful thing they could do is find that correlation and then see if they actually have causation by doing a follow up experiment or study that isolates for a specific variable under controlled conditions.

    That is, I think data scientists would be more useful if they used the study as a jumping off point to doing an actual study. And I'm not especially interested in reading or even hearing about anything they've done until they've concluded that secondary study.

    Absent that... it is lazy, boring, not interesting, and who cares.

    • That is, I think data scientists would be more useful if they used the study as a jumping off point to doing an actual study.

      At which point they'd be "scientists" not "data scientists"

  • how about "blind-input technical author"?
    Consider that a good scientist goes into a sea of data with no expectations (hence no bias) about what that data is going to reveal, and so has no incentive to cherry-pick. Even anomalies are data. Why are those anomalies there? Are they actually anomalies? Or are they indicators that the original hypothesis or the gathering method itself is flawed?

    Me? I'm into highly technical writing, but not from a mechanical or electronic or programming field. I analyse human data. Th

  • I rock the house and sign the tits, and that's it!
  • I've always said that data scientist is just a buzzword for statistician. Another statistician called me on that one day, and said "No, a data scientist is a programmer." I'm sorry, but in this day and age, if you are a statistician who can't program, you're not a very good statistician.
    • agree, but statisticians have not solved the kind of problems data scientists work on these days. neural networks did not come from statisticians, for example. data science involves /some/ statistics and /some/ computer science and that scientific mindset.

  • Data science is to machine learning as "full stack" is to web development. i.e. a horrible buzzword.
