
 



Stats Education Math

Statistics Losing Ground To CS, Losing Image Among Students 115

theodp (442580) writes Unless some things change, UC Davis Prof. Norman Matloff worries that the Statistician could be added to the endangered species list. "The American Statistical Association (ASA) leadership, and many in Statistics academia," writes Matloff, "have been undergoing a period of angst the last few years. They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: [1] The field is to a large extent being usurped by other disciplines, notably Computer Science (CS). [2] Efforts to make the field attractive to students have largely been unsuccessful."

Matloff, who has a foot in both the Statistics and CS camps, says, "The problem is not that CS people are doing Statistics, but rather that they are doing it poorly. Generally the quality of CS work in Stat is weak. It is not a problem of quality of the researchers themselves; indeed, many of them are very highly talented. Instead, there are a number of systemic reasons for this, structural problems with the CS research 'business model'." So, can Statistics be made more attractive to students? "Here is something that actually can be fixed reasonably simply," suggests no-fan-of-TI-83-pocket-calculators-as-a-computational-vehicle Matloff. "If I had my druthers, I would simply ban AP Stat, and actually, I am one of those people who would do away with the entire AP program. Obviously, there are too many deeply entrenched interests for this to happen, but one thing that can be done for AP Stat is to switch its computational vehicle to R."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    My margin of error is pretty high so things never really seem to turn out how I expect them to turn out.

  • by Anonymous Coward on Wednesday August 27, 2014 @07:54AM (#47764197)

    As a statistician, you should know better than to make your point with a succession of anecdotes such as:

    - A few years ago, for instance, I attended a talk by a machine learning specialist who had just earned her PhD at one of the very top CS Departments in the world. She had taken a Bayesian approach to the problem she worked on, and I asked her why she had chosen that specific prior distribution. She couldn’t answer – she had just blindly used what her thesis adviser had given her–and moreover, she was baffled as to why anyone would want to know why that prior was chosen.
    - But there is no substitute for precise thinking, and in my experience, many (nominally) successful CS researchers in Stat do not have a solid understanding of the
    fundamentals underlying the problems they work on. For example, a recent paper in a top CS conference incorrectly stated that the logistic classification model cannot handle non-monotonic relations
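The quoted point about the logistic model can be illustrated with a small sketch. It assumes a single predictor `x` and made-up coefficients (`b0`, `b1`, `b2` are hypothetical, not taken from the paper or the article): once a squared term is added to the linear predictor, the modeled probability can rise and then fall, so the model is not limited to monotonic relations.

```python
import math

def logistic_quadratic(x, b0=1.0, b1=0.0, b2=-1.0):
    """P(Y=1 | x) for a logistic model whose linear predictor includes x**2.

    The coefficients are hypothetical values chosen only for illustration.
    """
    z = b0 + b1 * x + b2 * x * x
    return 1.0 / (1.0 + math.exp(-z))

# Probability is low at both extremes and highest in the middle.
for x in (-2.0, 0.0, 2.0):
    print(x, round(logistic_quadratic(x), 3))
```

The model stays linear in its parameters; only the feature set changes, which is why an ordinary logistic fit can recover such a shape when the squared term is supplied.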

    • Re: (Score:3, Funny)

      Considering how small the population size for machine learning researchers in academia can be, it is very likely that anecdotes can constitute a satisfactory sample.
      • by Anonymous Coward on Wednesday August 27, 2014 @08:07AM (#47764295)

        Machine learning is an example in the article. This is a blatant attack on all CS students, researchers and professors.

        Let’s consider the CS issue first. Recently a number of new terms have arisen, such as data science, Big Data, and analytics, and the popularity of the term machine learning has grown rapidly.

        He seems to not really know CS. Statistics and probability have been tools of CS since its very inception. This is not news.

        • by Anonymous Coward

          According to you, he's attacking himself: He is a CS professor (with a PhD in statistics) in a CS department, who thinks that computer science needs to make better use of the tools of statistics and probability.

    • by u38cg ( 607297 ) <calum@callingthetune.co.uk> on Wednesday August 27, 2014 @09:50AM (#47765031) Homepage
      Get real. Anyone doing "statistics" who doesn't understand the concept of a prior is just pretending to do statistics. That is a problem.
  • by NotDrWho ( 3543773 ) on Wednesday August 27, 2014 @07:54AM (#47764199)

    But there's only a 25% chance of that.

    • by Anonymous Coward

      but that applies 100% of the time.

    • by mysidia ( 191772 )

      But there's only a 25% chance of that.

      And that's just the average probability, with the actual probability for a given sample in terms of percentage points having a standard deviation of + / - 25 percentage points.

  • by sinij ( 911942 ) on Wednesday August 27, 2014 @07:58AM (#47764227)
    Statistical analysis is now more complex, and statistics are better understood in science than a decade ago. There are a number of software packages and libraries that simplify and standardize techniques.

    Correctly applying all of these requires subject matter expertise. You need to understand what you are analyzing. As a result, a pure statistician is not very useful - generic analysis can be performed by software, in-depth analysis requires specific knowledge.

    This is not unlike complaining that assembly coding is dying. Well, yes, we now have less need to code everything that way because we have better tools.
    • by Anonymous Coward on Wednesday August 27, 2014 @08:24AM (#47764405)

      Correctly applying all of these requires subject matter expertise. You need to understand what you are analyzing. As a result, a pure statistician is not very useful - generic analysis can be performed by software, in-depth analysis requires specific knowledge.

      In my experience, statisticians tend to be far more successful at acquiring subject matter expertise than people in other fields are at using proper statistical procedures for their problems.

      It's like saying mathematicians are not useful because calculators. It's simply not true, and while software can perform generic analysis, it is only quite a tiny part of doing a statistical problem correctly. What we have now are coders who think that computers can set up and interpret their problems correctly, and thus we have an increase in bad results.

      • by sinij ( 911942 )
        Yes, there are statisticians who end up working in other fields, and they tend to be better at statistics than a typical practitioner in their adopted area of expertise. Thing is, these statisticians are no longer in the field of Statistics; they are researchers in these other fields.
    • by wisnoskij ( 1206448 ) on Wednesday August 27, 2014 @08:48AM (#47764569) Homepage
      I completely disagree. Pretty much everyone is complete shit at statistics. It is a very, very advanced and unique field that is continually and horribly bungled by scientists and everyone else. We need statisticians; that said, I cannot imagine anyone wanting to go into stats.
      • by sinij ( 911942 ) on Wednesday August 27, 2014 @08:55AM (#47764613)
        The following is an anecdote, but when someone I knew approached multiple statisticians with a model question (related to repeated measures), the understanding of the concept was not there. If your view is that "everyone is complete shit at statistics", that should include statisticians.
        • by Anonymous Coward

          Amen! Again anecdotal, but my experience is that junior statisticians, i.e. recent graduates, are far more likely to be able to perform the statistics you need in business, health research, etc. than a statistician with postgraduate qualifications. I suppose it's a case of a little knowledge going a long way. The junior guy has more than enough knowledge to deal with 80% of what most people might need to do. It's rare that business or the health sciences need anything more complex than an ANOVA to make their

          • What you say is generally true of professors. Not just statisticians.

            But it's reasonable. The professors assume you already know the simple answer and give the somewhat complete one.

        • by radtea ( 464814 )

          If your view is that "everyone is complete shit at statistics", that should include statisticians.

          This has been my experience as well. I would go so far as to say that statisticians understand probability less well than most working experimental scientists. They are overly-enamoured of abstract models and rarely dig down to the raw probability distributions underneath, which is what working scientists actually care about.

      • by JeffSh ( 71237 )

        it seems to me that statisticians are CS overlapping with practical insight.

        When a report is prepared, it takes consideration to define all of the inputs and modifiers that lead to a successful statistical analysis. Without these hard-to-define inputs, I can see how and why a CS-only based approach to stats fails.

      • But I thought Econ was the Dismal Science.

      • by Anonymous Coward

        I want to learn statistics, but it's so ... big. There just aren't enough hours in the day as a working software-industry professional to do something as big and complex as learning statistics. After a day of coding and debugging, my brain is fried. Opening a probability book just doesn't work. I guess you need to learn it while you're still in school, and your job is studying. I took a statistics class in college, but can't remember anything about it.

        Besides, "statistics" is an umbrella term for probabilit

    • by aaaaaaargh! ( 1150173 ) on Wednesday August 27, 2014 @09:48AM (#47765013)

      Quite the opposite is the case. Unless we are talking about experiments with terabytes of data, most software packages are complete overkill anyway; you could do your statistics with a pocket calculator instead. The problem is the conceptual work. Most institutes and individual scientists would be much better off if they employed a well-trained full-time statistician. Provided they were interested in correct and robust results rather than getting one more pilot study published as soon as possible (which will in turn be based on an insignificantly small non-random sample using an inadequate model).

      • by Rich0 ( 548339 )

        I think you hit the nail on the head. Nobody cares about getting it right, they just care about getting it accepted. People know enough statistics to be dangerous.

        But the same is true of almost any field. Unless you work in some kind of skunk works team, how many of your coworkers REALLY have a good grasp of the fundamentals in whatever profession you work in? Do you think the average CS major has any idea what an opcode is, or how to implement a binary adder (just in terms of theoretical gates, let alo

    • As a result [a] pure statistician is not very useful - generic analysis can be performed by software, in-depth analysis requires specific knowledge.

      In-depth analysis requires a real understanding of statistics as well as of the domain. CS knowledge, at least as commonly taught, is not a substitute for the statistics requirement.

      This is not unlike complaining that assembly coding is dying. Well, yes, we now have less need to code everything that way because we have better tools.

      This is not a valid analogy. HLLs automated some of the rote, mechanical aspects of implementing algorithms. They do not automate away the need for a higher-level understanding of what you are doing.

  • by BorisSkratchunkov ( 642046 ) on Wednesday August 27, 2014 @08:02AM (#47764273) Journal
    Most notably psychology, economics, mathematics and beer brewing. In fact, most of the developments in stats have come about as a result of a need arising in a different discipline. Stats is inherently an applied discipline, so this is not unusual.

    What is concerning is how many statistical tools, each with their own set of assumptions, have blossomed up within the past few decades. There are so many stats now that stats can no longer be an ancillary to other disciplines- it needs to be given its own space and statisticians need to be given respect for their unique expertise. There is simply too much knowledge in that domain for those in more theory-driven fields to be able to claim both expertise in the conceptual models of their fields and statistics.
  • Comment removed (Score:5, Interesting)

    by account_deleted ( 4530225 ) on Wednesday August 27, 2014 @08:10AM (#47764319)
    Comment removed based on user account deletion
    • I would add that many disciplines are recognizing the importance of statistics and are therefore introducing applied statistics courses for [discipline X]. This causes a drop in enrollment in the pure statistics courses, thus decreasing the number of pure statistics instructors, thus decreasing the demand for individuals trained in pure statistics. In this way statistics is losing itself as a discipline and is quickly becoming specialized into various disciplines (e.g., the application of statistics for med
    • by Anonymous Coward on Wednesday August 27, 2014 @10:49AM (#47765681)

      It's a funny coincidence this appeared on Slashdot, as I was just reading about this issue and discussing it with my colleagues.

      I'm a statistics researcher in an applied field (university academic research) that suffers its own image problem, and my impression is that what we're witnessing in many STEM areas are problems with stereotyping in science, and marketing fads. I'm not sure that I disagree with what you're saying, but I think that there's another stereotype operating as well that cuts at the field of statistics in a second direction.

      As you point out, there are the sort of applied consulting statisticians who are probably getting increased competition from "data scientists."

      On the other side of the issue, though, you have complaints about theory-focused statisticians who really don't understand how to implement their developments computationally, who are also getting increased competition from "data scientists." This has been mentioned in a number of blog posts in various places, and I see this as much more the driver of "data science" as a banner than competition with consulting statisticians. E.g., CS individuals who feel they can do Hadoop and so forth, and who have had enough stats training, probably in undergrad, that they feel like they can just sort of usurp the statistics from the statisticians. They see the theory as irrelevant or something.

      The problem as I see it is that individuals who identify as "data scientists" don't really understand that the theory has to come from somewhere, and they fail to appreciate the issues that come up when dealing with uncertainty. It's like everyone in the field has some undergraduate-engineering-student level understanding of statistics, and doesn't have to deal with thorny data collection designs, complex inferences, or replicability of findings. The sort of scenario that's motivated "data science" is essentially this: an extremely large dataset involving relatively simple classification or prediction questions about observational data where there's really no scrutiny about generalizability or the meaning of the results. This problem scenario is why they got involved instead of a statistician in the first place: because the bottleneck was the size of the dataset, not the analysis scenario.

      All of the attempts to distinguish "data science" from statistics, it seems to me, are based on stereotypes or misunderstandings about statistics, as you point out, or on extremely short-sighted perspectives on science and math. Computational statistics has been a core part of statistics for decades (there are journals devoted to the topic), and you can find peer-reviewed articles on all sorts of computational problems in statistics (e.g., the use of GPUs in estimation problems, how to approach optimization with distributed processors, etc.). The idea that statistics is all theory, and that statisticians don't understand computational issues, is naive or reflects a very stereotyped view of statistics (or I pity their experiences in high school and college--it sounds like they got a poor education in statistics).

      This isn't to pooh-pooh the contributions of CS--it's critical. But I hate the banner of "data science"--not only is the term stupid and redundant (how can you have science without data? What other kind of science is there?), it's based on ignorant stereotypes about statistics as a field.

      To me, this speaks to a longer term problem in CS, which is CS essentially discovering what's been going on in other fields and reinventing the wheel over and over again. I don't see this necessarily in CS academic departments, but I do see it where there's some interface with the business world. It's coming up now with statistics, it's come up before with social sciences and economics, it's come up with AI and neuroscience, it's come up with genomics, it comes up over and over again. It speaks to a sort of arrogance or autism in the field's culture, where they act as if their unawareness of a phenomenon means that no one has ever researched it before.

      Ughh... think about statistics as the mathematics of uncertainty, and see how far you get with deemphasizing that. Damn, I hate society sometimes. I need a walk.

      • by Anonymous Coward

        Would mod parent up if I could.

        I think part of the issue is that people in software are often expected to understand and implement (code) difficult concepts in fields well outside of their domain of knowledge, simply because they are experts at coding. For example, an academic in bioinformatics explained to me that he spent months correcting code in bioinformatics software that made the most basic of errors in genetics, such as reading a strand of DNA backward. I've seen similar issues in government, wher

      • they got involved instead of a statistician in the first place: because the bottleneck was the size of the dataset, not the analysis scenario.

        it's not just the data size, it's also b/c the data is unstructured and of various kinds..nothing nice and tabular. also this data is being generated by complex processes...where a fresh algorithmic and/or domain expertise can help.

        this doesn't mean that stats aren't needed.

  • by Anonymous Coward

    What the fuck does "AP" mean?
    I'm dabbling on the "AP Central" website and others but they all talk about AP courses, how to get a course labelled AP, "AP is your time well spent" but never a definition of what AP is. It's ridiculous to use such a two-letter initialism and hide its meaning like it's a secret thing for "consumers" who buy higher end education in the US.

    • It stands for "Advanced Placement." They're college-level high school courses. At the end of the year, you take the advanced placement exam, and depending on your scores and the college you attend, you can get college credits for them.

      I think getting rid of an AP is a stupendously short-sighted idea. Having students take more advanced courses earlier is a great idea. If there's reason to believe the courses aren't actually as demanding as their college equivalent (and I don't think there is, based on my

      • Re: (Score:2, Interesting)

        by Anonymous Coward

        Getting rid of it is just an attempt to waste students' time and extract more money from them by forcing them to take more university courses.

        I suspect his complaint is that in high school, AP Statistics is taught by math teachers. In college, classes are taught by professors who specialize in statistics. This goes along with his general complaint that people in other disciplines don't take the time to really understand how statistics work. Of course, the same problem exists in college statistics courses. You can take a one semester survey course or the two semester theory course. He'd prefer that everyone took the two semester course and th

        • He may be right about AP Statistics though. Taking statistics in high school means that most people will have forgotten it by the time they get to advanced courses that use statistical methods.

          Unless you're an actual statistics major (in which case you'll pick up whatever you missed in subsequent courses anyway), that's going to be true regardless of whether you take statistics in high school or college. I took AP statistics, but my university required me to take "Statistics for Engineers" as an EE major, and wouldn't allow the AP stat course to count towards that. Stats for Engineers was an absolute joke, and the high school class was far more rigorous.

      • by RR ( 64484 )

        I think getting rid of an AP is a stupendously short-sighted idea. Having students take more advanced courses earlier is a great idea.

        The problem is that AP classes are, pretty uniformly, badly constructed. Half of the education in AP math and science courses is How to Use the TI-83 Calculator. Half of AP Computer Science is How to Program in Java. The College Board is single-handedly blocking progress in the education of technology in math and science.

        I don't know about the rest of the AP classes. I also think the College Board's role in college admissions, via SAT and AP, is fragile and counterproductive.

        • The problem is that AP classes are, pretty uniformly, badly constructed. Half of the education in AP math and science courses is How to Use the TI-83 Calculator. Half of AP Computer Science is How to Program in Java. The College Board is single-handedly blocking progress in the education of technology in math and science.

          Yeah, but they replace the low-level introduction courses in college, not the more advanced ones. 100-level computer science courses in college ARE, "how to program in Java." And, like I said, my Calculus course in High School seemed better than the equivalent in college from what I was seeing.

          If anything, those high school courses mean you don't have to take the BS introductory courses in college, and you can go straight to the more interesting / demanding ones during your freshman year.

  • I think the problem is that statisticians have small, unconnected habitats and overly complex mating rituals.
  • Efforts to make the field attractive to students have largely been unsuccessful."

    You would think they would know which efforts work and which don't. I'm only being a bit sarcastic with the Subject line, but seriously they should be able to figure out what does and doesn't work.

  • by Anonymous Coward

    I don't dare publish this unless I am anonymous, but I must state this observation:

    We are always on the lookout for new statisticians in our medical group. About 95% of our applicants are Chinese females! I had asked one of our (Chinese) scientists about this, and he said that this is because of the proliferation of MS in statistics programs that are amenable to spouses who were interested in a profession that could be attained (and makes good bacon) while their husbands were working on advanced degrees in

    • The opposite side of the same coin is that no one wants to be the lone white man in a room full of Chinese women.

      What is the pretest probability of this being true?

  • Statistics is a dirty word today, even though modern science depends upon it. The public most commonly encounters them when they are lied about.

    • by Xtifr ( 1323 )

      Indeed, which is why, when I talk to kids about math in school, one of the things I like to point out is that while statistics are, in general, rather boring, it's really important to learn enough to have at least a chance of recognizing when they're being used to lie to you. This argument gets through to a surprising number of them.

  • by dywolf ( 2673597 ) on Wednesday August 27, 2014 @09:01AM (#47764647)

    As far as the general public is concerned:
    When it's convenient, people use numbers, real or made up, in order to disprove the other side's point and prove their own...
    When it's not convenient, all statistics become questionable ("ya, but most statistics are made up") in order to disprove the other side's point and prove their own...

    The reality of the numbers doesn't matter. People just don't care about actual objective facts; they just want to back up their preconceived notions to spread their stupidity. It's just like how Americans approach science in general, really.

  • by DoofusOfDeath ( 636671 ) on Wednesday August 27, 2014 @09:05AM (#47764673)

    I'm not very trained in statistics, but I've read more than my fair share of academic computer science papers over the years.

    Even with my limited training in statistics, I've known enough to be appalled by the errant statistical reasoning used. Or even not used. I.e., "We don't know how many times to run a program to get a 'valid' average running time, so we ran it three times. Here's the average: ..." The authors seemingly aren't just ignorant of how to get the answer; they often seem to have not thought through what questions they're trying to answer in the first place with their measurements and resulting statistics.

    I think a few problems come into play here:

    • The mathematics of statistics can be hard.
    • Thinking through the meanings of statistics requires careful thought, especially for experimental design and/or system performance characterization. Many CS practitioners would prefer to not invest mental energy in this aspect of their work because they don't enjoy it; it's a distraction to what they want to do.
    • Because so many people in CS are bad at statistics, peer reviewers tend to let it slide. This helps foster a culture problem. If I'm under the wire to get a paper published and I'm near deadline, do I take an extra 20 hours to get the statistics right? Especially knowing that I'm judged by the number of published papers, and that the peer reviewers won't notice or care about poor statistical reasoning?
    • It's easy to make statistical reasoning errors without noticing it. Especially if you're not surrounded by statisticians.

    Despite CS majors thinking we're so smart about mathematical issues, I think this might be one area where that confidence is delusional. I suspect most psychology majors who paid attention in their Experimental Design courses are more capable in the appropriate mathematics than are most CS majors.
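The "we ran it three times" complaint above can be made concrete with a minimal sketch. It assumes the run times are independent and roughly normally distributed, and the timing values are hypothetical. It uses only the Python standard library, with a small hard-coded table of two-sided 95% Student's t critical values; with SciPy available, `scipy.stats.t.ppf` would normally replace the table.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_ci(samples):
    """Return (mean, half_width) of a 95% t-interval for the mean."""
    n = len(samples)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only covers 2 to 11 samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[df] * sem

runs = [1.0, 2.0, 3.0]  # hypothetical running times in seconds
mean, half_width = mean_ci(runs)
print(f"mean = {mean:.2f} s, 95% CI half-width = {half_width:.2f} s")
```

With these three made-up timings the interval is 2.00 s plus or minus about 2.48 s, wider than the mean itself, which is the kind of uncertainty a bare three-run average hides.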

  • There's an app for that!
  • Statistics as taught in its current form (just like economics) came about in the 50's; they were both designed by out-of-work mathematicians. Now, for someone to be an unemployed Math PhD in the 50's, with the start of the space program and the massive military research programs, how good were these guys? When I took Stat in college I was amazed. Take one set of data and produce two diametrically opposed answers and have them both correct? Sounds like rumor, gossip, and BS to me, not science.
    No wonder there are lies, damn
  • "There are three types of lies: lies, damn lies, and statistics." -attribution disputed

    http://en.wikipedia.org/wiki/L... [wikipedia.org]
  • Is this very poorly written article about: 1) students choosing to pursue a career path in computer science rather than statistics... or... 2) CS people doing poor-quality statistics work... or... 3) banning the Advanced Placement "Statistics" class because students are relying too much on their "pocket calculators." We get three-articles-in-one to talk about here. At least they are all loosely related to something called "statistics."
  • Statistics done right is hard and boring. People prefer hacking to doing hard and boring stuff.
  • I work with a couple of very good statisticians. What they do is a mystery to me, but one thing I can say for sure - a good programmer or DBA will find work much more easily than a good statistician. In large part because PHBs have no clue why they need someone with more than two semesters of probability in almost every application.

    Another problem with students going into statistics in the US is that virtually all of the instructors don't speak very good English. To this day I want to say things like "proba

    • "don't speak very good English."

      Did you know that, statistically speaking, 100% of people who "don't speak very good English." don't speak English very well?

  • Statistics as a subset of CS isn't unreasonable given that nearly all statistics will be calculated by software.

    • That's complete horseshit (along with this article). It's like saying math is a subset of CS because nearly all maths will be calculated by CS.

      Stats is orthogonal to CS. You don't need one to do the other.

      Having both though can give you a skill set that's quite useful.

      • That's fine. Just throwing out an idea. Perhaps keep statistics as it is... but add a CS statistics course so that the CS students get a better background in it.

        Not trying to upset apple carts here. Just trying to find reasonable solutions. :)

        • Educating more STEM people in statistics would help, regardless of fields. In very many fields, results are statistical in nature, and I really don't trust a lot of researchers to do the statistics right.

          A friend worked for the FDA as an animal nutritionist. She thought her criteria were clear: she wanted to see three experiments, with control groups, significant at the 5% level. People didn't understand her.

  • It's a shame it has such a reputation for being boring, and it is a shame that it seems to be rarely taught in an engaging way.

    Statistics is the first artificial intelligence. It formalises what we know when we 'know'. It is fundamental.

    It's also fairly hard to do right. But many worthwhile things are hard.

  • I know a lot of people who get CS gigs after school. It pays their bills. They do well.

    The stats people I know are really, really rich. And there are a lot of them.

    That's in Raleigh.
  • If you like the field of statistics it seems a better long-term bet than IT. The "laws" of math are not going to change in 40 years, whereas in IT the languages, GUIs, frameworks, and Paradigm Fad of the Day will change...several times. Plus it won't give you Carpal Tunnel (unless you can't trick a grunt into data entry). You are expected to know the domain (industry), such that outsourcing is not as likely either.

    Software may pay more in the short term, but career-wise, stats seems more stable.

  • How to lie with data structures

  • I read on Slashdot a long time ago that there is a comic book in Japan that teaches Statistics. I would really like to buy a copy in Spanish or at least in English.
    Any link on where to buy it is welcome.
