Mozilla Programming Science

Mozilla Plan Seeks To Debug Scientific Code

Posted by Soulskill
from the unit-tests-are-for-undergrads dept.
ananyo writes "An offshoot of Mozilla is aiming to discover whether a review process could improve the quality of researcher-built software that is used in myriad fields today, ranging from ecology and biology to social science. In an experiment being run by the Mozilla Science Lab, software engineers have reviewed selected pieces of code from published papers in computational biology. The reviewers looked at snippets of code up to 200 lines long that were included in the papers and written in widely used programming languages, such as R, Python and Perl. The Mozilla engineers have discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups — including whether to permit disclosure of the results. But some researchers say that having software reviewers looking over their shoulder might backfire. 'One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code,' says biostatistician Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. 'We need to get more code out there, not improve how it looks.'"

  • Wrong objective. (Score:5, Insightful)

    by smart_ass (322852) on Wednesday September 25, 2013 @12:24AM (#44944655)

    I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.

    Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?

    Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.

    But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.

    • Exactly. If the code they are writing looks like bad PHP from 10 years ago then it needs to be exposed.

      What is needed is more *good quality* code being published.

      • by mwvdlee (775178) on Wednesday September 25, 2013 @03:18AM (#44945291) Homepage

        I think that's exactly the opposite of the point the GP was trying to make.

        If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
        If it looks like old COBOL strung together with GO TO's and it works, it's okay.
        If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially so if the research results are based on the output of the code.

        • Re: (Score:3, Informative)

          by Macchendra (2919537)
          It is easier to find bugs in code where all of the objects, variables, methods, etc. are named according to their actual purpose. It is easier for other researchers to integrate their own ideas if the code is self documenting. It is easier to integrate with other software if the interfaces are cleanly defined. It is easier to verify the results of intermediate steps if there is proper encapsulation. Also, proper encapsulation reduces the chances of unintended side-effects when data is modified outside o
          • by mwvdlee (775178)

            All of which are great if code is to be maintained, which this type of code rarely is.
            None of which affects whether the code actually works.

            • Making bugs visible does affect whether the code actually works. So does making the components testable.
            • by MiniMike (234881)

              All of which are great if code is to be maintained, which this type of code rarely is.

              Not always true, probably not by a long shot. I'm maintaining code written over a span of time beginning in the 1980's (not by me) and last updated yesterday (and again as soon as I'm done here...). Some written very well, some quite the opposite. Not often is scientific code used for just one project, if it's of any significant utility.

            • by swillden (191260)

              All of which are great if code is to be maintained, which this type of code rarely is.

              Or if it is re-used, which is one of the potential benefits of publishing it alongside the paper.

              Also, since the purpose of research papers is to transmit ideas, clear, readable code serves readers much better than functional but opaque code... and that assumes the code is actually functional. Ugly code tends to be buggier, precisely because it's harder to understand.

        • by ebno-10db (1459097) on Wednesday September 25, 2013 @09:46AM (#44947469)

          If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
          If it looks like old COBOL strung together with GO TO's and it works, it's okay.
          If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially so if the research results are based on the output of the code.

          None of the above. It's scientific code. It looks like bad Fortran (or even worse, FORTRAN) from 20 years ago, which is ok, since Fortran 90 is fine for number crunching.

          In all seriousness, my experience is that "Ph.D. types" (for want of a better term) write some of the most amateurish code I've ever seen. I've worked with people whose knowledge and ability I can only envy, and who are anything but ivory tower types, but write code like it was BASIC from a kindergartener (ok, today's kindergarteners probably write better code than in my day). Silly things like magic numbers instead of properly defined constants (and used in multiple places no less!), cut-and-paste instead of creating functions, hideous control structures for even simple things. Ironically, this is despite the fact that number crunching code generally has a simple code structure and simple data structures. I think bad code is part of the culture or something. The downside is that it makes it more likely to have bugs, and very difficult to modify.

          Realistically, this is because they're judged on their results and not their code. To many people here, the code is the end product, but to others it's a means to an end. Better scrutiny of it though would lead to more reliable results. It should be mandatory to release the entire program within, say, 1 year of publication. As for it being obfuscated, intentionally or otherwise, I don't think there's much you can do about that.
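As a rough illustration of the magic-number and cut-and-paste complaint above, here is a minimal Python sketch. The ideal-gas calculation, the function names, and the constant names are all hypothetical choices made for this example; none of it comes from code discussed in the thread.

```python
# Hypothetical example: named constants and a small helper function
# instead of literals such as 8.314 and 273.15 pasted into every formula.

GAS_CONSTANT_J_PER_MOL_K = 8.314462618  # molar gas constant R
CELSIUS_TO_KELVIN_OFFSET = 273.15       # 0 degrees Celsius in kelvin


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + CELSIUS_TO_KELVIN_OFFSET


def ideal_gas_pressure_pa(moles, temp_c, volume_m3):
    """Pressure in pascals from the ideal gas law, P = nRT / V."""
    return moles * GAS_CONSTANT_J_PER_MOL_K * celsius_to_kelvin(temp_c) / volume_m3


# Every caller goes through the same definition, so a wrong constant
# only has to be fixed in one place.
pressure_room = ideal_gas_pressure_pa(moles=1.0, temp_c=25.0, volume_m3=0.0248)
pressure_cold = ideal_gas_pressure_pa(moles=1.0, temp_c=-40.0, volume_m3=0.0248)
print(pressure_room, pressure_cold)
```

The design point is only that each number has one name and each formula has one home, which is what makes a later review or modification tractable.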

    • Re:Wrong objective. (Score:4, Informative)

      by Anonymous Coward on Wednesday September 25, 2013 @12:54AM (#44944775)

      The problem is most papers do not publish the code, only the results. This causes dozens of problems: if you want to run their code on a different instance you can't, if you want to run it on different hardware you can't, if you want to compare it with yours you only sort of can since you have to either reimplement their code or run yours on a different environment than theirs, which makes comparisons difficult. Oh, and it makes verifying the results even worse, but it isn't like many people try to verify anything.

      On the one hand catching bugs can help find a conclusion was wrong sooner than it would happen otherwise. On the other hand it may make it less likely that authors will put their code out there. Anyhow, I think it's a good idea and worth a shot. Who knows, maybe it'll end up helping a lot.

      • by icebike (68054) on Wednesday September 25, 2013 @02:04AM (#44945023)

        Well, running the ORIGINAL author's code isn't that important.

        What's important is the analysis that the code was supposed to do.

        Describing that in mathematical terms and letting anyone who wants to replicate the research work from that description is better than handing the original code forward. That's just passing another potential source of error forward.

        Most of the (few) research projects I've been called to help with coding on are strictly package runners. Only one had anything approaching custom software, and it was a mess.

        • by ralphbecket (225429) on Wednesday September 25, 2013 @02:34AM (#44945131)

          I have to disagree. Before I go to a heap of effort reproducing your experiment, I want to check that the analysis you ran was the one you described in your paper. After I've convinced myself that you haven't made a mistake here, I may then go and try your experiment on new data, hopefully thereby confirming or invalidating your claims. Indeed, by giving me access to your code you can't then claim that I have misunderstood you if I do obtain an invalidating result.

          • Re: Wrong objective. (Score:5, Interesting)

            by old man moss (863461) on Wednesday September 25, 2013 @06:30AM (#44946061) Homepage
            Yes, totally agree, as someone who has tried to reproduce other people's results (in the field of image processing) with mixed success. It can be incredibly time consuming trying to compare techniques which appear to be described accurately in journals, but omit "minor" details of implementation which actually turn out to be critical. I have also had results of my own which seemed odd and were ultimately due to coding errors which inadvertently improved the result. Given the opportunity, I would have published all my academic code.
            • by Shavano (2541114)

              You had the opportunity. You could have put your code and notes on how to use it in the appendix to your papers.

          • I agree. Code is math, and thus part of the experiment and analysis, and is not just an interpretation. "Duplicate it yourself" stands against the very idea of review and reproduction.

            While there is tremendous utility in an independent reconstruction of an algorithm (I have numerous times built a separate chunk of code to calculate something in a completely different way, to test against the real algorithm/code, in practice they debug each other) the actual code needs to be there for review.

            They may have a des
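The cross-checking idea in the comment above (a second, independently written computation tested against the real one) might look roughly like this in Python. Sample variance is used purely as a stand-in calculation, and the function names, tolerance, and random test data are assumptions for illustration, not anything the commenter described.

```python
import math
import random


def variance_two_pass(values):
    """Sample variance: compute the mean first, then sum squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variance_welford(values):
    """Sample variance via Welford's single-pass update, written independently."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return m2 / (count - 1)


def cross_check(values, rel_tol=1e-9):
    """Return True when the two implementations agree within rel_tol."""
    return math.isclose(
        variance_two_pass(values), variance_welford(values), rel_tol=rel_tol
    )


rng = random.Random(0)  # fixed seed so the check is repeatable
samples = [rng.gauss(10.0, 2.0) for _ in range(1000)]
agree = cross_check(samples)
print("implementations agree:", agree)
```

Agreement does not prove either version correct, but a disagreement points at a bug in one of them, which is the "they debug each other" effect.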

    • by dcollins (135727) on Wednesday September 25, 2013 @01:44AM (#44944947) Homepage

      Yeah, it seems like the real objective should be to get more code read and verified as part of the scientific process. (Just "getting more code out there" and expecting it to go unread would be pretty empty.)

      One problem is that the publish-or-perish process has gotten sufficiently corrupt that many results are irreproducible, PhD students are warned against trying to reproduce results, and everyone involved has lost the expectation that their work will be experimentally double-checked.

      • Re:Wrong objective. (Score:4, Interesting)

        by Anonymous Coward on Wednesday September 25, 2013 @04:42AM (#44945553)

        As a PhD student I am actively encouraged to reproduce results, mostly this has been possible but I know of at least one paper which has been withdrawn because my supervisor queried their results after we failed to reproduce them (I'll be charitable and say it was an honest mistake on their part).

        I guess whether you are encouraged to check others' work depends on your university and subject, but in certain areas it does happen.

      • I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.
        • by biodata (1981610)
          this^1000
        • In all fairness that's an easy mistake to make, because ^ means exponentiation in other languages. It's an historical stupidity, like the fact that log() is the natural log, not log10().
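Both pitfalls mentioned above are easy to demonstrate. The sketch below uses Python rather than C as an assumption of convenience: Python happens to share both conventions, with `^` as bitwise XOR and `math.log` as the natural logarithm. In C the exponentiation would be written with `pow()` from `<math.h>`.

```python
import math

base, exponent = 2, 3

xor_result = base ^ exponent     # bitwise XOR: 0b10 ^ 0b11 == 0b01
power_result = base ** exponent  # exponentiation: 2 cubed

natural_log = math.log(100)      # natural logarithm (base e)
common_log = math.log10(100)     # base-10 logarithm

print(xor_result, power_result)
print(natural_log, common_log)
```

Neither mistake raises an error. The program runs and produces a number, which is why this kind of bug tends to survive until someone else reads the code.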

        • by Shavano (2541114)

          Frequently. It's not supposed to be their main area of expertise and they often learn just enough to solve their immediate problem. And why should they learn more? So occasionally they make blunders like that, but a professional computer programmer wouldn't know what problem to code or what analysis needs to be done in the first place. That's what the scientists are good at.

        • by tlhIngan (30335)

          I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.

          Scientists and researchers generally write lousy code. If you think TheDailyWTF is bad, you haven't seen researcher code.

          Generally write-only, lots of copy-pasta going on, variables that *might* make sense (and probably declared globally) and if you're

      • by Shavano (2541114)

        Ph.D. dissertations require original research. However, assigned classwork for Doctor's and Master's students would be improved if it involved replication and re-analysis of recent research in the field to study methods of data collection and analysis. This would make replication and reexamination of recent research a routine part of academia. The benefits for the students would be seeing how other researchers do their work and practice at methods of analysis and occasionally the satisfaction of showing

    • Not to mention that the idea of not publishing code is at stark odds with the goal of scientific publication, which is reproducibility: as things depend more and more on the processing SW, papers and datasets aren't enough, you need the code that was used to generate the results, otherwise it's irreproducible.
    • by Shavano (2541114)

      I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.

      Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?

      Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.

      But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.

      That's one of two worthy objectives. The other is to make the code more suitable for use by other researchers.

  • Hell Yes! (Score:5, Insightful)

    by Garridan (597129) on Wednesday September 25, 2013 @12:28AM (#44944673)
    Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it. Maybe mathematicians are weird like that -- I face stigma for using a computer, so anything I can do to make it look more trustworthy is awesome.
    • Re:Hell Yes! (Score:5, Insightful)

      by JanneM (7445) on Wednesday September 25, 2013 @12:47AM (#44944733) Homepage

      Problem is, at least in this trial they're reviewing already published code, when it's too late to gain much benefit from the review on the part of the original writer. A research project is normally time-limited after all; by the time the paper and data is public, the project is often done and people have moved on.

      There's nobody with the time or inclination to, for instance, create and release a new improved version of the code at that point. And unless there's errors which lead to truly significant changes in the analysis, nobody would be willing to publish any kind of amended analysis either.

      • by Anonymous Coward

        There is a reason that models have to be validated. If you choose validation cases well, a code that passes them will almost certainly be a good model. Beyond that, you do the best you really can, and that's that.

        Otherwise, here, I've got 40k lines of code here, anyone want to check it over for me? This is free of charge, right?
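A minimal sketch of what validation cases can look like, assuming a numerical integrator as the code under test. The trapezoid routine, the three cases, and the tolerance are hypothetical choices for illustration; each case is an integral with a known closed-form answer.

```python
import math


def trapezoid(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Each case: (function, lower bound, upper bound, known analytic result).
VALIDATION_CASES = [
    (math.sin, 0.0, math.pi, 2.0),
    (lambda x: x * x, 0.0, 3.0, 9.0),
    (math.exp, 0.0, 1.0, math.e - 1.0),
]


def run_validation(steps=2000, tol=1e-5):
    """Return a list of (case index, got, expected) for every failing case."""
    failures = []
    for index, (f, a, b, expected) in enumerate(VALIDATION_CASES):
        got = trapezoid(f, a, b, steps)
        if abs(got - expected) > tol:
            failures.append((index, got, expected))
    return failures


failures = run_validation()
print("validation failures:", failures)
```

Passing cases like these is cheap evidence rather than proof, and how much they tell you depends on how well the cases cover the regime the model is actually used in.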

    • by PsyberS (1356021)

      Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it.

      Believe it or not, some computer science programming language conferences are doing *just that*.

      http://cs.brown.edu/~sk/Memos/Conference-Artifact-Evaluation/ [brown.edu]
      http://ecoop13-aec.cs.brown.edu/ [brown.edu]
      http://splashcon.org/2013/cfp/665 [splashcon.org]

  • When did Mozilla get a Science Lab? Here I always thought that all the Mozilla Foundation did was make a decent browser, and now I find they have a science lab. What other things does Mozilla do?
    • by Anonymous Coward

      A tiddlywinks ballroom, two vending machines and a build-a-squirrel online project. Apparently they have made some attempt at an internet browser too.

    • What else do they do, you ask? They support Seamonkey, Firefox's older brother. Firefox began as a stripped down, lightweight, minimalist version of Seamonkey. Though Firefox is no longer lightweight, Seamonkey is still more capable in some respects. The suite includes an email client and WYSIWYG editor, but I just like the browser.

      While Firefox is controlled by the Mozilla Foundation, Seamonkey is community driven now, with hosting and other support from the foundation.

    • by sg_oneill (159032)

      Mozilla is a bit like Apache: it's a broad tent of vaguely related projects, not just Firefox.

    • by jopsen (885607)
      Mozilla also does Webmaker, education and let's not forget Firefox OS...
  • by Anonymous Coward

    The overall structure of most of the code in HEP [1] is nasty. It's too late for the likes of ROOT [2]: input of software engineers at the early stages of code design could be very useful.

    1. https://en.wikipedia.org/wiki/Particle_physics
    2. https://en.wikipedia.org/wiki/Root.cern

  • Mozilla barely has control of their own code base. The number of open bugs keeps increasing. Attempts to multi-thread the browser failed. The frantic release schedule results in things like the broken Firefox 23, where panels in add-ons just disappeared off screen. They have legacy code back to Netscape 1, and it's crushing them. Firefox market share is declining steadily. Not good.

  • See subject line. I don't know what the hell qualifies Mozilla to review scientific code. For one thing, scientific code in academic papers is proof-of-concept - it's designed to show how to implement something according to the description in the paper, not engineered for general deployment.

    The bla bla need more people counterargument is bollocks, however - there are enough people in computational biology doing utterly pointless things.

    Perhaps Mozilla's looking for another way to justify its on-going tax av

  • by MrEricSir (398214) on Wednesday September 25, 2013 @12:40AM (#44944713) Homepage

    As we've seen recently, bad decisions can be made from errors in spreadsheets. We need these published so they can be double-checked as well.

    • by Anonymous Coward

      They should be publishing their code because the basic precept behind peer reviewed publishing is that results could be reproduced. Most of the time they are not but computational scientists need to be constantly reminded that they are performing experiments, not publishing the code is exactly the same as a synthetic chemist not including an experimental section (the procedure for the synthesis).

    • bad decisions can be made from errors in spreadsheets.

      Oh, If only you knew...

      We need these published so they can be double-checked as well.

      Well, I wouldn't go so far as publishing my findings, but now I always double-check spread sheets when I'm not sure if it is or isn't a ladyboy.

    • As we've seen recently, bad decisions can be made from errors in spreadsheets.

      For that problem, let's just get rid of spreadsheets (at least as they're implemented in most programs). Copy-and-paste is the standard way to do the same computation in several places. How much further could you get from good practice? Reviewing the "code" requires peering at every cell. Etc., etc., etc. Lastly, the people who use them are often idiots who have no idea what they're doing. At least if you made them use a programming language, they'd never get it to run. That way they couldn't pretend that t

  • Mozilla better work on de-bloating its own code first.

  • If you want to code, then you got to get used to code reviews. It is the only way to improve quality and a scientist that doesn't want to improve quality should not be a scientist.
    • Correction: a scientist that doesn't want to improve source quality shouldn't be a codemonkey...
      • by John Allsup (987)
        Nor should such a scientist rely on the results of computer code in his research.  What you rely on in proper research, you should be an expert in.  Scientists who use code should be codemonkeys, but not all scientists should use code -- pen, paper and a well drilled mind are far more powerful, properly mastered and harnessed.
      • by BitZtream (692029)

        Correction: a scientist that doesn't want to improve source quality isn't a scientist.

        Some can argue that they don't have time or budget to do so, but flat out not wanting to is a failure of the process itself. It's not someone you want to trust to make predictions on data.

    • You obviously haven't worked with people who are world leaders in their field. They are not going to take advice on code from some commercial web dev.

      Though back in the day I did make one guy's code a bit more user friendly (his original comment was "I don't need any prompts to remind me what I need to type"), as we had scaled to 1:1 models and one single run of the rig could cost £20k in materials.
  • The Horror (Score:3, Interesting)

    by Vegemite (609048) on Wednesday September 25, 2013 @01:26AM (#44944911) Homepage
    You must be joking. Many scientific papers out there have results based on prototype or proof of concept software written by naive grad students for their advisors. These are largely uncommented hacks with few, if any, sanity checks. To sell these prototypes commercially, I have had to clean up after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.
    • I have had to clean up after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.

      Of course it's a lot easier and quicker to re-write someone's code when you already know what you're aiming at.

  • by dargaud (518470) <slashdot2@gd a r g a ud.net> on Wednesday September 25, 2013 @01:36AM (#44944931) Homepage
    ...it wouldn't be called research, now would it? Seriously, many scientific projects start with a vague idea and no funds. You do a table experiment, connect it to a 15 year old computer, then grow from there. In some projects I got no more than a quarter page of specifications for what ended up as 30 thousand lines of code. Yes I write scientific code, and no it's not always pretty and refactored and all that. Also there's never any money.
  • I've been a faithful Mozilla user for years. However, on MAC, due to the slowness of the browser and the high RAM consumption, I permanently switched to Chrome. So maybe they should run an experiment on how to keep their MAC users, because until now they've been great at that. When I went to buy VPN from http://vpnarea.com [vpnarea.com] I was surprised to find out that they had an extension for Chrome but not for Mozilla.
    • MAC [wikipedia.org] (all-caps) - Media Access Control address, a hexadecimal address used to identify individual pieces of hardware on a network
      Mac [wikipedia.org] - marketing name for the longstanding "Macintosh" line of computers by Apple

      I've used Firefox since it first came out, but it's so damned bloated with unneeded 'extras' that I only stick with it because it's the one browser that allows extensions like AdBlock Plus to block outgoing server requests, not just hide the results. I had defected over to Opera for several months, but when they d

      • by _merlin (160982)

        I've used Firefox since it first came out, but it's so damned bloated with unneeded 'extras' that I only stick with it because it's the one browser that allows extensions like AdBlock Plus to block outgoing server requests, not just hide the results.

        FWIW Safari allows extensions to block the requests before they're made as well, although the exact mechanism may be different.

    • by smash (1351)
      Good enough Safari had me ditch both Mozilla AND Chrome. I've had no real issue with Safari since 4.0... certainly nothing big enough to justify installing another browser to secure and maintain.
  • by Anonymous Coward

    Most of my colleagues at the university are terrible coders and I am often not even sure how much I trust their results. Even if it does scare people, there has to be more awareness about code review in the scientific field than there is today.

  • Having seen some code written by an esteemed Bio-Chemist, I agree that experienced programmers should be reviewing their code, but then, you'd expect a true scientist to have an expert review his stuff anyway.

    My experience was a real eye opener. Between the buffer overruns, and logic holes, I am amazed the crap ran at all. The fact that it compiled was a bit of a mystery until I realized that it was possible to ignore compile errors.
    • Re: (Score:2, Insightful)

      by Anonymous Coward

      This is a logical fallacy that many 'smart' people fall into. I am smart (in this case usually PhD's or people on their way to it) so this XYZ thing should be no sweat. They seem to forget that they spent 10-15 years becoming very good at whatever they do. Becoming a master of it. Yet somehow they also believe they can use this mastery on other things. In some very narrow cases you can do this. But many times you can not. Or even worse assuming no one else can understand what you are doing or they wi

  • Egoless programming (Score:2, Interesting)

    by Anonymous Coward

    Back in the late 70s middle ages of comp sci...
    There was this thing called "egoless programming" being taught. The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense by it.

    Yeah, it's a child of the 60s kind of thing, but it does work.

    This is a huge challenge in the biomedical research field, because to be successful, you need personal

    • by John Allsup (987)
      That modern research rewards egoism is one of the most dangerous, worrying and disillusioning features of modern research.  The best thinkers are sure to be suffocated in the face of masses of intellectual university graduates chasing research money and the dream of being regarded as one of those 'best thinkers'.
    • The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense by it.

      Wusses and namby-pambies. I take the opposite approach. Three or more bugs found in your code results in summary execution, with your corpse hung from the flagpole as a reminder to others.

  • and to improve how it looks, and lose the shame that we instinctively feel in the face of criticism.  No-one codes perfectly, so there is always room for useful criticism and progress, and we need to get that awareness of coding issues out as well, not just code alone.
  • Faith is where it's at! Looking at "science" journals is like looking at internet pron- it's a one way ticket to H-E-double hockeysticks! You need some proper churchin'!

  • by Anonymous Coward on Wednesday September 25, 2013 @07:09AM (#44946213)

    People doing scientific research and software developers are really doing very different things when they write code. For software developers or software engineers, the code is the end goal. They are building a product that they are going to give to others. It should be intuitive to use, robust, produce clear error messages, and be free of bugs and crashes. The code is the product. For someone doing scientific or engineering research, the end goal is testing an idea, or running an experiment. The code is a means to an end, not the end itself; it needs only to support the researcher, it only needs to run once, and it only needs to be bug free in the cases that are being explored. The product is a graph or chart or sentence describing the results that is put into a paper that gets published; the code itself is just a tool.

    When I got my Ph.D. in the 1990s, I didn't understand this, and it brought me a lot of grief when I went to a research lab and interacted with software developers and managers, who didn't understand this either. The grief comes about because of the different approaches used during the development of each type of code. Software developers describe their process variously as a waterfall model, agile development model, etc. These processes describe a roadmap, with milestones, and a set of activities that visualize the project at its end, and lead towards robust software development. The process a researcher uses is related to the scientific method: based on the question, they formulate a hypothesis, create an experiment, test it, observe the results, and then ask more questions. They do not always know how things will turn out, and they build their path as they go along. Very often, the equivalent "roadmap" in a researcher's mind is incomplete and is developed during the process, because this is part of what is being explored.

    In my organization, this creates tremendous conflict between software developers, who want a careful, process driven model to produce robust code, and researchers, who are seeking to answer more basic questions and explore unknown territory in a way that has a great deal of uncertainty and cannot always easily deliver the specific milestones and schedule clarity that are often desired.

    It is worse when the research results in a useful algorithm; of course, the researcher often wants to make it available to the world so that others can use it. This is more of a grey area; if the researcher knows how to do software engineering, they may go through the process to create a more robust product, but this takes effort and time. The fact that Mozilla wants to help debug scientific code is a very good thing; it often needs more serious debugging and re-architecting than other software that is openly available.

    I wish more people understood this difference.

  • by fygment (444210) on Wednesday September 25, 2013 @09:35AM (#44947333)

    Roger Peng's comment shows a typical, superficial understanding of programming. Ironically, he would be the first to condemn a computer scientist/coder who ventured into biostatistics with a superficial knowledge of biology. I believe he would feel that anyone can program, but not anyone can do biostatistics. And I deeply disagree. Tools have been provided so that _any_ scientist can code. That does not mean that they understand coding or computer science.

    I have personally experienced that especially in the softer sciences like biology, economy, meteorology, etc., the scientists have absolutely no desire to learn any computer science: coding methodology, testing, complexity, algorithms, etc. The result is kludgy, inefficient code heavily dependent on pre-packaged modules, that produces results that are often a guess; the code produces results but with a lack of any understanding of what the various packaged routines are doing or whether they are appropriate for the task. For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong. It is the same as someone approaching engineering without some understanding of thermodynamics and as a result wasting their time trying to construct a perpetual motion machine.

    • by umafuckit (2980809) on Wednesday September 25, 2013 @10:53AM (#44948307)

      For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong.

      I'm a biologist who learned enough computational stats to get by and I do see what you mean. Initially I did do stuff like that, but over time I put in the effort to learn what's going on and now I hope I make these sorts of dumb mistakes a lot less often! However, this is not so much a coding problem as a stats problem. People in the "soft sciences" don't just have problems with more advanced stuff such as PCA, ICA, clustering, etc., but even with simple stats. For example, it's very common to see ANOVA performed on data that would be much better suited to regression analysis. The concept of fitting a line or curve and extracting meaning from the coefficients is rather foreign to a lot of biologists, who are more comfortable with a table full of p-values. Indeed, there is a general fixation on p-values, despite the fact that these are not well understood. There is a tendency to hide raw data (since biological data are often noisy). There is also a tendency to use analyses such as PCA or hierarchical clustering simply to produce fancy plots to blind reviewers; these plots often add no insight (or the insight they might add is not explored).
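
      As a rough illustration of "fitting a line and extracting meaning from the coefficients", here is a minimal least-squares sketch. The dose/response numbers are invented and `fit_line` is a hypothetical helper, not a function from any stats package.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented dose-response data (assumption for illustration only).
dose = [0, 1, 2, 3, 4]
response = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept, r_squared = fit_line(dose, response)
print(f"response changes by {slope:.2f} per unit dose "
      f"(baseline {intercept:.2f}, R^2 = {r_squared:.3f})")
```

      Treating the five doses as unrelated ANOVA groups would only say whether the groups differ; the slope says by how much the response changes per unit dose, which is usually the biological question.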

    • by jhumkey (711391)
      I would add . . . it's not just pure "research" that has a superficial understanding of programming.

      I've seen personally (and "Dilbert" would seem to confirm as universal) the generalized business belief that . . .

      "programming is easy."
      "quality is easy."
      "expand-ability is easy."
      "maintainability is easy."
      "If I just had a Project Management tool to keep a death grip on delivery time . . . all those other "easy" things will just naturally fall into place."

      I keep thinking the opposite . . .

      Quality
  • by Anonymous Coward

    For the brother-in-law, MD/PhD at local school - he sits on several review boards.

    The biggie is not the code, but the data set. Like to design data sets to test code rather than do code reviews.

    Have also done some code reviews when the b-in-law was not certain. And have found 'bogus' code twice.

    Another (anecdotal) point - all problems found were with life science students. NONE/ZERO/NADA problems with code done by physical sciences or engineering people. Unless you want to count some of the most ugly Python
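
    The idea mentioned in this comment, designing data sets to test code, can be sketched as follows. Everything here is an assumption for illustration: the function under test, the case names, and the data are hypothetical, chosen so that each answer can be worked out by hand.

```python
def sample_variance(values):
    """Two-pass sample variance (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def population_variance(values):
    """Divides by n; stands in for a plausible mistake in analysis code."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Each designed data set targets one failure mode and has a hand-computed answer.
DESIGNED_CASES = [
    ("constant data", [5.0, 5.0, 5.0, 5.0], 0.0),
    ("small sample", [1.0, 2.0, 3.0], 1.0),
    ("textbook example", [2, 4, 4, 4, 5, 5, 7, 9], 32 / 7),
    ("large offset", [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16], 30.0),
]


def run_designed_cases(func):
    """Return the names of the cases where func misses the known answer."""
    failures = []
    for name, data, expected in DESIGNED_CASES:
        if abs(func(data) - expected) > 1e-9:
            failures.append(name)
    return failures


print("sample_variance failures:    ", run_designed_cases(sample_variance))
print("population_variance failures:", run_designed_cases(population_variance))
```

    The constant data set passes for both functions, which shows why one "looks fine" case is not enough: only the cases with a non-trivial known answer expose the wrong divisor.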

  • On a related note, the Babel project is getting pushed for Reproducible Research http://orgmode.org/worg/org-contrib/babel/intro.html [orgmode.org]
    It allows code to be embedded in other documents, e.g. the LaTeX source of a paper, and executed during rendering.

    Also the Recomputation project is trying to archive scientific code, complete with virtual machines set up to run them http://www.recomputation.org/ [recomputation.org]

  • Researchers are good at researching. They can write some code though.
    Programmers are good at programming. They know how to write good code that is easy to maintain and adapt.

    If you're a researcher with some experience in writing code, you should ask yourself, "Should I spend that much time writing code when a programmer would do a better job in less time, with fewer bugs, code review and unit tests?" Also, how much do you know about design patterns? Sure, your code works without them. Good luck

    • If you're a researcher with some experience in writing code, you should ask yourself, "Should I spend that much time writing code when a programmer would do a better job in less time, with fewer bugs, code review and unit tests?" Also, how much do you know about design patterns? Sure, your code works without them. Good luck with it. Also good luck with the headache in one year.

      It usually doesn't work like that. The researcher does the experiments then analyses and interprets the data. If the latter process requires coding then the researcher does the coding. If a researcher gives up the coding to a programmer (who may have a bad understanding of the science) then they have lost ownership of their data. Besides, there's usually no money to pay a programmer. The only situation where a programmer is called for is in a big lab which needs one or more significant software projects cr

  • Nobody gives a rat's rear what some person's code looks like. Code styles are like posterior sphincter muscles: everybody has one. But how about code and conclusions that are just plain wrong? [nymag.com] If that grad student hadn't checked, just how much more damage would have been done? Try "it wouldn't stop." I'm beginning to wonder if this couldn't be done using some kind of "blind" study?
