
Mozilla Plan Seeks To Debug Scientific Code

Posted by Soulskill
from the unit-tests-are-for-undergrads dept.
ananyo writes "An offshoot of Mozilla is aiming to discover whether a review process could improve the quality of researcher-built software that is used in myriad fields today, ranging from ecology and biology to social science. In an experiment being run by the Mozilla Science Lab, software engineers have reviewed selected pieces of code from published papers in computational biology. The reviewers looked at snippets of code up to 200 lines long that were included in the papers and written in widely used programming languages, such as R, Python and Perl. The Mozilla engineers have discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups — including whether to permit disclosure of the results. But some researchers say that having software reviewers looking over their shoulder might backfire. 'One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code,' says biostatistician Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. 'We need to get more code out there, not improve how it looks.'"

  • Wrong objective. (Score:5, Insightful)

    by smart_ass (322852) on Wednesday September 25, 2013 @12:24AM (#44944655)

    I don't know the actual objective ... but if the concern is "We need to get more code out there, not improve how it looks," then the objective is bad.

    Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?

    Agreed ... if this is about indentation and commenting style ... then yeah ... bad idea.

    But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.

  • Hell Yes! (Score:5, Insightful)

    by Garridan (597129) on Wednesday September 25, 2013 @12:28AM (#44944673)
    Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it. Maybe mathematicians are weird like that -- I face stigma for using a computer, so anything I can do to make it look more trustworthy is awesome.
  • Re:Hell Yes! (Score:5, Insightful)

    by JanneM (7445) on Wednesday September 25, 2013 @12:47AM (#44944733) Homepage

    Problem is, at least in this trial they're reviewing already published code, when it's too late for the original writer to gain much benefit from the review. A research project is normally time-limited, after all; by the time the paper and data are public, the project is often done and people have moved on.

    There's nobody with the time or inclination to, for instance, create and release a new improved version of the code at that point. And unless there are errors that lead to truly significant changes in the analysis, nobody would be willing to publish any kind of amended analysis either.

  • by dcollins (135727) on Wednesday September 25, 2013 @01:44AM (#44944947) Homepage

    Yeah, it seems like the real objective should be to get more code read and verified as part of the scientific process. (Just "getting more code out there" and expecting it to go unread would be pretty empty.)

    One problem is that the publish-or-perish process has gotten sufficiently corrupt that many results are irreproducible, PhD students are warned against trying to reproduce results, and everyone involved has lost the expectation that their work will be experimentally double-checked.

  • by icebike (68054) on Wednesday September 25, 2013 @02:04AM (#44945023)

    Well, running the ORIGINAL author's code isn't that important.

    What's important is the analysis that the code was supposed to do.

    Describing that analysis in mathematical terms and letting anyone trying to replicate the research implement it themselves is better than handing the original code forward. That's just passing another potential source of error forward.

    Most of the (few) research projects I've been called on to help with coding were strictly package runners. Only one had anything approaching custom software, and it was a mess.

  • by ralphbecket (225429) on Wednesday September 25, 2013 @02:34AM (#44945131)

    I have to disagree. Before I go to a heap of effort reproducing your experiment, I want to check that the analysis you ran was the one you described in your paper. After I've convinced myself that you haven't made a mistake here, I may then go and try your experiment on new data, hopefully thereby confirming or invalidating your claims. Indeed, by giving me access to your code you can't then claim that I have misunderstood you if I do obtain an invalidating result.

  • by mwvdlee (775178) on Wednesday September 25, 2013 @03:18AM (#44945291) Homepage

    I think that's exactly the opposite of the point the GP was trying to make.

    If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
    If it looks like old COBOL strung together with GO TOs and it works, it's okay.
    If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially if the research results are based on the output of the code.

  • by Anonymous Coward on Wednesday September 25, 2013 @09:23AM (#44947179)

    This is a logical fallacy that many 'smart' people fall into: I am smart (in this case usually PhDs or people on their way to one), so this XYZ thing should be no sweat. They seem to forget that they spent 10-15 years becoming very good at whatever they do, becoming masters of it. Yet somehow they also believe they can apply this mastery to other things. In some very narrow cases you can do this, but many times you cannot. Or, even worse, they assume no one else can understand what they are doing, or that others will 'get it wrong'.

    The right thing to do is to find another master in that other field. Even that is dangerous. You will also see many people who then follow in the footsteps of these 'know-it-all' masters, yelling the word 'science' at anyone who disagrees. Their disagreement is not because they think you are wrong (maybe you are), but because they do not understand.

    In this case, writing code is *easy*; writing good code takes work. Even those who are masters at it make mistakes. We call them bugs. Even when you are good at it, you still work at making it correct, if only because you have 'been there'. There are whole books out there on anti-patterns, patterns, development style, code philosophy, etc. From my POV it usually takes someone about 2 years to become somewhat 'ok' at programming. Somewhere around the 5-10 year mark they become masters, and that is only if they do it every day.

  • by ebno-10db (1459097) on Wednesday September 25, 2013 @09:46AM (#44947469)

    If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
    If it looks like old COBOL strung together with GO TOs and it works, it's okay.
    If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially if the research results are based on the output of the code.

    None of the above. It's scientific code. It looks like bad Fortran (or even worse, FORTRAN) from 20 years ago, which is ok, since Fortran 90 is fine for number crunching.

    In all seriousness, my experience is that "Ph.D. types" (for want of a better term) write some of the most amateurish code I've ever seen. I've worked with people whose knowledge and ability I can only envy, and who are anything but ivory tower types, but who write code like it was BASIC from a kindergartener (ok, today's kindergarteners probably write better code than in my day). Silly things like magic numbers instead of properly defined constants (and used in multiple places, no less!), cut-and-paste instead of creating functions, hideous control structures for even simple things. Ironically, this is despite the fact that number-crunching code generally has a simple code structure and simple data structures. I think bad code is part of the culture or something. The downside is that it makes bugs more likely and the code very difficult to modify.

    Realistically, this is because they're judged on their results and not their code. To many people here, the code is the end product, but to others it's a means to an end. Better scrutiny of it, though, would lead to more reliable results. It should be mandatory to release the entire program within, say, 1 year of publication. As for it being obfuscated, intentionally or otherwise, I don't think there's much you can do about that.
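The magic-number and cut-and-paste complaint above can be made concrete with a small, hypothetical Python sketch. Everything in it is invented for illustration: the assumption that readings are recorded in micrograms and analysed in milligrams, the constant `MG_PER_UG`, the functions `summarize_before`, `mean_in_mg` and `summarize_after`, and the sample readings. It is not taken from any of the reviewed papers; it only shows how a pasted line with a mistyped literal can silently skew a result, and how a single named constant plus a shared function removes the place where that drift can happen.

```python
# Hypothetical illustration: readings recorded in micrograms, analysed in milligrams.

# --- Before: magic number repeated, logic pasted twice ---
def summarize_before(control, treated):
    control_mean = sum(x * 0.001 for x in control) / len(control)
    # Pasted copy of the line above; the factor was mistyped as 0.01,
    # which silently inflates the treated group by a factor of 10.
    treated_mean = sum(x * 0.01 for x in treated) / len(treated)
    return treated_mean - control_mean


# --- After: one named constant, one shared function ---
MG_PER_UG = 0.001  # assumed conversion factor: 1 microgram = 0.001 milligrams


def mean_in_mg(values_ug):
    """Mean of a list of microgram readings, expressed in milligrams."""
    if not values_ug:
        raise ValueError("need at least one reading")
    return sum(v * MG_PER_UG for v in values_ug) / len(values_ug)


def summarize_after(control, treated):
    """Difference in group means (treated minus control), in milligrams."""
    return mean_in_mg(treated) - mean_in_mg(control)


if __name__ == "__main__":
    readings = [1000, 2000]  # identical made-up data for both groups
    print(summarize_before(readings, readings))  # nonzero: the pasted bug
    print(summarize_after(readings, readings))   # 0.0: the groups really are equal
```

Run as a script, the "before" version reports a difference between two identical groups, while the "after" version reports 0.0. In the first version a reviewer has to compare two nearly identical lines to spot the slip; in the second, the conversion factor exists in exactly one place.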
