Programming Science

Call For Scientific Research Code To Be Released (505 comments)

Posted by Soulskill
from the but-then-people-will-see-how-awful-it-is dept.
Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"
  • Stuff like Sweave (Score:4, Interesting)

    by langelgjm (860756) on Tuesday February 09, 2010 @09:49AM (#31071944) Journal

    Much quantitative academic and scientific work could benefit from the use of tools like Sweave, [wikipedia.org] which allows you to embed the code used to produce statistical analyses within your LaTeX document. This makes your research easier to reproduce, both for yourself (when you've forgotten what you've done six months from now) and others.

    What other kinds of tools like this are /.ers familiar with?

  • by bramp (830799) on Tuesday February 09, 2010 @09:50AM (#31071946)
    I've always been a big fan of releasing my academic work under a BSD licence. My work is funded by the taxpayers, so I think the taxpayers should be able to do what they like with my software. So I fully agree that all software should be released. It is not always enough to just publish a paper, but you should release your code so others can fully review the accuracy of your work.
  • by BoRegardless (721219) on Tuesday February 09, 2010 @09:51AM (#31071980)

    One significant figure?

  • That's all wrong (Score:3, Interesting)

    by Gadget_Guy (627405) * on Tuesday February 09, 2010 @10:06AM (#31072140)

    The scientific process is to invalidate a study if the results cannot be reproduced by anyone else. That way you can eliminate all potential problems like coding errors, invalid assumptions, faulty equipment, mistakes in procedures, and hundreds of other things that can produce dodgy results.

    It can be misleading to search through the code for mistakes when you don't know which code was eventually used in the final results (or in which order). I have accumulated quite a lot of snippets of code that I used to fix a particular need at the time. I am sure that many of these hacks were ultimately unused because I decided to go down a different path in data processing. Or the temporary tables used during processing are no longer around (or are in a changed format since the code was written). There is also the problem of some data processing being done by commercial products.

    It's just too hard. The best solution is to let science work the way it has found to be the best. Sure you will get some bad studies, but these will eventually be fixed over time. The system does work, whether vested interests like it or not.

  • I concur (Score:5, Interesting)

    by dargaud (518470) <[ten.duagradg] [ta] [2todhsals]> on Tuesday February 09, 2010 @10:16AM (#31072260) Homepage
    As a software engineer who has spent 20 years coding in research labs, I can say with certainty that the code written by many, if not most, scientists is utter garbage. As an example, a colleague of mine was approached recently to debug a piece of code: "Oh, it's going to be easy, it was written by one of our postdocs on his last day here...". 600 lines of code in the main, no functions, no comments. He's been at it for 2 months.

    I'm perfectly OK with the fact that their job is science and not coding, but would they go to the satellite assembly guys and start gluing parts at random?

  • by FlyingBishop (1293238) on Tuesday February 09, 2010 @10:19AM (#31072290)

    Back in college, I did some computer vision research. Most people provided open source code for anyone to use. However, aside from the code being of questionable quality, it was mostly written in Matlab with C handlers for optimization.

    In order to properly test all of the software out there you would need:

    1. A license for every version of Matlab.
    2. Windows
    3. Linux
    4. Octave

    I had our school's Matlab, but none of the code we found was written for that version. Some was written for Linux, some for Windows. The machine I had was a Windows box with Matlab, so we had to play with Cygwin...

    I mean, basically, you need to distribute a straight-up VM if you want your results to be reproducible. (which naturally rules out Windows or Matlab or anything else proprietary being at the core.)
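
    Short of shipping a whole VM, a lighter (and weaker) step is to record the environment a result was produced in, so that a reader at least knows which versions to hunt down. The following is only a sketch of that idea in Python, not a substitute for the VM approach. The function name `environment_snapshot` and the package names passed to it are hypothetical, chosen for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


# Hypothetical package list; the second name is deliberately bogus to show
# how a missing dependency is reported.
snapshot = environment_snapshot(["numpy", "definitely-not-a-real-package-xyz"])
print(json.dumps(snapshot, indent=2, sort_keys=True))
```

    Saving this output next to the published results does not make the code portable, but it does document what "that version" was.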

  • by DoofusOfDeath (636671) on Tuesday February 09, 2010 @10:22AM (#31072344)

    I'm working on my dissertation proposal, and I'd like to be able to re-run the benchmarks that are shown in some of the papers I'm referencing. But most of the source code for those papers has disappeared into the aether. Without their code, it's impossible for me to rerun the old benchmark programs on modern computers so that I and others can determine whether or not my research has uncovered a better way of doing things. This is very far from the idealized notion of the scientific method, and significantly calls into question many of the things that we think we know based on published research.

  • by quadelirus (694946) on Tuesday February 09, 2010 @10:22AM (#31072348)
    Unfortunately computer science is pretty closed off as well. Too few projects end up in freely available open code. It hinders advancement (because large departments and research groups can protect their field of study from competition by having a large enough body of code that nobody else can spend the 1-2 years required to catch up) and it hinders verifiability (because they make claims on papers about speed/accuracy/whatever and we basically have to stake it on their word and reputation and whether it SEEMS plausible--this also means that surprising results from lesser known researchers might be less likely to get published).

    I think it is our duty as scientists to ALWAYS release the code, even if it is uncommented and unclean. I'm very glad to be researching under an advisor who requires that we always release our code as open source after papers have been published so that other groups can build on what we've done. This should absolutely be universal.
  • by natoochtoniket (763630) on Tuesday February 09, 2010 @10:30AM (#31072452)

    That actually surprised me, too. Loss of precision is nothing new. When you use floats to do the arithmetic, you can lose precision in each operation, and particularly when you add or subtract two numbers with very different scales (exponents). The thing that surprised me was not that a calculation could lose precision. It was the assertion that any precision would remain, at all.

    Numeric code can be written using algorithms that minimize loss of precision, or that are able to quantify the amount of precision that is lost (and that remains) in the final answers. But, if you don't use those algorithms, or don't use them correctly and carefully, you really cannot assert _any_ precision in the result.

    If you know your confidence interval, you can state your result with confidence. But, if you don't bother to calculate the confidence interval, or if you don't know what a CI is, or if you are not careful, it usually ends up being plus-or-minus 100 percent of the scale.
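
    As a small illustration of the point about algorithms that limit precision loss, here is a Python sketch comparing plain left-to-right float addition with compensated summation. The compensated routine is Neumaier's variant of Kahan summation, picked here only as one well-known example; the function names and the three test values are illustrative assumptions, and `math.fsum` from the standard library is used as the reference.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point addition."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan summation)."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            compensation += (total - t) + x  # low-order bits of x were lost
        else:
            compensation += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + compensation


# Mathematically the sum is exactly 1.0, but 1.0 is too small to survive
# being added to 1e16 in double precision.
values = [1e16, 1.0, -1e16]

print("naive:      ", naive_sum(values))
print("compensated:", neumaier_sum(values))
print("math.fsum:  ", math.fsum(values))
```

    The explicit loop in `naive_sum` is deliberate: the built-in `sum()` in recent Python versions already applies compensation to floats, so it would hide the effect being shown.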

  • by Anonymous Coward on Tuesday February 09, 2010 @10:36AM (#31072544)

    What about McIntyre's faulty data?

    Ah, no FOIA there, because he's toeing the party line.

    Note: He's not the only denial ditto who refuses to release his code:

    http://www.realclimate.org/index.php/archives/2009/12/please-show-us-your-code/ [realclimate.org]

    Oh, the meeja is quiet about that, isn't it...

  • by AlXtreme (223728) on Tuesday February 09, 2010 @10:41AM (#31072594) Homepage Journal

    that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care.

    If their code results in predictions that affect millions of lives and trillions of dollars, perhaps they should learn to care.

    What I've personally seen of scientists is a frantic determination to publish papers anywhere and everywhere, no matter how well-founded the results in those papers are. The IPCC-gate is merely a symptom of a deeper problem within scientific research.

    If scientists are too busy because of publication quotas and funding issues to focus on delivering proper scientific research, maybe we should question our current means of supporting scientific research. Currently we've got quantity, but very little quality.

  • Not that simple (Score:4, Interesting)

    by khayman80 (824400) on Tuesday February 09, 2010 @10:47AM (#31072694) Homepage Journal

    I'm finishing a program that inverts GRACE data to reveal fluctuations in gravity such as those caused by melting glaciers. This program will eventually be released as open source software under the GPLv3. It's largely built on open source libraries like the GNU Scientific Library, but snippets of proprietary code from JPL found their way into the program years ago, and I'm currently trying to untangle them. The program can't be made open source until I succeed because of an NDA that I had to sign in order to work at JPL.

    It's impossible to say how long it will take to banish the proprietary code. While working on this project, my research is at a standstill. There's very little academic incentive to waste time on this idealistic goal when I could be increasing my publication count.

    Annoyingly, the data itself doesn't belong to me. Again, I had to sign an NDA to receive it. So I can't release the data. This situation is common to scientists in many different fields.

    Incidentally, Harry's README file is typical of my experiences with scientific software. Fragile, unportable, uncommented spaghetti code is common because scientists aren't professional programmers. Of course, this doesn't invalidate the results of that code because it's tested primarily through independent verification, not unit tests. Scientists describe their algorithms in peer-reviewed papers, which are then re-implemented (often from scratch) by other scientists. Open source code practices would certainly improve science, but he's wrong to imply that a single bug could have a significant impact on our understanding of the greenhouse effect.

  • Re:Stuff like Sweave (Score:2, Interesting)

    by xtracto (837672) on Tuesday February 09, 2010 @10:51AM (#31072756) Journal

    Should there be a universal language,

    It is called Z notation [wikipedia.org]. I have seen it used in several articles and at least a book on multi-agent systems.

  • by harvey the nerd (582806) on Tuesday February 09, 2010 @10:53AM (#31072780)
    Real scientists don't use simulators with incomplete equations and fudge factors to match highly manipulated historic data to "prove" their case with game machines that have no predictive capability or other external validation. That simply is not the way you build a valid fundamentals based model starting from the equations of motion. IPCC reports previously noted whole terms in the equations' energy terms that were inadequately described or represented, then have done no research to fill the terms, modellers just zeroing them out or putting in small constants for significant *variables*. These are not real scientists, their processes and practices have been clearly shown to be antithetical to valid science.

    These models are just primitive speculative tools, often reflecting personal biases in data selection and derivation, NOT fundamental equations. The models are NOT valid physics data or experiments.

    On prediction failure, Hansen's 1988 "A,B,C" forecasts of rising temperature are rapidly diverging from the cooling we are actually experiencing right now, where case C assumed we massively limited CO2 also. Missed the side of a barn with a shotgun, tsk, tsk, tsk.
  • by Anonymous Coward on Tuesday February 09, 2010 @10:55AM (#31072822)

    NIH funding standards promote commercialization of publicly funded software. This appears to have been implemented before the modern internet, and the idea may have been that a commercial product would make the code more available, and perhaps fix some of the quality issues with code cobbled together by "non-programmers". The result is that companies like Accelrys own a huge amount of software developed under public funding. Now, the public has to pay to use software that they paid to develop, and it is impossible for other scientific research to extend that publicly funded effort.

    I want to see an NIH version of SourceForge, and mandate all government funded software development to be stored there. Unlike SourceForge, there could be delayed release to the public so that researchers have time to publish their work.

  • by crmarvin42 (652893) on Tuesday February 09, 2010 @11:01AM (#31072914)

    1) Do you seriously think that the whole climate science depends on one scientist's data?

    No, but his work does include suggestions that regulators pay close attention to based on his status within the community. If he were posting on this very same topic, but was not being used as a primary source by regulators then I could see your point. However, that is not the case and theoretical situations are not really relevant.

    2) CRU was trolled by FOIA requests. They are nuisance to deal with, as far as I was told.

    Then hire someone to handle them for you, or have grad students do it.

    I could say the same thing about publishing and peer review. It's a major PITA to get formatting done just right, making sure that those outside of my small sphere of research can understand what I did without getting lost in all of the jargon. Suck it up! It is an unfortunate, but necessary part of doing research at a public institution.

    3) Scientists are people, people have emotions. That's why peer review is used.

    Not sure what this has to do with anything. Peer review is valuable and necessary, but it has never pretended to be about accuracy of the data. It's about cleaning up the presentation so that it is clear, reproducible, and free from OBVIOUS error.

    As a reviewer, I don't know what exactly was done, but if a list of numbers that should add up to 100 instead adds up to 120, then I can catch that. Whether the problem is due to a typo, or sloppy data fabrication, or a computer error is not something I can ascertain. I have to trust that the authors' explanation and fix are true and accurate, in which case I am trusting that they are honest, competent and attentive. The more of their data and methodology that they expose to scrutiny, the less faith I have to have and the more I can ascertain for myself directly.

  • Re:great! (Score:1, Interesting)

    by Anonymous Coward on Tuesday February 09, 2010 @11:11AM (#31073062)

    Yet more stupidity by people who know nothing. By the logic of the posters above, since the government paid EDS under a contract to perform some work for the government, ANY software developed by EDS should suddenly become freely available. Since Boeing was paid to develop the F-15, ANY software written to design or build the F-15 should now be freely available. I suggest you idiots try that with any company that does business with the government. I'll know when you do because the gales of laughter from the corporate offices will be heard around the world. Yet you think that software developed to perform work under contract to the government by a researcher mysteriously should be freely available. The funny part is, if the software written by researchers under contract to government wasn't freely available, how is it possible that the idiot writing for the Guardian was able to perform the analysis? Wups! I suggest he try the same analysis on the regenerative braking software that Toyota has on its Prius, or maybe the Airbus 330 fly-by-wire software (think Air France). Wups!

    Yes, I am very aware of the legal requirements and their consequences. For the third time, I've had large chunks of my code copied verbatim within commercial products after my institution was forced to release it to companies repeating the same "the research is publicly funded" line of bullshit. The companies actually had the brass balls to try to sell me my own software. Yes, it's a long, protracted process to get the companies to either remove the offending code or pay the university for it.

  • by Jaydee23 (1741316) on Tuesday February 09, 2010 @11:11AM (#31073068)
    Code should be released, but this should not be confused with replicating scientific results. If you want to replicate research, you need to write your own code according to the methods described in the research. Your answer then needs to match the original to test the code.
  • by c_sd_m (995261) on Tuesday February 09, 2010 @11:35AM (#31073486)

    What I've personally seen of scientists is a frantic determination to publish papers anywhere and everywhere, no matter how well-founded the results in those papers are. The IPCC-gate is merely a symptom of a deeper problem within scientific research.

    They're trained for years on a publish-or-perish doctrine. Either they have enough publications or they get bounced out of academia at some threshold (getting into a PhD, getting a post doc, getting a teaching position, getting tenure, ...). Under that pressure you end up with just the people who churn out lots of papers making it into positions of power. In some fields you're also expected to pull in significant research funding and there are few opportunities to do so without corporate partnerships. So if you're going to fund students to publish papers, you need to accept limits on what you can publish. The only alternative is to leave the field.

    There's no shortage of problems with the research community these days.

  • by phantomfive (622387) on Tuesday February 09, 2010 @11:36AM (#31073524) Journal

    people increasingly don't seem to understand the very idea of scientific methods or processes, or the reasoning behind empiricism and careful management of precision.

    Surprisingly, not true. In fact, it's getting better, despite what Idiocracy [xkcd.com] claims.

    An easy way to see this is to compare High School Musical to Grease. Both of them were roughly the same movie, separated by a few decades. In Grease, the smart kids were shown as dorks, and the cool kids were the ones who were most likely to drop out of school. In High School Musical, the 'brainiac' kids weren't portrayed as better or worse than the jocks, just different. So perceptions are changing.

    Mainly I don't know when this golden time period was that everyone understood formal and informal logic, epistemology, and ontology. At least now, most everyone understands [citation needed], twenty years ago most people had trouble with that (and Wikipedia spoils me: now when I read a newspaper I keep wanting to find the link to click on for the citation of their assertions.).

  • by ArcherB (796902) on Tuesday February 09, 2010 @11:57AM (#31073900) Journal

    Scientists need to realize that if they're going to get public support, they really need to be very careful with their choice of wording. Like it or not, the scare mongers, and I mean scare mongers in the sense that there are people who are trying to scare folks into believing that Global Warming is some sort of wealth redistribution scheme by the socialists, are going to use any hint, real or not, that scientists are making up their findings.

    Scare mongers? Let's take a look at some of these "hints" that scientists are making up their findings. From May 7, 2002 [nationalgeographic.com]

    Dozens of mountain lakes in Nepal and Bhutan are so swollen from melting glaciers that they could burst their seams in the next five years and devastate many Himalayan villages, warns a new report from the United Nations.

    From January 17, 2010 [timesonline.co.uk]:

    In the past few days the scientists behind the warning have admitted that it was based on a news story in the New Scientist, a popular science journal, published eight years before the IPCC's 2007 report.

    It has also emerged that the New Scientist report was itself based on a short telephone interview with Syed Hasnain, a little-known Indian scientist then based at Jawaharlal Nehru University in Delhi.

    Hasnain has since admitted that the claim was "speculation" and was not supported by any formal research.

    Do I need to pull the quotes that claim NY and Florida will be underwater?

    As for the "fear mongers" saying that GW is a socialist wealth redistribution [nytimes.com] scheme.

    Some officials from the United States, Britain and Japan say foreign-aid spending can be directed at easing the risks from climate change. The United States, for example, has promoted its three-year-old Millennium Challenge Corporation as a source of financing for projects in poor countries that will foster resilience. It has just begun to consider environmental benefits of projects, officials say.

    Industrialized countries bound by the Kyoto Protocol, the climate pact rejected by the Bush administration, project that hundreds of millions of dollars will soon flow via that treaty into a climate adaptation fund.

    Strange. When did Rush and Hannity start writing for the NY Times?

  • Re:Seems reasonable (Score:3, Interesting)

    by Pinky's Brain (1158667) on Tuesday February 09, 2010 @12:02PM (#31073970)

    Let's assume for a moment you publish your code in the reproducible research sense. This will mean you also publish all the code necessary to compute the graphs in your papers ... at that point I can at the very least determine if what you thought was significant in the initial results as explained in your papers is still there.

  • Re:Seems reasonable (Score:3, Interesting)

    by professionalfurryele (877225) on Tuesday February 09, 2010 @12:05PM (#31074022)

    Having worked in academia, I can attest to the very poor code quality, at least in my area. The reason is very simple: the modern research scientist is often a jack-of-all-trades. A combination IT professional, programmer, teacher, manager, recruiter, bureaucrat, hardware technician, engineer, statistician, author and mathematician, as well as being an expert in the science of their own field. Any one of these disciplines would take years of experience to develop professional skills at. Most scientists simply don't have time to do that, so they wing it. I think publishing code would be a good idea as scrutiny would help quality, but a big chunk of this code is never going to be of professional quality because it isn't written by professional programmers.

  • Re:Conspiracy? (Score:3, Interesting)

    by pavon (30274) on Tuesday February 09, 2010 @12:14PM (#31074162)

    Yes, there are stubborn idiots that will believe what they want regardless of the evidence. There are self-entitled people that complain no matter how good of a service you provide. There are unreasonable assholes in this world.

    However, since nothing I do will appease them, why should I give a moment's consideration to them whatsoever? I am going to base my actions on what will best convince/serve the reasonable people, on top of what makes the best science. Hiding data and not being responsive to criticisms is counterproductive to those goals.

    Case in point. The recent inclusion of data that had not been peer reviewed in the IPCC report didn't convince me that everything in the report was garbage, but it meant that everything in there had to be weighed on its own merits, as I couldn't trust the vetting process done by the IPCC. It didn't discredit climate change itself, but it did undermine the ability of the IPCC to act as a credible distiller of the state of climate change research.

    These are the issues that you need to be concerned about, not how the ideologues and pundits are going to react.

  • by Troed (102527) on Tuesday February 09, 2010 @01:13PM (#31075080) Homepage Journal

    Wrong. You're taking two separate issues and trying to claim that since there are two, one is irrelevant. Of course it's not: both are relevant. However, verifying the model DOES take more domain knowledge than verifying the implementation. We're currently discussing verifying the implementation, which is still important.

  • Re:Seems reasonable (Score:2, Interesting)

    by Explodo (743412) on Tuesday February 09, 2010 @01:29PM (#31075398)
    You seem to assume that your code is correct. What if, by allowing others to audit it, bugs were found that significantly altered the output? Wouldn't that be something that you'd be interested in? Or, what if you spent years working on your doctoral thesis but at the last second found an error in your software that was what allowed your results to be in line with your assumptions and theory work? Would you scrap your years of work, or would you ignore it since you're freaking tired of working on it and want to be done already? Now assume that the results of your work are used to set public policy somewhere down the road... would you be honest enough to stand up and say it was fraudulent?
  • Re:Seems reasonable (Score:3, Interesting)

    by CptNerd (455084) <adiseker@lexonia.net> on Tuesday February 09, 2010 @04:30PM (#31078274) Homepage
    What it does is eliminate one possible cause of errors. Software that doesn't do bounds checking, for instance, is like uncalibrated measuring instruments. Writing 100 numbers into an array of 10 integers will cause 90 numbers to be written into random areas of memory, and you can't be guaranteed that they aren't affecting other parts of your model, including parts that have been calculated previously and which are now overwritten by false values. I saw something just like this when converting a legacy communications package from Fortran to C. All through the code, the previous programmers had defined 16-character strings and were writing 256 characters into them, due to a change in one constant that wasn't used to define the array bounds. Fortunately the problem caused the C code to crash, but the Fortran code would occasionally produce strange results, caused by this coding error.

    I've been a programmer for 30 years, and a science geek for longer than that, and I would assume that a scientist would be in favor of eliminating as many potential errors as possible in the instruments they use, whether the instruments are hardware or software.
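
    To make the bounds-checking point concrete, here is a minimal Python sketch of a guarded write into a fixed-size buffer. It is not the Fortran or C code from the anecdote; `BUFFER_SIZE` and `write_into` are hypothetical names used for illustration. The two ideas shown are that the buffer length comes from one constant, and that an oversized write is refused loudly instead of spilling over.

```python
BUFFER_SIZE = 16  # single source of truth for the buffer length


def write_into(buffer, data):
    """Copy data into the front of buffer, refusing to write past the end."""
    if len(data) > len(buffer):
        raise ValueError(
            f"{len(data)} items do not fit in a buffer of {len(buffer)}"
        )
    # Slice assignment with an equal-length slice keeps the buffer size fixed.
    buffer[: len(data)] = data
    return buffer


buffer = [0] * BUFFER_SIZE
write_into(buffer, list(range(10)))  # fits: the first 10 slots are overwritten

try:
    write_into(buffer, list(range(256)))  # the kind of overflow described above
except ValueError as err:
    print("rejected:", err)
```

    The explicit length check matters even in Python: without it, the slice assignment would silently grow the list rather than fail, which hides the mistake in a different way.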
  • Re:Seems reasonable (Score:3, Interesting)

    by mathfeel (937008) on Tuesday February 09, 2010 @04:37PM (#31078392)

    Your argument is void. A bug is a bug. Either it affects the outcome of the program run or it doesn't - and I still don't need to know anything about what it's supposed to do to verify that. You just need to re-run the program with a specified set of inputs and check the output - also known as verified against its own test suite.

    Unlike many pure-software cases, scientific simulation can and MUST be checked against theory/simplified model/asymptotic behavior. The latter requires specialized understanding of the underlying science. The kind of coding bug you are talking about will usually (not always) result in a damningly unphysical result, which would then immediately prompt any good student to check for the said bug. Heck, my boss usually refuses to look at my code while I am working on it (besides advising on general structure) so that even if my code gets the expected result, he can still perform an independent inspection.
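
    A toy version of checking a simulation against theory, sketched in Python under stated assumptions: the "simulation" is an explicit Euler integration of exponential decay, dy/dt = -k*y, chosen only because its exact solution y0*exp(-k*t) is known. The function names, parameters, and tolerances are all illustrative. The checks are that the numerical result tracks the analytic one, and that halving the step size roughly halves the error, which is the expected behavior of a first-order method.

```python
import math


def simulate_decay(y0, k, t_end, steps):
    """Integrate dy/dt = -k*y from t=0 to t_end with explicit Euler."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)
    return y


def exact_decay(y0, k, t):
    """Analytic solution of dy/dt = -k*y."""
    return y0 * math.exp(-k * t)


y0, k, t_end = 1.0, 0.5, 2.0
exact = exact_decay(y0, k, t_end)

error_coarse = abs(simulate_decay(y0, k, t_end, 1_000) - exact)
error_fine = abs(simulate_decay(y0, k, t_end, 2_000) - exact)

# Check 1: the simulation agrees with theory to a loose tolerance.
assert error_coarse < 1e-3
# Check 2: first-order convergence, so doubling the steps about halves the error.
assert 1.8 < error_coarse / error_fine < 2.2

print("coarse error:", error_coarse)
print("fine error:  ", error_fine)
```

    A sign error or an off-by-one in the loop would typically break one of the two assertions, which is the sort of unphysical result the comment is describing.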

  • Re:Seems reasonable (Score:3, Interesting)

    by quanticle (843097) on Tuesday February 09, 2010 @05:06PM (#31078808) Homepage

    A more important concern is that someone else who does have your background should have access to your code. That would be part of "peer review". Otherwise they're taking your computations on faith, with no way to reproduce.

    I fully agree. Perhaps something that scientific journals could do is to create a source code repository that allows researchers to publish the source code used to create the results along with the results themselves. At the very least, other researchers would be able to look at the code and see if there are any glaring errors or omissions.

  • Re:Seems reasonable (Score:1, Interesting)

    by Anonymous Coward on Wednesday February 10, 2010 @06:00AM (#31083824)

    Wrong, actually. If it had been the CFCs, it would've taken another decade to see improvements. The most likely scenario is that the ozone hole (which we have no idea how it behaved until we first discovered it in the 50s/60s) is created by UV and/or cosmic radiation, and thus modulated by the solar cycle.

    Yes, there's a really nice correlation there, with a suggested causality.

  • Re:Seems reasonable (Score:3, Interesting)

    by TheTurtlesMoves (1442727) on Wednesday February 10, 2010 @06:22AM (#31083952)
    And when your code output does not match theirs, it's a bug in your code... because you know, we know it's not a bug in our code. Trust me! To replicate the results, code should be available. It is a requirement to provide the source in many journals already.

    Science does not require trust. It requires transparency. Closed source is not transparent.
