Call For Scientific Research Code To Be Released

Posted by Soulskill
from the but-then-people-will-see-how-awful-it-is dept.
Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"
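
The summary does not say what caused the loss of significant figures Hatton observed. As a hedged illustration of one well-known mechanism only (catastrophic cancellation), the Python sketch below compares a one-pass "textbook" variance formula with a two-pass version on made-up data. The function names and the data are assumptions for illustration, not anything taken from the study.

```python
def naive_variance(data):
    """One-pass sample variance: (sum(x^2) - (sum x)^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two huge, nearly equal numbers.
    """
    n = len(data)
    total = 0.0
    total_sq = 0.0
    for x in data:  # explicit loop: plain left-to-right floating-point sums
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(data):
    """Two-pass sample variance: compute the mean first, then sum squared deviations."""
    n = len(data)
    total = 0.0
    for x in data:
        total += x
    mean = total / n
    squared_deviations = 0.0
    for x in data:
        squared_deviations += (x - mean) ** 2
    return squared_deviations / (n - 1)


# Made-up sample: small spread (true sample variance is 30) on a large offset.
sample = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive_result = naive_variance(sample)        # comes out negative with 64-bit doubles
two_pass_result = two_pass_variance(sample)  # 30.0
```

Both functions implement the same mathematics, so an audit that only checks the formula against the paper would pass both; the difference shows up only when the code is run on data like this.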

  • About time! (Score:5, Informative)

    by sackvillian (1476885) on Tuesday February 09, 2010 @10:50AM (#31071948)

    The scientific community needs to get as far as we can from the policies of companies like Gaussian Inc., who will ban [bannedbygaussian.org] you and your institution for simply publishing any sort of comparative statistics on calculation time, accuracy, etc. from their computational chemistry software.

    I can't imagine what they'd do to you if you started sorting through their code...

  • by stokessd (89903) on Tuesday February 09, 2010 @11:04AM (#31072120) Homepage

    I got my PhD in fluid mechanics funded by NASA, and as such my findings are easily publishable and shared with others. My analysis code (such as it was) was and is available for those who would like to use it. More importantly, my experimental data is available as well.

    This represents the classical pure research side of research where we all get together and talk about our findings and there really aren't any secrets. But even with this open example, there are still secrets when it comes to ideas for future funding. You only tip your cards when it comes to things you've already done, not future plans.

    But more importantly, there are whole areas of research that are very closed off. Pharma is a good example. Sure there are lots of peer reviewed articles published and methods discussed, but you'll never really get into their shorts like this guy wants. There's a lot that goes on behind that curtain. And even if you are a grad student with high ideals and a desire to share all your findings, you may find that the rules of your funding prevent you from sharing.

    Sheldon

  • Observations... (Score:5, Informative)

    by kakapo (88299) on Tuesday February 09, 2010 @11:17AM (#31072268)

    As it happens, my students and I are about to release a fairly specialized code. We discussed license terms and eventually settled on the BSD license, which requires "citation" but otherwise leaves anyone free to use it (we explicitly avoided the GPL).

    That said, writing a scientific code can involve a good deal of work, but the "payoff" usually comes in the form of results and conclusions, rather than the code itself. In those circumstances, there is a sound argument for delaying any code release until you have published the results you hoped to obtain when you initiated the project, even if these form a sequence of papers (rather than insisting on code release with the first published results).

    Thirdly, in many cases scientists will share code with colleagues when asked politely, even if it is not in the public domain.

    Fourthly, I fairly regularly spot minor errors in numerical calculations performed by other groups (either because I do have access to the source, or because I can't reproduce their results) -- in almost all cases these do not have an impact on their conclusions, so while the "error count" can be fairly high, the number of "wrong" results coming from bad code is overestimated by this accounting.

  • Re:Conspiracy? (Score:3, Informative)

    by xtracto (837672) on Tuesday February 09, 2010 @11:47AM (#31072688) Journal

    Agreed 100%.

    You would not believe the amount and crappy quality of the code written during "research projects", especially when the research is in a field completely unrelated to Comp. Sci. or Soft. Eng.

    I have personally seen software related to Agronomy, Biology (Ecology) and Economics. The problem with a lot of that code is that researchers sometimes want to use the power of computers (say, for simulation) but do not know how to code. They then read a bit about some programming language and implement their program as they are learning.

    The result? You can imagine.

  • by khayman80 (824400) on Tuesday February 09, 2010 @11:53AM (#31072784) Homepage Journal
  • Re:About time! (Score:3, Informative)

    by je ne sais quoi (987177) on Tuesday February 09, 2010 @12:11PM (#31073056)
    One thing to point out is that there are now plenty of open source codes available for doing the same kinds of things as Gaussian, so it can be avoided with relative ease. Two that come to mind are the Department of Energy funded codes: nwchem [pnl.gov] for ab initio work and lammps [sandia.gov] for molecular dynamics. I use the NIH funded code vmd [uiuc.edu] for visualization. The best part about those codes is that they're designed to be compiled using gcc and run on Linux, so you can get off the non-open source software train altogether if you wish.
  • Re:Seems reasonable (Score:5, Informative)

    by Sir_Sri (199544) on Tuesday February 09, 2010 @12:14PM (#31073116)

    And it's not like the people writing this code are, or were, trained in computer science, assuming computer science even existed when they were doing the work.

    Having done an undergrad in theoretical physics, but being in a PhD in comp sci now, I will say this: The assumption in physics when I graduated in 2002 was that by second year you knew how to write code, whether they'd taught you or not. Even more recently it has still been an assumption that you'll know how to write code, but they try to give you a bare minimum of training. And of course it's usually other physical scientists who do the teaching, not computer scientists, so bad information (or out of date information or the like) is propagated along. That completely misses the advanced topics in computer science which cover a lot more of the software engineering sort of problems. Try explaining to a physicist how a 32 or 64 bit float can't exactly represent all of the numbers they think it can and watch half of them have their eyes glaze over for half an hour. And then the problem is what do you do about it?
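
As a small illustration of the floating-point point above (a sketch added for clarity, not part of the original comment), the Python snippet below shows that 0.1 is not stored exactly in either a 64-bit or a 32-bit float. The helper name `as_float32` is made up for this example; `struct` and `decimal` are standard library modules.

```python
import struct
from decimal import Decimal


def as_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Decimal(float) exposes the exact binary value that is actually stored.
# 0.1 has no finite binary expansion, so this is close to, but not equal to, 1/10.
stored_value_of_point_one = Decimal(0.1)

# Adding 0.1 ten times does not land exactly on 1.0 in 64-bit arithmetic.
total = 0.0
for _ in range(10):
    total += 0.1

sum_is_exactly_one = (total == 1.0)                 # False
float32_equals_float64 = (as_float32(0.1) == 0.1)   # False: fewer bits, different rounding
```

As for "what do you do about it": the usual answers are to compare with a tolerance (for example `math.isclose`) rather than `==`, and to choose algorithms that do not amplify rounding error.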

    Then you get into a lab (uni lab). Half the software used will have been written in F77 when it was still pretty new, and someone may have hacked some modifications in here and there over the years. Some of these programs last for years, span multiple careers and so on. They aren't small investments but have had grubby little grad student paws on them for a long time, in addition to incompetent professor hands.

    None of scientific computing is done particularly well. People with no training in software development are expected to do the work (assuming the work was done when software development even existed), and there isn't the funding to pay people who might do it properly.

    On top of all that, it's not like you want to release your code to the public right away anyway. As a scientist you're in competition with groups around the world to publish first. You describe in your paper the science you think you implemented; someone else who wants to verify your results gets to write a new chunk of code which they think is the same science, and you compare. Giving out a scientist's code for inspection means someone else will have a working software platform to publish papers based on your work, and that's not so good for you. For all the talk of research for the public good, ultimately your own good, of continuing to publish (to get paid), trumps a public need. That's a systemic problem, and when you're competing with a research group in Brazil, and you're in Canada, their rules are different from yours, and so you keep things close to the chest.

  • Re:Seems reasonable (Score:3, Informative)

    by Troed (102527) on Tuesday February 09, 2010 @12:20PM (#31073222) Homepage Journal

    Your comment clearly shows you know nothing about software. I'm able to audit your source code without having the slightest clue as to what domain it's meant to be run in.

  • Re:Seems reasonable (Score:1, Informative)

    by Anonymous Coward on Tuesday February 09, 2010 @12:40PM (#31073590)
    No, universities are not required to make their work public. Congress changed the public domain requirements some time ago, so contractors who are getting federal funds can keep their work secret, patent it, and hold copyright to it. They only are required to provide whatever results the contract demands, and they hold the rights to some of those results.

    And if you're satisfied with how open climate work is, perhaps you can explain the numerous temperature adjustments which are being done. Pick five climate stations and show where the reasons for all the adjustments are published.

  • It's an old story (Score:5, Informative)

    by jc42 (318812) on Tuesday February 09, 2010 @01:14PM (#31074164) Homepage Journal

    This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program.

    Back in the 1970s, a bunch of CompSci guys at the university where I was a grad student did a software study with interesting results. Much of the research computing was done on the university's mainframe, and the dominant language of course was Fortran. They instrumented the Fortran compiler so that for a couple of months, it collected data on numeric overflows, including which overflows were or weren't detected by the code. They published the results: slightly over half the Fortran jobs had undetected overflows that affected their output.
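
The comment does not say which kinds of numeric overflow the instrumented compiler counted. As a rough sketch only, assuming 32-bit signed integer arithmetic, the Python code below contrasts an unchecked multiply that silently wraps with a checked one that raises. Python integers never overflow on their own, so the wraparound is simulated; all names here are hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return what a 32-bit two's-complement register would hold for this value."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def unchecked_mul(a: int, b: int) -> int:
    """Multiply with overflow checking off: the result silently wraps around."""
    return wrap_int32(a * b)


def checked_mul(a: int, b: int) -> int:
    """Multiply with overflow checking on: an out-of-range result is an error."""
    result = a * b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} * {b} does not fit in a 32-bit integer")
    return result


# 100000 * 100000 = 10**10, which does not fit in 32 bits.
silent_result = unchecked_mul(100_000, 100_000)  # a plausible-looking but wrong number

try:
    checked_mul(100_000, 100_000)
    overflow_detected = False
except OverflowError:
    overflow_detected = True
```

The unchecked version is the faster path the researchers preferred; the wrong value it returns looks like any other number in the output, which is why such errors go unnoticed.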

    The response to this was interesting. The CS folks, as you might expect, were appalled. But among the scientific researchers, the general response was that enabling overflow checking slowed down the code measurably, so it shouldn't be done. I personally knew a lot of researchers (as one of the managers of an inter-departmental microcomputer lab that was independent of the central mainframe computer center). I asked a lot of them about this, and I was appalled to find that almost every one of them agreed that overflow checking should be turned off if it slowed down the code. The mainframe's managers reported that almost all Fortran compiles had overflow checking turned off. Pointing out that this meant that fully half of the computed results in their published papers were wrong (if they used the mainframe) didn't have any effect.

    Our small cabal that ran the microcomputer lab reacted to this by silently enabling all error checking in our Fortran compiler. We even checked with the vendor to make sure that we'd set it up so that a user couldn't disable the checking. We didn't announce that we had done this; we just did it on our own authority. It was also done in a couple of other similar department-level labs that had their own computers (which was rare at the time). But the major research computer on campus was the central mainframe, and the folks running it weren't interested in dealing with the problem.

    It taught us a lot about how such things are done. And it gave us a healthy level of skepticism about published research data. It was a good lesson on why we have an ongoing need to duplicate research results independently before believing them.

    It might be interesting to read about studies similar to this done more recently. I haven't seen any, but maybe they're out there.

  • by azgard (461476) on Tuesday February 09, 2010 @01:17PM (#31074216)

    While I am a fan of open source and this idea in general, for climatology this is a non-issue. Look here: http://www.realclimate.org/index.php/data-sources/ [realclimate.org]

    There's more code out there than one amateur can digest in a lifetime. And you know what? From the experience of the people who wrote these programs, there aren't actually many people looking at it. I doubt that any scientific code will get many eyeballs. This is more a PR exercise.

  • Re:Seems reasonable (Score:4, Informative)

    by khellendros1984 (792761) on Tuesday February 09, 2010 @02:19PM (#31075208) Journal
    It may not be possible to actually "discuss" the topic, but it's certainly possible to find bugs that may or may not influence the output of the program. And given the original input data, it's possible to remove the bugs, run the corrected program against the original input data, and see if the output is different. It would take someone with knowledge in the target topic to analyze the output data and decide if any difference is significant, but the actual check for bugs could certainly be done by anyone that "speaks" the language the program was written in.
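
The workflow described above (fix the bug, rerun on the original input, compare the outputs) can be sketched roughly as follows in Python. Everything here is hypothetical: the "published" routine, its bug, the input data, and the tolerance are invented for illustration.

```python
import math


def mean_original(values):
    """Hypothetical 'as published' routine containing an off-by-one bug."""
    total = 0.0
    for i in range(len(values) - 1):  # bug: the last value is never added
        total += values[i]
    return total / len(values)


def mean_corrected(values):
    """The same routine with the loop fixed to cover every value."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


def compare_runs(original, corrected, data, rel_tol=1e-9):
    """Run both versions on the same input and report whether the output changed."""
    old = original(data)
    new = corrected(data)
    return {
        "original": old,
        "corrected": new,
        "changed": not math.isclose(old, new, rel_tol=rel_tol),
    }


readings = [1.0, 2.0, 3.0, 4.0]  # made-up stand-in for the original input data
report = compare_runs(mean_original, mean_corrected, readings)
```

A programmer can produce `report` without knowing the science; whether a reported change actually matters is the part that still needs a domain expert.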

    Even with something like a Global Warming argument, a person with a strong grasp of both English and logic might not be able to verify claims in an argument, but they can certainly analyze the argument for certain logical fallacies. Perhaps the fallacious section of the argument doesn't invalidate the argument as a whole. You can't trust this generic English-speaker to accurately make that determination, but they're certainly able to identify and remove a strawman, an ad hominem, etc.
  • Re:Seems reasonable (Score:2, Informative)

    by STRICQ (634164) on Tuesday February 09, 2010 @02:24PM (#31075296)
    Careful, you are getting dangerously close to the conceited, "Holier than thou" attitude that many climate scientists are spewing out. You really don't know what you're talking about when you say the OP doesn't know what he's talking about. I'm a software engineer; finding bugs, even when you don't know what the code is doing, is a lot easier than you would think.
  • by bunratty (545641) on Tuesday February 09, 2010 @02:38PM (#31075552)

    On prediction failure, Hansen's 1988 "A,B,C" forecasts of rising temperature are rapidly diverging from the cooling we are actually experiencing right now, where case C assumed we massively limited CO2 also

    In the past ten years, we've seen warming of 0.18 degrees Celsius [usatoday.com], which is less than the 0.25 degrees Celsius that was predicted, but it certainly hasn't been cooling. This is why the Arctic ice [nasa.gov] and Antarctic ice [nasa.gov] are melting. Yes, stop the presses, the globe is warming!
