Programming Science

Call For Scientific Research Code To Be Released 505

Posted by Soulskill
from the but-then-people-will-see-how-awful-it-is dept.
Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"
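The claim about declining accuracy has a simple mechanism behind it: every floating-point operation rounds its result, and over millions of operations those rounding errors compound. A toy illustration in Python (illustrative only; this is not Hatton's code or data):

```python
import math

# Sum 0.1 a million times; the true answer is 100,000.
vals = [0.1] * 1_000_000

naive = 0.0
for v in vals:
    naive += v              # each addition rounds, and the error accumulates

accurate = math.fsum(vals)  # compensated summation, for comparison

# The naive loop drifts measurably away from 100000.0; fsum stays far closer.
print(abs(naive - 100_000.0) > abs(accurate - 100_000.0))  # True
```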
This discussion has been archived. No new comments can be posted.


  • Seems reasonable (Score:4, Insightful)

    by NathanE (3144) on Tuesday February 09, 2010 @10:44AM (#31071878)

    Particularly if the research is publicly funded.

    • by fuzzyfuzzyfungus (1223518) on Tuesday February 09, 2010 @11:19AM (#31072286) Journal
      The "The public deserves access to the research it pays for" position seems so self-evidently reasonable that further debate is simply unnecessary (though, unfortunately, the journal publishers have a strong financial interest in arguing the contrary, so the "debate" actually continues, against all reason). Similarly, the idea that software falls somewhere in the "methods" section and is as deserving of peer review as any other part of the research seems wholly reasonable. Again, I suspect that getting at the bits written by scientists, with the possible exception of the ones working in fields (oil geology, drug development, etc.) that also have lucrative commercial applications, will mainly be a matter of developing norms and mechanisms around releasing it. Academic scientists are judged, promoted, and respected largely according to how much (and where) they publish. Getting them to publish more probably won't be the world's hardest problem. The more awkward bit will be the fact that large amounts of modern scientific instrumentation, and some analysis packages, include giant chunks of closed source software; but are also worth serious cash. You can absolutely forget getting a BSD/GPL release, and even a "No commercial use, all rights reserved, for review only, mine, not yours." code release will be like pulling teeth.

      On the other hand, I suspect some of this hand-wringing of being little more than special pleading. "This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program." Right. I know that I definitely live in the world where all my important stuff: financial transactions, recordkeeping, product design, and so forth, is carried out by zero-defect programs, delivered to me over the internet by routers with zero-defect firmware, and rendered by a variety of endpoint devices running zero-defect software on zero-defect OSes. Yup, that's exactly how it works. Outside of hyper-expensive embedded stuff, military avionics, landing gear firmware, and FDA-approved embedded medical widgets (that still manage to Therac people from time to time), zero-defect is pure fantasy. A very pleasant pure fantasy, to be sure; but still fantasy. The revelation that several million lines of code, in a mixture of Fortran and C, most likely written under time and budget constraints, isn't exactly a paragon of code quality seems utterly unsurprising, and utterly unrestricted to scientific areas. Code quality is definitely important, and science has to deal with the fact that software errors have the potential to make a hash of their data; but science seems to attract a whole lot more hand-wringing when its conclusions are undesirable...
      • by apoc.famine (621563) <`apoc.famine' `at' `gmail.com'> on Tuesday February 09, 2010 @12:12PM (#31073080) Homepage Journal

        As someone doing a PhD in a climate related area, I can see both sides of the issue. The code I work with is freely and openly available. However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it. I look at my classmates - we're all in the same degree program, yet probably only 5% of them would really be able to understand and do anything meaningful with the code I'm using.
         
        Why? We're that specialized. Here, I'm talking 5% of people studying atmospheric and oceanic sciences being able to make use of my code without taking several years to get up to speed. What's the incentive to release it? Why bother with the effort, when the audience is soooo small?
         
        Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for "teh code is BOGUS!!!!" Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.
         
        However, I do very much believe in completely transparent discourse. My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.
         
        It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about.

        • Re: (Score:3, Informative)

          by Troed (102527)

          Your comment clearly shows you know nothing about software. I'm able to audit your source code without having the slightest clue as to what domain it's meant to be run in.

          • by apoc.famine (621563) <`apoc.famine' `at' `gmail.com'> on Tuesday February 09, 2010 @12:42PM (#31073614) Homepage Journal

            Of all the stuff that's important in scientific computing, the code is probably one of the more minor parts. The science behind the code is drastically more important. If the code is solid and the science is crap, it's useless. Likewise, the source data that's used to initialize a model is far more important than the code. If that's bogus, the entire thing is bogus.
             
            Sure, you could audit it, and find shit that's not done properly. At the same time, you wouldn't have a damn clue what it's supposed to be doing. Suppose I'm adding a floating point to an integer. Is that a problem? Does it ruin everything? Or is it just sloppy coding that doesn't make a difference in the long run? Understanding what the code is doing is required for you to do an audit which will produce any useful results.
             
            Unless you're working under the fallacy that all code must be perfect and bug free. Nobody gives a shit if you audit software and produce a list of bugs. What's important is that you be able to quantify how important those bugs are. And you can't do that without knowing what the software is supposed to be doing. When it's something as complicated as fluid dynamics or biological systems, a code audit by a CS person is pretty much worthless.
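apoc.famine's float-plus-integer question actually has both answers, depending on which way the conversion goes. A tiny hypothetical sketch:

```python
# Benign: adding a float to an int silently promotes the int; nothing is lost.
total = 10 + 0.5
assert total == 10.5

# Harmful: truncating a float back to an int silently drops real information.
cents = int(0.29 * 100)   # 0.29 * 100 evaluates to 28.999999999999996
assert cents == 28        # not the 29 you expected
```

Whether that truncation "ruins everything" depends entirely on what the value feeds into, which is exactly the domain-knowledge point.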

            • by Troed (102527) on Tuesday February 09, 2010 @01:00PM (#31073930) Homepage Journal

              Your argument is void. A bug is a bug. Either it affects the outcome of the program run or it doesn't - and I still don't need to know anything about what it's supposed to do to verify that. You just need to re-run the program with a specified set of inputs and check the output - also known as verifying it against its own test suite.

              (Yes, I'm a Software Engineer by education)
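What Troed describes ("re-run the program with a specified set of inputs and check the output") is an ordinary regression test, and it can be sketched without any domain knowledge at all. The model below is a made-up toy, not any real climate code:

```python
def model_step(state, forcing):
    """Toy model: relax the state a tenth of the way toward the forcing."""
    return state + 0.1 * (forcing - state)

def run(initial, forcings):
    state = initial
    for f in forcings:
        state = model_step(state, f)
    return state

# Pin one known input/output pair from a trusted earlier run;
# any future code change that alters the result trips the assert.
REFERENCE_OUTPUT = 20.561
assert abs(run(20.0, [21.0, 22.0, 23.0]) - REFERENCE_OUTPUT) < 1e-9
```

Note that this only checks the program still does what it did before, not that what it did was ever scientifically right.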

              • Re: (Score:3, Interesting)

                by mathfeel (937008)

                Your argument is void. A bug is a bug. Either it affects the outcome of the program run or it doesn't - and I still don't need to know anything about what it's supposed to do to verify that. You just need to re-run the program with a specified set of inputs and check the output - also known as verifying it against its own test suite.

                Unlike many pure-software cases, scientific simulations can and MUST be checked against theory/simplified models/asymptotic behavior. The latter requires specialized understanding of the underlying science. The kind of coding bug you are talking about will usually (not always) produce a damningly unphysical result, which would immediately prompt any good student to check for the said bug. Heck, my boss usually refuses to look at my code when I am working on it (besides advising on general structure) so that

              • Re: (Score:3, Insightful)

                by apoc.famine (621563)
                Spoken like a Software Engineer!

                A bug isn't just a bug. Either it affects the outcome of the program run or it doesn't. The issue is that if you don't know what the outcome should be, you won't be able to tell. Nobody in scientific computing just "re-run(s) the program with a specified set of inputs and check(s) the output". The input is 80% of the battle. We just ran across a paper which showed that the input can often explain 80%+ of the variance in the output of models similar to the one we use.

                So th
                • Re: (Score:3, Insightful)

                  by Troed (102527)

                  Sorry, no. You're just displaying your ignorance above. You cannot look at the output and say that just because it fits with your preconceived notions it's therefore correct. You do not know if you have problems in a Fahrenheit-to-Celsius conversion, a truncation when casting between units, etc. (yes, examples chosen on purpose). You might get a result that's in the right ballpark. You might believe you have four significant digits when you only have three. Your homebrew statistical package might not have bee
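Troed's deliberately chosen examples are easy to make concrete. A hypothetical sketch of the conversion and truncation bugs:

```python
def f_to_c(f):
    return (f - 32) * 5 / 9      # correct Fahrenheit-to-Celsius conversion

def f_to_c_buggy(f):
    return (f - 32) * (5 // 9)   # integer division: 5 // 9 is 0, so every input maps to 0

assert f_to_c(212) == 100.0      # boiling point, as expected
assert f_to_c_buggy(212) == 0    # silently wrong, yet still a plausible-looking temperature
```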

            • by MikeBabcock (65886) <mtb-slashdot@mikebabcock.ca> on Tuesday February 09, 2010 @01:13PM (#31074152) Homepage Journal

              Both are issues. If your code is buggy, the output may also be buggy. If the code is bug-free but the algorithms are buggy, the output will also be buggy.

              The whole purpose of publishing in the scientific method is repeatability. If the software itself is just re-used without someone looking at how it works or even better, writing their own for the same purpose, you're invalidating a whole portion of the method itself.

              As a vastly simplified example, I could posit that 1 + 2 = 4. I could say I ran my numbers through a program as such:

              f(a, b):
                  print $b + $b;
              print f(1, 2);

              If you re-ran my numbers yourself through MY software without validating it, you'd see that I'm right. Validating what the software does and HOW it does it is very much an important part of science, and unfortunately overlooked. While in this example anyone might pick out the error, in a complex system it's quite likely most people would miss one.

              To the original argument, just because very few people would understand the software doesn't mean it doesn't need validating. Lots of peer review papers are truly understood by a very small segment of the scientific population, but they still deserve that review.
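The pseudocode above is runnable as-is in Python, deliberate bug included:

```python
def f(a, b):
    return b + b   # the deliberate bug: should be a + b

# Re-running the author's own software "confirms" that 1 + 2 = 4.
print(f(1, 2))   # prints 4
```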

        • by TheTurtlesMoves (1442727) on Tuesday February 09, 2010 @12:30PM (#31073360)
          You're not the F***** pope. You don't get to tell people they are not worthy enough to look at your code/data. You don't like it, don't do science. But this attitude of only cooperating with a "vetted" group of people is causing far more problems than you think you are solving by doing it. You are not as smart as you think you are.

          Want to make a claim/suggestion that has very real economic and political ramifications for everyone, you provide the data/models for everyone. Otherwise, have a nice hot cup of shut the frak up.
          • Re: (Score:3, Insightful)

            by MikeURL (890801)
            You're modded flamebait but I agree 100% with the sentiment (even if you did present it in a somewhat inflammatory way).

            Climate scientists are engaged in a LOT of "just trust us" which is fine if they are just doing their models for fun. These models are being used to propose enormous changes in the way the world economy is organized. I don't think we want to switch from one secretive organization, the Fed, to another. The Fed at least can make the argument that it cannot release its models due to peo
          • Re: (Score:3, Insightful)

            by pod (1103)

            Exactly, although I echo the sentiment the presentation could have been better.

            Everywhere we turn there are people who think they are smart telling us what to do and what to think, because they know what is best for us. They're the experts with years of training, and we know nothing. Do not question the high priests, do not pay attention to the man behind the curtain.

            This is just following the general trend of late, culminating in "this time, it's different, trust us". We think we're smarter, we're better,

            • Re: (Score:3, Insightful)

              by Thiez (1281866)

              > Then it was the ozone hole that would fry anyone not wearing SPF1000 sunblock. Where did that go?

              We stopped using the CFCs that were identified as a major contributor to the problem and it appears that is working. Oh sorry, I don't think that supports your argument.

        • by bmajik (96670) <matt@mattevans.org> on Tuesday February 09, 2010 @01:18PM (#31074234) Homepage Journal

          However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it. I look at my classmates - we're all in the same degree program, yet probably only 5% of them would really be able to understand and do anything meaningful with the code I'm using.

          I think the world is very lucky that Linus Torvalds wasn't as narrow-sighted and conceited as you are.

          Why? We're that specialized. Here, I'm talking 5% of people studying atmospheric and oceanic sciences being able to make use of my code without taking several years to get up to speed. What's the incentive to release it? Why bother with the effort, when the audience is soooo small?

          Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for "teh code is BOGUS!!!!" Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.

          However, I do very much believe in completely transparent discourse. My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.

          It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about.

          Have you heard of "ivory tower"? You're it.

          Your position basically boils down to this: "unless you read all the same things I read, talked to all the same people I talked to, went to all the same schools I did... you're not qualified to talk to me".

          That is _the_ definition of monocultural isolationism.. i.e. the Ivory Tower of Academia problem.

          Here's the problem: if your requirement is that anyone you consider a "peer" must have had all of the same inputs and conditionings that you had... what basis do you have for allowing them to come out of the other side of that machine with a non-tainted point of view?

          As a specific counterpoint to your way of thinking:

          My dad is an actuary, one of the best in the world. He regularly meets with the top handful of insurance regulators in foreign governments. He manages the risk of _billions_ of dollars. The maths involved in actuarial science embarrass nearly any other branch of applied mathematics. I have an undergraduate math degree and I could only understand his problem domain in the crudest, rough-bounding-box sort of fashion. Furthermore, he's been a programmer since the System/360 days.

          Yet his code, while there is a lot of it, is something I am definitely able to help him with. We talk about software engineering and specific technical problems he is having on a frequent basis.

          You don't need to be a problem domain expert in order to demonstrate value when auditing software.

          Furthermore, as a professional software tester, I happen to find that occasionally, not over-familiarizing myself with the design docs and implementation details too early allows me to ask better "reset" questions when doing design and code reviews. "Why are you doing this?" And as the developer talks me through it, they understand how shaky their assumptions are. If I had been "travelling" with them in lock step

        • It seems to me that what's important is the theory being modeled, the algorithms used to model it, and of course the data. The code itself isn't really useful for replicating an experiment, because it's just a particular - and possibly faulty - implementation of the model and as such is akin to the particular lab bench practices that might implement a standard protocol. Replicating a modeling experiment should involve using - and writing, if necessary - code that implements the model the original investigat

          • Re: (Score:3, Interesting)

            by Troed (102527)

            Wrong. You're taking two separate issues and trying to claim that since there are two, one is irrelevant. Of course it's not - both are. However, verifying the model DOES take more domain knowledge than verifying the implementation. We're currently discussing verifying the implementation, which is still important.

        • I hate to break it to you but all programming is highly specialized. Climatology is in no way special in this regard.

          Neither do programmers have to understand the abstract model of the program to write it or evaluate it. The vast majority of professional programmers do not understand the abstract model of the code they create. You do not have to be a high-level accountant to write corporate accounting software and you don't have to be a doctor to write medical software. Most programmers spend most of their

      • Re:Seems reasonable (Score:5, Informative)

        by Sir_Sri (199544) on Tuesday February 09, 2010 @12:14PM (#31073116)

        And it's not like the people writing this code are, or were, trained in computer science, assuming computer science even existed when they were doing the work.

        Having done an undergrad in theoretical physics, but being in a PhD in comp sci now I will say this: The assumption in physics when I graduated in 2002 was that by second year you knew how to write code, whether they've taught you or not. Even more recently it has still been an assumption that you'll know how to write code, but they try to give you a bare minimum of training. And of course it's usually other physical scientists who do the teaching, not computer scientists, so bad information (or out of date information or the like) is propagated along. That completely misses the advanced topics in computer science which cover a lot more of the software engineering sort of problems. Try explaining to a physicist how a 32 or 64 bit float can't exactly represent all of the numbers they think it can and watch half of them have their eyes glaze over for half an hour. And then the problem is what do you do about it?
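The point about floats not exactly representing the numbers physicists think they do fits in three lines:

```python
# The stored value of 0.1 is not exactly 0.1:
print(f"{0.1:.20f}")       # 0.10000000000000000555

# and the representation error surfaces in ordinary arithmetic:
print(0.1 + 0.2 == 0.3)    # False
print(0.1 + 0.2)           # 0.30000000000000004
```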

        Then you get into a lab (uni lab). Half the software used will have been written in F77 when it was still pretty new, and someone may have hacked some modifications in here and there over the years. Some of these programs last for years, span multiple careers and so on. They aren't small investments but have had grubby little grad student paws on them for a long time, in addition to incompetent professor hands.

        None of scientific computing is done particularly well: labs expect people with no training in software development to do the work (assuming software development even existed as a discipline when it was done), and there isn't the funding to pay people who might do it properly.

        On top of all that, it's not like you want to release your code to the public right away anyway. As a scientist you're in competition with groups around the world to publish first. You describe in your paper the science you think you implemented; someone else who wants to verify your results gets to write a new chunk of code which they think is the same science, and you compare. Giving out a scientist's code for inspection means someone else will have a working software platform to publish papers based on your work, and that's not so good for you. For all the talk of research for the public good, ultimately your own good, continuing to publish (to get paid), trumps the public need. That's a systematic problem, and when you're competing with a research group in Brazil, and you're in Canada, their rules are different than yours, and so you keep things close to the chest.

      • Re: (Score:3, Interesting)

        Having worked in academia I can attest to the very poor code quality, at least in my area. The reason is very simple: the modern research scientist is often a jack-of-all-trades. A combination IT professional, programmer, teacher, manager, recruiter, bureaucrat, hardware technician, engineer, statistician, author and mathematician as well as being an expert in the science of their own field. Any one of these disciplines would take years of experience to develop professional skills at. Most scientists simpl

    • Only if... (Score:3, Insightful)

      by captainpanic (1173915)

      Only if the real programmers out there promise to be nice to us scientists.

      Most scientists will know a lot about, well, science... but not much about writing code or optimizing code.

      Like my scripts. All correct, all working... lots of formulas... but probably a horribly inefficient way to calculate what I need. :-)

      The last thing I need is someone to come to me and tell me that the outcome is correct but that my code sucks.
      (And no, I am not interested in a course to learn coding - unless it's a 1-week crash

  • great! (Score:4, Insightful)

    by StripedCow (776465) on Tuesday February 09, 2010 @10:47AM (#31071920)

    Great!

    I'm getting somewhat tired of reading articles where there is little or no information regarding program accuracy, total running time, memory used, etc.
    And in some cases, I'm actually questioning whether the proposed algorithms actually work in practical situations...

  • Stuff like Sweave (Score:4, Interesting)

    by langelgjm (860756) on Tuesday February 09, 2010 @10:49AM (#31071944) Journal

    Much quantitative academic and scientific work could benefit from the use of tools like Sweave, [wikipedia.org] which allows you to embed the code used to produce statistical analyses within your LaTeX document. This makes your research easier to reproduce, both for yourself (when you've forgotten what you've done six months from now) and others.

    What other kinds of tools like this are /.ers familiar with?

  • by bramp (830799) on Tuesday February 09, 2010 @10:50AM (#31071946)
    I've always been a big fan of releasing my academic work under a BSD licence. My work is funded by the taxpayers, so I think the taxpayers should be able to do what they like with my software. So I fully agree that all software should be released. It is not always enough to just publish a paper; you should also release your code so others can fully review the accuracy of your work.
  • About time! (Score:5, Informative)

    by sackvillian (1476885) on Tuesday February 09, 2010 @10:50AM (#31071948)

    The scientific community needs to get as far as we can from the policies of companies like Gaussian Inc., who will ban [bannedbygaussian.org] you and your institution for simply publishing any sort of comparative statistics on calculation time, accuracy, etc. from their computational chemistry software.

    I can't imagine what they'd do to you if you started sorting through their code...

    • Re: (Score:3, Informative)

      One thing to point out is that there are now plenty of open source codes available for doing similar things to Gaussian, so it can be avoided now with relative ease. Two that come to mind are the Department of Energy funded codes: nwchem [pnl.gov] for ab initio work and lammps [sandia.gov] for molecular dynamics. I use the NIH funded code vmd [uiuc.edu] for visualization. The best part about those codes is that they're designed to be compiled using gcc and run on linux so you can get off the non-open source software train altogether
  • by BoRegardless (721219) on Tuesday February 09, 2010 @10:51AM (#31071980)

    One significant figure?

    • Re: (Score:3, Interesting)

      That actually surprised me, too. Loss of precision is nothing new. When you use floats to do the arithmetic, you lose precision in each operation, particularly when you add two numbers with very different scales (exponents). The thing that surprised me was not that a calculation could lose precision. It was the assertion that any precision would remain, at all.

      Numeric code can be written using algorithms that minimize loss of precision, or that are able to quantify the amount of precision that is l
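The algorithms alluded to here are real and old; compensated (Kahan) summation is the textbook example. A minimal sketch:

```python
def kahan_sum(values):
    """Compensated summation: recover the low-order bits lost at each add."""
    total = 0.0
    c = 0.0                      # running compensation term
    for v in values:
        y = v - c                # apply the correction from the previous step
        t = total + y            # low-order bits of y are lost in this add...
        c = (t - total) - y      # ...measure exactly what was lost
        total = t
    return total

# On a million copies of 0.1, the naive built-in loop drifts on the order of
# 1e-6, while the compensated sum stays within a few ulps of 100000.0.
```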

    • Re: (Score:3, Informative)

      by khayman80 (824400)
  • MaDnEsS ! (Score:4, Funny)

    by Airdorn (1094879) on Tuesday February 09, 2010 @10:53AM (#31071998)
    What? Scientists showing their work for peer-review? It's MADNESS I tell you. MADNESS !
  • I'd like to see actual examples of the code failures mentioned in the T experiments paper.

    Or at least Figure 9.

    • Re: (Score:3, Insightful)

      by Cyberax (705495)

      His colleague was _sued_ (by a crank) based on released FOIA data. It might explain a certain reluctance to disclose data to known trolls.

      • by Idiot with a gun (1081749) on Tuesday February 09, 2010 @11:10AM (#31072190)
        Irrelevant. If you can't take some trolls, maybe you shouldn't be in such a controversial topic. The accuracy of your data is far more significant than your petty emotions, especially if your data will be affecting trillions of dollars worldwide.
        • Precisely (Score:4, Insightful)

          by Sycraft-fu (314770) on Tuesday February 09, 2010 @01:02PM (#31073968)

          The more important the research, the larger the item under study, the more rigorous the investigation should be, the more carefully the data should be checked. This isn't just for public policy reasons but for general scientific understanding reasons. If your theory is one that would change the way we understand particle physics, well then it needs to be very thoroughly verified before we say "Yes, indeed this is how particles probably work, we now need to reevaluate tons of other theories."

          So something like this, both because of the public policy/economic implications and the general understanding of our climate, should be subject to extreme scrutiny. Now please note that doesn't mean saying "Look, this one thing is wrong so it all goes away and you can't ever propose a similar theory again!" However, it means carefully examining all the data, all the assumptions, all the models, and finding all the problems with them. It means verifying everything multiple times, looking at any errors, any deviations, and figuring out why they are there and if they impact the result and so on.

          Really, that is how science should be done period. The idea of strong empiricism is more or less trying to prove your theory wrong over and over again, and through that process becoming convinced it is the correct one. You look at your data and say "Well ok, maybe THIS could explain it instead," and test that. Or you say "Well my theory predicts if X happens Y will happen, so let's try X and if Y doesn't happen, it's wrong." You show your theory is bulletproof not by making sure it is never shot at, but by shooting at it yourself over and over and showing that nothing damages it.

          However that this process is done right becomes more important the bigger the issue is. If you aren't right on a theory that relates to migratory habits of a sub species of bird in a single state, ok well that probably doesn't have a whole lot of wider implications for scientific understanding, or for the way the world is run. However if you are wrong on your theory of how the climate works, well that has a much wider impact.

          Scrutiny is critical to science, it is why science works. Science is all about rejecting the ideas that because someone in authority said it, it must be true, or that a single rigged demonstration is enough to hang your hat on. It is all about testing things carefully and figuring out what works, and what doesn't.

    • Re: (Score:3, Insightful)

      by jgtg32a (1173373)

      Shit like this is why I'm hesitant about going along with Climate Change. I'm in no way qualified to review scientific data, but I can tell when someone is shady, and I don't trust shady people.

    • by acoustix (123925) on Tuesday February 09, 2010 @12:27PM (#31073330) Homepage

      "Why should I make the data available to you, when your aim is to find something wrong with it?"

      That used to be what Science was. Of course, that was when truth was the goal.

  • by stewbacca (1033764) on Tuesday February 09, 2010 @11:00AM (#31072056)

    My bet is there is a simple explanation...namely that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care. The egocentric Slashdot-worldview strikes at the heart of logic yet again.

    • Re: (Score:3, Interesting)

      by quadelirus (694946)
      Unfortunately computer science is pretty closed off as well. Too few projects end up in freely available open code. It hinders advancement (because large departments and research groups can protect their field of study from competition by having a large enough body of code that nobody else can spend the 1-2 years required to catch up) and it hinders verifiability (because they make claims on papers about speed/accuracy/whatever and we basically have to stake it on their word and reputation and whether it SE
    • by nten (709128)

      I am suspicious of the interface reference. Are they counting things where an enumeration got used as an int, or where there was an implicit cast from a 32-bit float to a 64-bit one? From a recent TV show: "A difference that makes no difference is no difference." Stepping back a bit, there will be howls from OO/Functional/FSM zealots who look at a program and declare that its inferior architecture, lack of maintainability, etc. indicate its results are wrong. These are programs written to be run once to turn one set of d

    • by AlXtreme (223728) on Tuesday February 09, 2010 @11:41AM (#31072594) Homepage Journal

      that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care.

      If their code results in predictions that affect millions of lives and trillions of dollars, perhaps they should learn to care.

      What I've personally seen of scientists is a frantic determination to publish papers anywhere and everywhere, no matter how well-founded the results in those papers are. The IPCC-gate is merely a symptom of a deeper problem within scientific research.

      If scientists are too busy because of publication quotas and funding issues to focus on delivering proper scientific research, maybe we should question our current means of supporting scientific research. Currently we've got quantity, but very little quality.

      • Re: (Score:3, Interesting)

        by c_sd_m (995261)

        What I've personally seen of scientists is a frantic determination to publish papers anywhere and everywhere, no matter how well-founded the results in those papers are. The IPCC-gate is merely a symptom of a deeper problem within scientific research.

        They're trained for years on a publish or perish doctrine. Either they have enough publications or they get bounced out of academia at some threshold (getting into a PhD, getting a post doc, getting a teaching position, getting tenure, ...). Under that pressure you end up with just the people who churn out lots of papers making it into positions of power. In some fields you're also expected to pull in significant research funding and there are few opportunities to do some without corporate partnerships. So

  • by stokessd (89903) on Tuesday February 09, 2010 @11:04AM (#31072120) Homepage

    I got my PhD in fluid mechanics funded by NASA, and as such my findings are easily publishable and shared with others. My analysis code (such as it was) was and is available for those who would like to use it. More importantly, my experimental data is available as well.

    This represents the classical pure research side of research where we all get together and talk about our findings and there really aren't any secrets. But even with this open example, there are still secrets when it comes to ideas for future funding. You only tip your cards when it comes to things you've already done, not future plans.

    But more importantly, there are whole areas of research that are very closed off. Pharma is a good example. Sure there are lots of peer reviewed articles published and methods discussed, but you'll never really get into their shorts like this guy wants. There's a lot that goes on behind that curtain. And even if you are a grad student with high ideals and a desire to share all your findings, you may find that the rules of your funding prevent you from sharing.

    Sheldon

    • by PhilipPeake (711883) on Tuesday February 09, 2010 @11:54AM (#31072804)

      ... and this is the problem. The move from direct government grants to research to "industry partnerships".

      Well, (IMHO) if industry wants to make use of the resources of academic institutions, they need to understand the price: all the work becomes public property. I would go one step further, and say that one penny of public money in a project means it all becomes publicly available.

      Those that want to keep their toys to themselves are free to do so, but not with public money.

  • That's all wrong (Score:3, Interesting)

    by Gadget_Guy (627405) * on Tuesday February 09, 2010 @11:06AM (#31072140)

    The scientific process is to invalidate a study if the results cannot be reproduced by anyone else. That way you can eliminate all potential problems like coding errors, invalid assumptions, faulty equipment, mistakes in procedures, and the hundred other things that can produce dodgy results.

    It can be misleading to search through the code for mistakes when you don't know which code was eventually used in the final results (or in which order). I have accumulated quite a lot of snippets of code that I used to fix a particular need at the time. I am sure that many of these hacks were ultimately unused because I decided to go down a different path in data processing. Or the temporary tables used during processing are no longer around (or are in a changed format since the code was written). There is also the problem of some data processing being done by commercial products.

    It's just too hard. The best solution is to let science work the way it has found to be the best. Sure you will get some bad studies, but these will eventually be fixed over time. The system does work, whether vested interests like it or not.

  • Many scientists get their code from companies or individuals that license it to them, much like most other software. They're not in the position to release the code for many experiments...!

  • by Anonymous Coward

    As it is written, the editorial is saying that if there is any error at all in a scientific computer program, the science is usually invalid. What a lot of bull hunky! If this were true, then scientific computing would be impossible, especially with regards to programs that run on Windows.

    Scientists have been doing great science with software for decades. The editorial is full of it.

    Not that it would be bad for scientists to make their software open source. And not that it would be bad for scientists to ben

  • I concur (Score:5, Interesting)

    by dargaud (518470) <[slashdot2] [at] [gdargaud.net]> on Tuesday February 09, 2010 @11:16AM (#31072260) Homepage
    As a software engineer who has spent 20 years coding in research labs, I can say with certainty that the code written by many, if not most, scientists is utter garbage. As an example, a colleague of mine was approached recently to debug a piece of code: "Oh, it's going to be easy, it was written by one of our postdocs on his last day here...". 600 lines of code in the main, no functions, no comments. He's been at it for 2 months.

    I'm perfectly OK with the fact that their job is science and not coding, but would they go to the satellite assembly guys and start gluing parts at random?

    • Re: (Score:3, Insightful)

      by Rising Ape (1620461)

      > 600 lines of code in the main, no functions, no comments

      Does that make it function incorrectly?

      Looking pretty and being correct are orthogonal issues. Code can be well-structured but wrong, after all.

  • Observations... (Score:5, Informative)

    by kakapo (88299) on Tuesday February 09, 2010 @11:17AM (#31072268)

    As it happens, my students and I are about to release a fairly specialized code - we discussed license terms, and eventually settled on the BSD (and explicitly avoided the GPL), which requires "citation" but otherwise leaves anyone free to use it.

    Secondly, writing a scientific code can involve a good deal of work, but the "payoff" usually comes in the form of results and conclusions, rather than the code itself. In those circumstances, there is a sound argument for delaying any code release until you have published the results you hoped to obtain when you initiated the project, even if these form a sequence of papers (rather than insisting on code release with the first published results).

    Thirdly, in many cases scientists will share code with colleagues when asked politely, even if they are not in the public domain.

    Fourthly, I fairly regularly spot minor errors in numerical calculations performed by other groups (either because I do have access to the source, or because I can't reproduce their results) -- in almost all cases these do not have an impact on their conclusions, so while the "error count" can be fairly high, the number of "wrong" results coming from bad code is overestimated by this accounting.
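
    Those minor numerical discrepancies are easy to come by even in bug-free code. A hypothetical Python sketch (the values here are illustrative, not from any of the codes discussed) of catastrophic cancellation, one common source:

```python
# Hypothetical sketch: catastrophic cancellation, one common way numerical
# code loses significant figures without containing any "bug" at all.
# Subtracting two nearly equal doubles discards their shared leading digits,
# leaving only the low-order bits, which carry the inputs' rounding noise.
a = 1.000001          # stored as the nearest binary64 double, not exactly
b = 1.000000
diff = a - b          # mathematically exactly 1e-6

# The subtraction itself is exact, but it amplifies the tiny rounding error
# made when the literal 1.000001 was stored: a relative error of ~1e-16 in
# the input becomes a relative error of ~1e-10 in the result.
print(diff == 1e-6)   # False: only about 10 of 16 decimal digits survive
```

    Two codes can both be "correct" and still disagree in the low digits for reasons like this, which is why small cross-group discrepancies don't automatically signal wrong conclusions.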

  • Back in college, I did some computer vision research. Most people provided open source code for anyone to use. However, aside from the code being of questionable quality, it was mostly written in Matlab with C handlers for optimization.

    In order to properly test all of the software out there you would need:

    1. A license for every version of Matlab.
    2. Windows
    3. Linux
    4. Octave

    I had our school's Matlab, but none of the code we found was written on that version. Some was Linux, some Windows, (the machine I had w

  • all (Score:2, Insightful)

    by rossdee (243626)

    So if scientists use MS Excel for part of their data analysis, MS should release the source code of Excel to prove that there are no bugs in it (that may favour one conclusion over another)?
    Sounds fair to me.

    And if MS doesn't comply, then all scientists have to switch to OO.org?

  • by DoofusOfDeath (636671) on Tuesday February 09, 2010 @11:22AM (#31072344)

    I'm working on my dissertation proposal, and I'd like to be able to re-run the benchmarks that are shown in some of the papers I'm referencing. But most of the source code for those papers has disappeared into the aether. Without their code, it's impossible for me to rerun the old benchmark programs on modern computers so that I and others can determine whether or not my research has uncovered a better way of doing things. This is very far from the idealized notion of the scientific method, and significantly calls into question many of the things that we think we know based on published research.

  • Not a good idea (Score:5, Insightful)

    by petes_PoV (912422) on Tuesday February 09, 2010 @11:23AM (#31072364)
    The point about reproducible experiments is not to provide your peers with the exact same equipment you used - then they'd (probably / hopefully) get the exact same results. The idea is to provide them with enough information so that they can design their own experiments to *measure the same things* and then analyze their results to confirm or disprove your conclusions.

    If all scientists run their results through the same analytical software, using the same code as the first researcher, they are not providing confirmation, they are merely cloning the results. That doesn't give the original results either the confidence that they've been independently validated, or that they have been refuted.

    What you end up with is no one having any confidence in the results, as they have only ever been produced in one way, and arguments that descend into a slanging match between individuals and groups of vested interests who try to "prove" that the same results show they are right and everyone else is wrong.

  • Not that simple (Score:4, Interesting)

    by khayman80 (824400) on Tuesday February 09, 2010 @11:47AM (#31072694) Homepage Journal

    I'm finishing a program that inverts GRACE data to reveal fluctuations in gravity such as those caused by melting glaciers. This program will eventually be released as open source software under the GPLv3. It's largely built on open source libraries like the GNU Scientific Library, but snippets of proprietary code from JPL found their way into the program years ago, and I'm currently trying to untangle them. The program can't be made open source until I succeed because of an NDA that I had to sign in order to work at JPL.

    It's impossible to say how long it will take to banish the proprietary code. While working on this project, my research is at a standstill. There's very little academic incentive to waste time on this idealistic goal when I could be increasing my publication count.

    Annoyingly, the data itself doesn't belong to me. Again, I had to sign an NDA to receive it. So I can't release the data. This situation is common to scientists in many different fields.

    Incidentally, Harry's README file is typical of my experiences with scientific software. Fragile, unportable, uncommented spaghetti code is common because scientists aren't professional programmers. Of course, this doesn't invalidate the results of that code because it's tested primarily through independent verification, not unit tests. Scientists describe their algorithms in peer-reviewed papers, which are then re-implemented (often from scratch) by other scientists. Open source code practices would certainly improve science, but he's wrong to imply that a single bug could have a significant impact on our understanding of the greenhouse effect.

  • by Wardish (699865) on Tuesday February 09, 2010 @11:55AM (#31072826) Journal

    As part of publication and peer review, all data and the provenance of that data, as well as any additional formulas, algorithms, and the exact code that was used to process the data, should be placed online in a neutral holding area.

    Neutral area needs to be independent and needs to show any updates and changes, preserving the original content in the process.

    If your data and code (readable and compilable by other researchers) aren't available, then peer review and reproduction of results is a fool's errand. If you can't look in the black box, then you can't trust it.

  • It's an old story (Score:5, Informative)

    by jc42 (318812) on Tuesday February 09, 2010 @01:14PM (#31074164) Homepage Journal

    This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program.

    Back in the 1970s, a bunch of CompSci guys at the university where I was a grad student did a software study with interesting results. Much of the research computing was done on the university's mainframe, and the dominant language of course was Fortran. They instrumented the Fortran compiler so that for a couple of months, it collected data on numeric overflows, including which overflows were or weren't detected by the code. They published the results: slightly over half the Fortran jobs had undetected overflows that affected their output.

    The response to this was interesting. The CS folks, as you might expect, were appalled. But among the scientific researchers, the general response was that enabling overflow checking slowed down the code measurably, so it shouldn't be done. I personally knew a lot of researchers (as one of the managers of an inter-departmental microcomputer lab that was independent of the central mainframe computer center). I asked a lot of them about this, and I was appalled to find that almost every one of them agreed that overflow checking should be turned off if it slowed down the code. The mainframe's managers reported that almost all Fortran compiles had overflow checking turned off. Pointing out that this meant that fully half of the computed results in their published papers were wrong (if they used the mainframe) didn't have any effect.
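
    The difference overflow checking makes can be sketched in a few lines. A hypothetical Python illustration (the original was compiler-level Fortran on a mainframe, so this is a stand-in, not their code):

```python
# Hypothetical sketch of the silent-overflow problem the study measured.
# Integers of that era wrapped around like 32-bit two's-complement values
# unless the compiler's overflow checking was switched on.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add_unchecked(a: int, b: int) -> int:
    """Checking disabled: wrap around silently, as the mainframe jobs did."""
    return (a + b - INT32_MIN) % 2**32 + INT32_MIN

def add_checked(a: int, b: int) -> int:
    """Checking enabled: refuse to return a wrong answer."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} leaves the 32-bit range")
    return result

print(add_unchecked(INT32_MAX, 1))  # -2147483648: plausible-looking garbage
```

    The unchecked version returns a number that looks like any other, which is exactly why half the jobs could be wrong without anyone noticing; the checked version trades a little speed for a hard failure you can't ignore.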

    Our small cabal that ran the microprocessor lab reacted to this by silently enabling all error checking in our Fortran compiler. We even checked with the vendor to make sure that we'd set it up so that a user couldn't disable the checking. We didn't announce that we had done this; we just did it on our own authority. It was also done in a couple of other similar department-level labs that had their own computers (which was rare at the time). But the major research computer on campus was the central mainframe, and the folks running it weren't interested in dealing with the problem.

    It taught us a lot about how such things are done. And it gave us a healthy level of skepticism about published research data. It was a good lesson on why we have an ongoing need to duplicate research results independently before believing them.

    It might be interesting to read about studies similar to this done more recently. I haven't seen any, but maybe they're out there.

  • by azgard (461476) on Tuesday February 09, 2010 @01:17PM (#31074216)

    While I am fan of open source and this idea in general, for climatology, this is a non-issue. Look there: http://www.realclimate.org/index.php/data-sources/ [realclimate.org]

    There's more code out there than one amateur could digest in a lifetime. And you know what? From the experience of the people who wrote these programs, there aren't actually many people looking at it. I doubt that any scientific code will get many eyeballs. This is more a PR exercise.
