Government Software Science

Ask Slashdot: How To Encourage Better Research Software?

An anonymous reader writes "There is a huge amount of largely overlapping but often incompatible medical imaging research software, funded by the US taxpayer (see, e.g., NITRC or I Do Imaging). I imagine the situation may be similar in other fields, but it is pronounced here because of the glut of NIH funding. One reason is historical: most of the well-funded, big, software-producing labs/centers have been running for 20 or more years, since long before the advent of git, hg, and related sites promoting efficient code review and exchange; so they have established codebases. Another reason is probably territorialism and politics. As a taxpayer, I find this situation wasteful. It's great that the software is being released at all, but the duplication of effort means quality is much lower than it could be given the large number of people involved (easily in the thousands, just counting a few developer mailing list subscriptions). No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"
  • How To Encourage Better Research Software?

    Use serious grant money to pay for it - even if it's paying students at a university.

    • by guruevi ( 827432 )

      They already do, don't worry. The main problem is partly in the article: there are no code reviews, and only recently have some of them started using version control systems. The other problem is that you have PhDs in whatever scientific field writing software with minimal knowledge of Computer Science. I work in the field with grants from NIH, but the software quality is simply awful because no one has the slightest clue about good coding practices.

  • by gilgongo ( 57446 ) on Friday April 29, 2011 @03:21PM (#35978290) Homepage Journal

    "No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"

    When I think about this, I'd rather have that than one single package, if only for the reason that without competition, I'd not be able to know if it was doing anything well or not.

    Pragmatism here says plurality is probably better than some kind of Stalinist central control.

    • by gilgongo ( 57446 )

      (oh dear replying to my own post...) .. of course, if you were talking open source, then that would be another matter :-)

    • Re:Pragmatism? (Score:5, Insightful)

      by goombah99 ( 560566 ) on Friday April 29, 2011 @03:37PM (#35978486)

      The original article is clueless about the difference between research products and production software. In research there is no a priori omniscience about what is best. What you see at the end is the few survivors of an evolutionary competition of zillions of efforts. You don't see the three planned outcomes that we had known could have been written from a well thought out requirements document.

      There is a decades-old saying that scientists develop the next generation of algorithms using last year's computers. Computer scientists write last year's algorithms on next year's computers. It is still true.

      • by Anonymous Coward

        I work in this field as well. Displaying 2D cross sections and rotatable 3D views with surface meshes has been a solved problem for a long time. There's no original research there. Yet there are in the neighborhood of a dozen major (and many more minor) viewers to do just this and only this, written with lab time and research money.

    • I track a lot of scientific software on Freshmeat. You'd be amazed at the redundancy. Medical stuff isn't as bad as some areas.

      • by xtracto ( 837672 )

        I agree with this. Look at the amount of frameworks to do Agent Based Modelling and Simulation and you will see there are at least 5 that are the same thing (Repast, Mason, NetLogo, StarLogo, Swarm) and a lot more that are very similar.

  • by robbyjo ( 315601 ) on Friday April 29, 2011 @03:21PM (#35978292) Homepage

    Not only are most researchers not proficient in programming languages, they also shape their code more like prototypes so that they can modify it easily as the science progresses. Conventional programmers will be frustrated with this approach since they want every single spec set in stone, which will never happen in a research setting, since research progresses very rapidly and specs can change dramatically in most cases. If you can set the spec in stone, it is usually a sign that the field has matured and is transitioning to engineering-type problems. Once the transition happens, it's no longer research, it's engineering. Then you can "make the code better".

    • by sockman ( 133264 )

      Didn't you just describe why agile came about? Because we, as software professionals, realize that specifications are not set in stone and the system should be easy to adapt and modify for future requirements.

      • by robbyjo ( 315601 )

        Does "agile" software development allow scrapping 100% of the code and radically changing the spec (and thereby everything else) about every 6 months just because of a new scientific publication? It may sound extreme, but this often happens in research. If we take time to "structure" our code, before we know it, we have to redo it all over again. We do use libraries like GSL, BLAS, ATLAS, etc. to make our lives easier. These won't change, but whatever we build on top of these often gets scrapped on a regular basis.

        • Does "agile" software development allow scrapping 100% of the code and radically changing the spec (and thereby everything else) about every 6 months just because of a new scientific publication?

          Yes! Every iteration (month) you can throw the whole lot away and start again. You won't though, because there are always certain building blocks you can re-use.

          If we take time to "structure" our code, before we know it, we have to redo it all over again. ... So, we really don't have incentives to "beautify" the code.

          I see these arguments in all kinds of contexts. They are all just excuses for poor work. The funny thing is that doing things 'right' the first time generally ends up taking less time overall. I wonder how much time you waste fixing broken code, or mistaken logic? Getting two hacks to work together? Perhaps if you stopped cutting corners constantly, you wou

          • Yes! Every iteration (month) you can throw the whole lot away and start again. You won't though, because there are always certain building blocks you can re-use.

            No! Agile is about iteration and isn't really suited to stringing a bunch of prototypes together. I can drive a nail with my fist, but that isn't a good idea.

            • Agile is just a big catch all word that has lost all meaning. Let's say "not waterfall". Whatever you thought I meant is irrelevant, but the fact remains that "rapid prototyping" is all these guys are doing. This is not a unique situation. I used the term 'agile' to indicate that there are a range of processes and development techniques, that have come about to help improve speed and quality in these situations (and they happen to fall under the umbrella of 'agile'). If your work involves regular creation o
          • by Anonymous Coward

            I call BS --- I work in a medical imaging lab. In the past two years we've had three CS "professional programmers" added to the staff. They all came in with "methodologies that will save the day" spouting the crap the parent poster is spouting .... they are still around but let's say they are much more pragmatic. And hmmm after about 6 months they all agreed that it is indeed a unique situation.

            • Every work-based situation is unique. I obviously shouldn't have used the term 'agile' as it has become polluted by XP evangelists and so on in the past. Pragmatism would suggest that you look at what you are doing and work on improving the processes to allow you more structured control of your 'agility'. Normally in the software world we are trying to do the reverse, go from a rigid inflexible system to one that is more 'flexible' (and again, this is achieved only by looking at all processes and improvin
        • Does "agile" software development allow scrapping 100% of the code and radically changing the spec (and thereby everything else) about every 6 months just because of a new scientific publication?

          Yes it does. That is exactly the point of it. A good team will always be able to craft its libraries/frameworks for the project in such a way that even after a full rewrite they still have useful code left to reuse.
          angel'o'sphere

      • by hawkfish ( 8978 )

        Didn't you just describe why agile came about? Because we, as software professionals, realize that specifications are not set in stone and the system should be easy to adapt and modify for future requirements.

        There is a big difference between "set in stone" and "unconstrained". Put another way, "XP is aimed at customers who don't know what they want. [softwarereality.com]"

    • The article description sounds like a perfect description of the state of all the Linux distros, all the Linux desktop managers, and all the Linux word processors. That is, there is a proliferation of not quite compatible products that do 80% of the job well.

      So I guess the article is saying we should take this shining example from computer engineering and use it to reform how scientific packages are developed.

      Wow. glass houses much?

    • I completely agree with the not being proficient in programming languages. And they should be required to take some security classes if they're going to be writing any significant code that runs as a CGI or acts as a service. For some reason they don't like it when I refuse to run their shell script CGIs. ... but I'd argue that they don't 'shape [it] ... so that they can modify the codes easily' ... Unless 'easily' means an attempt at a find & replace in 40+ places when they should've used a function

    • Re: (Score:3, Informative)

      by Anonymous Coward

      I do medical imaging as my day job. The parent understates the "spec" problem -- it's just as much a testing problem. The typical spec I work against is "create a tool that distinguishes this disease state from some other disease state and from healthy normals with optimal power". Optimal power is, of course, only defined by the results you get or against other software (probably that measures different facets of disease). Moreover, the spec gets driven by log10 increases in image numbers --- that i

    • specs can change dramatically in most cases

      Moreover, the very act of scientific progress is questioning and experimenting with the assumptions in the spec.

    • We've developed a lot of in-house code here at SLAC. Often we have had better success with the prototype code developed by scientists than with the rigorously written code from software engineers. This is because at a research lab the requirements are constantly changing (if we knew what we were doing it wouldn't be research), and the design cycle of specify / write / test / debug / deploy is too slow. Having the people directly involved with the experiment writing the code in real time gets better results faster

      • I think the research environment is fundamentally different from a commercial environment. In many software projects the requirements are continually changing. This is not a result of poor planning by the people requesting the software, but rather the desire to take best advantage of new scientific information as it becomes available. The resulting informal code development is very efficient for the project, but produces code that is difficult to transport to other projects.

        Your situation is not different to many commercial environments. In fact, this is one of the largest problems in software development (notice I use the word development, not engineering). There are ways to write quality, flexible, extendible, maintainable programs in these environments, but it is much harder. I'm not talking out of my arse here, I've been in this game for many years now, and have seen approaches that work, and ones that fail. If the resultant program is truly "use once, then throw away", th

    • How this was modded up in a place like /. astounds me. Let me address your points one by one.

      Not only are most researchers not proficient in programming languages, they also shape their code more like prototypes so that they can modify it easily as the science progresses.

      News flash! Everyone who is writing software that is "new" is "prototyping". This is not a new problem, this is why we have design patterns and TDD.

      Conventional programmers will be frustrated with this approach since they want every single spec set in stone

      I take it you know very few programmers, maybe at IBM? Managers want "specs", many of us developers want to work on something interesting. If there were all these lovely specs set out in stone, it wouldn't be much of a challenge now would it?

      ...will never happen in a research setting, since research progresses very rapidly and specs can change dramatically in most cases

      Hey! That sounds just li

      • News flash! Everyone who is writing software that is "new" is "prototyping". This is not a new problem, this is why we have design patterns and TDD.

        Well, I only prototype if I need/want to. Even in an "agile" environment, in every iteration we deliver a "customer ready" end product. Not a prototype.

        Mistakenly thinking that software development was engineering is what has caused more than one company to fail.

        Software Development is Engineering, hence the word Development. If you can not apply software engineer

  • Git and Mercurial (hg) are not sites, they're programs. They have nothing to do with code review. You could say that they do promote "efficient code exchange", but so does any other VCS. Are you seriously trying to tell us that these big labs are not using version control while developing their systems?

    • by blueg3 ( 192743 )

      Are you seriously trying to tell us that these big labs are not using version control while developing their systems?

      That's a lot more common than any sane programmer would suspect.

    • by Anonymous Coward

      advent of git, hg, and related sites

      i.e., GitHub, Bitbucket, Gitorious, etc. IMHO there is something fundamentally different about the kind of sharing on newer DVCSs as compared to traditional (CVS, SVN) VCSs. Not to mention the benefits of in-line code review (on GitHub and others).

  • It's probably a good thing that there's at least two different groups working on the same thing. Competition creates incentives for those within it to write better code so that it's more widely adopted and they get more funding. Why do we have Chrome and Firefox?

    This happens in private companies too. I heard a story about a private company that hired two different offshore contractors to write the same software independently of one another -- they were on a tight deadline and had actually read the Mythic

    • There is a legend that this is what happens at Intel and Microsoft. It used to be said that every odd-numbered Intel was not much of an improvement. It's still true since Windows 1.0 that every other release of Windows has sucked. It was perfectly predictable that Vista would tank. (No, I don't hate Microsoft. Even people that love Microsoft can see this has become a "law".)

      In both cases the supposed explanation is that there are two different teams working at the same time. The better one gets the fi

  • This problem is widespread in almost every discipline which uses any form of computation. I think the best way is for major funding sources like the NIH, NSF, etc. to build into the grant terms which coding language and existing libraries are to be used. Alternatively, how and what software will be developed should be used as an additional metric for deciding which proposals to accept. Proposals which are strong otherwise but do not state in clear terms how software will be built should be asked to modify their proposals to include suc
    • by Anonymous Coward

      I wouldn't say that scientists are not good software engineers. The main problem is that they aren't paid to be software engineers. To get funding, scientists have to publish in peer reviewed conferences/journals. In order to do that it is enough to get the programs into a rough shape. Spending an extra year on polishing the software is just not going to happen as this is not easily publishable in a journal.

      This sucks, but until this is changed, don't expect great software engineering out of software releas

    • This is already part of NSF:

      http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928 [nsf.gov]

      (Although it's called "Data Management," it also applies to software generated in the course of research.)

  • by Anonymous Coward

    As a Ph.D. candidate who writes scientific software at a large research university in the US under NIH grant funds, I can say that simply adding more developers to a scientific software project is an unrealistic solution to the problem. Having been to a conference of developers for a well-known chemistry software package (which will remain nameless) I have seen firsthand how seemingly good intentions can quickly turn into an epic battle of conquest and control of the software. Add in the huge egos and arrog

    • by Khopesh ( 112447 )

      As a Ph.D. candidate who writes scientific software at a large research university in the US under NIH grant funds, I can say that simply adding more developers to a scientific software project is an unrealistic solution to the problem. Having been to a conference of developers for a well-known chemistry software package (which will remain nameless) I have seen firsthand how seemingly good intentions can quickly turn into an epic battle of conquest and control of the software. Add in the huge egos and arrogance of these scientists (some very famous in their field), and you wind up with software that no one wants to develop due to problems that have nothing to do with funding or lack of qualified developers. This is probably one of the main reasons new scientific software is created in the first place.

      There are examples of problems in every solution. Zotero [zotero.org] is a great example of F/OSS working beautifully, exemplifying researcher collaboration to develop a research collaboration tool. With some standardization and better communication between the F/OSS community and the research industry, I think we can open the door to more of Zotero and less of your chemistry software example.

  • Software paid for by the government is supposed to be free in the public domain. However, there are two problems with the way this rule is implemented.

    A surprising number of researchers work around this restriction and keep the software proprietary (or at least secret) by contracting the software out and purchasing outside services.

    Even when the software is public domain, there is no uniform requirement to make it openly available. Often you have to write to the principal investigator and after delay an

    • Actually, no it's not.

      There are a few issues ... the first of which is called 'Dual Use' ... basically, there's software that can be used for military purposes, which will *never* be put into the public domain.

      Then there's stuff that might be able to be licensed, and there's a whole gaggle of lawyers who we have to get our stuff cleared through, so we can get it certified that our work has no value, and the government isn't going to make any money off of it.

      And then there's the security concerns ... your st

    • by blueg3 ( 192743 )

      Software paid for by the government is supposed to be free in the public domain.

      Not really. "Paid for" is a very nonspecific term. Software that is developed as the result of a government-funded grant at a non-government institution is not "supposed" to be public domain. (Individual grant agencies may make such stipulations, though.)

      On the other hand, software that is the work product of government employees is supposed to be public domain. Some agencies aren't particularly helpful about distributing the source (although if only one person who cares is able to obtain it, since it's publ

    • by Desler ( 1608317 )

      Software paid for by the government is supposed to be free in the public domain.

      And this has been codified in which act of Congress?

      • by mspohr ( 589790 )
        Since you're too lazy to Google it, here is a good summary of the "rules".

        https://journal.thedacs.com/stn_view.php?stn_id=56&article_id=180 [thedacs.com]

        • by Desler ( 1608317 )

          Nowhere in there does it say that any software paid for by the government is in the public domain. In fact, the phrase "public domain" doesn't even exist in that article. Your link only specifies the circumstances when something can be released as OSS, which is not the same as what you claimed.

          This article summarizes when the U.S. federal government or its contractors may publicly release, as open source software (OSS), software developed with government funds.

          Clearly you didn't even read the link you posted. Now, can you provide the actual law that says that software paid for by the government has to be in the public domain?

          • by mspohr ( 589790 )
            The actual laws are referred to in the article. They are the government contracting regulations (FAR and DFAR) which are referenced in the article.
        • by Desler ( 1608317 )

          Also, since I actually work at a company that contracts for the government: almost none of the software we create can be released by the government as open source or public domain, as the company reserves the copyright. This is also very typical for a lot of government contracting. So your entire premise is absolutely false. Not to mention your false conflation of "public domain" with "open source".

          • by mspohr ( 589790 )
            A lot of nit-picking here today so I'll have to take these objections one at a time.

            The handy reference chart that I linked to states that the default and usual contract clause (FAR 52.227-14 or DFARS 252.227-7014) is for the government to reserve copyright for works created at public expense. This has been my experience. In my original post I also stated that many government contractors get around this by "contributing" some of their own funding or IP to the contract and thus establish an exception to t

        • by Desler ( 1608317 )

          What's also funny is that your link even cites statutory law that completely contradicts you:

          It is true that 10 U.S.C. 2320(a)(2)(F) states that “a contractor or subcontractor (or a prospective contractor or subcontractor) may not be required, as a condition of being responsive to a solicitation or as a condition for the award of a contract, to sell or otherwise relinquish to the United States any rights in technical data [except in certain cases, and may not be required to ] refrain from offering to use, or from using, an item or process to which the contractor is entitled to restrict rights in data”

          • by mspohr ( 589790 )
            I know this is a bit technical lawyerese talk so you may not understand it but the bit you quoted states that the government cannot compel a private party which holds existing IP rights to relinquish these rights to the government. This is a different situation from that where the government is contracting for the creation of new IP.
  • The trouble is that you don't get grants for software development. You get them for original research, i.e. novelty. All you need to publish a paper is a hackish implementation that works once. After that's done, there's no reward for improving your code and iterating further. If you're trying to stay competitive, you move on to the next thing.

    Developing good software is for industry to do. Unlike academia, industry can get massive rewards for making a well-implemented toolkit. No academically-developed sof

    • I think you're missing two points:

      First, as an academic group, it is important to make your software usable for other groups. It brings collaborators to you - researchers who want to do something new, that your software almost supports. It's faster for them to work with you than start from scratch - more for your expertise in the field than for your coding abilities.

      Second, industrial software isn't open source, and for niche markets is often of terrible quality at a very high price. Open-source is s
  • by AdmiralXyz ( 1378985 ) on Friday April 29, 2011 @03:42PM (#35978550)
    I'm a computer scientist in the middle of getting my BA, but for research experience or in the process of taking an elective, I've spent time with grad students in other departments (mostly biology and linguistics) and the software they write. Smart people? Absolutely, they're experts in their field. But they can't write code to save their lives. I've seen things that make me want to run screaming to TheDailyWTF and the quality software engineering on display there ;)

    I don't think this is a bad thing, myself. Most of this code is single-use only, being written for a specific purpose (or a specific thesis paper), and will never be used again. Not to mention they're taking enough time to get their degrees as it is; I don't think it's reasonable to ask them to become expert software engineers as well. OP claims that taxpayer dollars are being wasted, but think how much waste there'd be if every researcher had to get a CS degree before they started in their own field, too.
    • by blueg3 ( 192743 )

      You think that's bad, you should see the software written by engineers that's used to perform many important engineering tasks!

  • by gr8_phk ( 621180 ) on Friday April 29, 2011 @03:47PM (#35978596)
    If you're not happy with what's out there, you need to roll your own. If what's out there is open source, you can pick the best of each of them and build the solid system you're looking for. With research projects, once the stated goal has been reached they are done - until a follow-up grant for further work is awarded. That seems to be what research is about - showing that things can be done or done a different way - not producing a useful software product. Once they show what and how, it's up to someone else to take that and make something great from all the pieces. Unfortunately that means sifting through all the duplicate stuff and finding the best approach and possibly reimplementing it to fit in with everything else you're doing.

    For example, you may find Kalman filters, genetic algorithms, neural networks, GPU implementations, etc. all able to solve a particular problem. For real-world software you really don't care about all that, you just want the ONE that works best in your application. Of course then there will be papers on "extensible frameworks" with "plugins" that can handle any of those implementations... Again, for real software you pick the one that works "best" for your definition of best and go with that. To make this happen, you need to get an ego-less (read non-PhD) software team to pull it all together.
  • ... wasted effort, but the allocation of the money and the people involved.
    It is not all that hard to create very good, powerful and even big applications, but it becomes a hell of a lot harder if you throw tons of money and people at it.
    And yes, I have worked in university physiology and they have horrendous software as well, but there are simply no alternatives and very few real programmers working in the field, so no one who could fill the hole knows about it.

  • Although I admit, some of it's the 'not invented here' problem, one of the big reasons there isn't better collaboration is that most scientists don't know that someone's working on something similar.

    I deal with software that works on FITS files. The two main fields that use FITS -- medical imaging and astronomy. Do you think the two collaborate? Hell no.

    And even if you do find out about some great new tool ... it's after it's been released. Which might be two or more years after they started development,

    • Regarding FITS (Flexible Image Transport System), if this is used in significant ways in medical imaging, the astronomical FITS user community would love to know about it and collaborate. Regarding rice-compressed FITS, I (and undoubtedly my coauthors) would be beyond fascinated to learn of either medical imaging use cases or compression tools for this purpose. Alternately, any FITS-based medical imaging applications should be aware of the astronomical data compression work accessible through http://heasa [nasa.gov]
      • What's the relationship between FITS, HDF and NetCDF? I've looked into the last two, but eventually decided they were far more complicated than I needed them to be - and did the evil thing that so many of us do - invented my own simple format that is 'just enough' for my own needs :-). But FITS is new to me.
        • FITS is the ubiquitous data format in astronomy, see http://fits.gsfc.nasa.gov/ [nasa.gov] - it has idiosyncrasies from arising originally in the 1970's, but is extremely portable and forgiving of a wide range of host operating systems and development environments. The specification has also been published in the refereed astronomical literature, making it suitable for very long term (even in astronomical terms) archival storage. Hence the interest of the Vatican in using this for their manuscripts. Recent data com
  • Two things seem to track with successful scientific software:

    1. A center grant. These are understandably difficult to get, and typically require some venerable central Dumbledore (preferably with a Nobel), but they get around the insightful "Science wants novelty, not quality" comment. That is, a center grant is designed to allow money and publications to flow for development without overt novelty.

    2. Keep the software modular, well-documented, and open. Publish everything about every interface, file

  • It's a simple problem: research software is written either by scientists who don't program or by programmers who don't understand the science. So either you end up with powerful and technically correct software that has an interface that is completely cryptic, confusing and generally unusable, or you get nice glossy-looking software that doesn't do what it's supposed to do.

    It's really hard to find people who can do both.

  • The high cost of scientific software and its lack of accommodation to what scientists really need were the reason we came up with Sparklix electronic lab notebook [sparklix.com] and its business model.

    Apparently a lot of today's scientific software is developed by engineers who know nothing about the scientific areas they are targeting, trying to create yet another CRUD or CRM-like application with scientific flavor. What we attempted to do with Sparklix is to bring the researchers an experience which would be as close
  • Medical imaging is a special case.

    The FDA has an "in-house" exemption for things like software-based medical solutions (for example, the software that calculates the best way to deliver a radiation blast to your tumor, or the software that identifies tumors in MRI results, or whatever). In essence, to share software you've developed, you have to go through a lengthy and expensive approval process. Once you've written something, no matter how nice it is, there's a huge threshold of liability, expense, and has

  • It's kind of useless to post this on Slashdot honestly. What good is it going to do? If you have an idea for how to solve this and are a researcher then talk to the NIH. Otherwise this is just a lot of hot air.

    If the software is mature enough to be widely useful, then a company should try to commercialize it under an SBIR or an STTR or something. If it's not mature then it should stay what it is - a research prototype. Most research prototypes end up being useless in my experience and it's not worth t

  • Drop whatever you are doing right now and start writing a grant proposal. Here is what you will propose to do:

    • Create a set of guidelines to encourage reusability of software. These will include:
      • General guidelines as to modularity, reusability, licensing, and documentation rather than specific instructions about languages.
      • General guidelines as to revision control, and the posting of resulting software, similar to the Data Management Plans [nsf.gov] referred to by another commenter.
      • Minimum standards
  • In fact there are several well-designed user-extensible medical image processing frameworks available already. ImageJ, MIPAV, and ITK were funded by the NIH and fill the very void suggested by the OP. Many more mature medical imaging tools that serve a variety of niches are freely available, many of which include free source code.

    Frankly, I think the OP's main thesis is fundamentally wrong. Medical imaging research is about inventing or improving IP techniques and algorithms, not implementing and distrib

  • this is the same crap you get on quora where OO freaks question why Fortran is used rather than C++ for a lot of technical programming.
