Ask Slashdot: How To Encourage Better Research Software?
An anonymous reader writes "There is a huge amount of largely overlapping but often incompatible medical imaging research software, funded by the US taxpayer (e.g. NITRC or I Do Imaging). I imagine the situation may be similar in other fields, but it is pronounced here because of the glut of NIH funding. One reason is historical: most of the well-funded, big, software-producing labs/centers have been running for 20 or more years, since long before the advent of git, hg, and related sites promoting efficient code review and exchange; so they have established codebases. Another reason is probably territorialism and politics. As a taxpayer, I find this situation wasteful. It's great that the software is being released at all, but the duplication of effort means quality is much lower than it could be given the large number of people involved (easily in the thousands, just counting a few developer mailing list subscriptions). No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"
Pay For It (Score:2)
How To Encourage Better Research Software?
Use serious grant money to pay for it - even if it's paying students at a university.
Re: (Score:2)
They already do, don't worry. The main problem is partly in the article: there are no code reviews, and only recently have some of them started using version control systems. The other problem is that you have PhDs in whatever scientific field writing software with minimal knowledge of Computer Science. I work in the field with grants from NIH but the software quality is simply awful because no one has the slightest clue about good coding practices.
Pragmatism? (Score:3)
"No one seems to ask: why are we funding X different packages that do 80% of the same things, but none of them well?"
When I think about this, I'd rather have that than one single package, if only for the reason that without competition, I'd not be able to know if it was doing anything well or not.
Pragmatism here says plurality is probably better than some kind of Stalinist central control.
Re: (Score:2)
(oh dear replying to my own post...) .. of course, if you were talking open source, then that would be another matter :-)
Re:Pragmatism? (Score:5, Insightful)
The original article is clueless about the difference between research products and production software. In research there is no a priori omniscience about what is best. What you see at the end is the few survivors of an evolutionary competition of zillions of efforts. You don't see the three planned outcomes that we had known could have been written from a well thought out requirements document.
There is a decades-old saying that scientists develop the next generation of algorithms using last year's computers. Computer scientists write last year's algorithm on next year's computer. It is still true.
Re: (Score:1)
I work in this field as well. Displaying 2d cross sections and rotatable 3d views with surface meshes has been a solved problem for a long time. There's no original research there. Yet there are in the neighborhood of a dozen major viewers, and many more minor ones, that do just this and only this, written with lab time and research money.
Not really. (Score:3)
I track a lot of scientific software on Freshmeat. You'd be amazed at the redundancy. Medical stuff isn't as bad as some areas.
Re: (Score:1)
I agree with this. Look at the number of frameworks for Agent Based Modelling and Simulation and you will see there are at least 5 that are the same thing (Repast, Mason, NetLogo, StarLogo, Swarm) and a lot more that are very similar.
Not going to happen (Score:5, Insightful)
Not only are most researchers not proficient in programming languages, they also shape their codes more like prototypes so that they can modify the codes easily as the science progresses. Conventional programmers will be frustrated with this approach since they want every single spec set in stone, which will never happen in a research setting since research progresses very rapidly and specs can change dramatically in most cases. If you can set the spec in stone, it is usually a sign that the field has matured and is transitioning to engineering-type problems. Once the transition happens, it's no longer research, it's engineering. Then you can "make the code better".
Re: (Score:2)
Didn't you just describe why agile came about? Because we, as software professionals, realize that specifications are not set in stone and the system should be easy to adapt and modify for future requirements.
Re: (Score:2)
Does "agile" software development allow scrapping 100% of the code and radically changing the spec (and thereby everything else) about every 6 months just because of a new scientific publication? It may sound extreme, but this often happens in research. If we take time to "structure" our code, before we know it, we have to redo it all over again. We do use libraries like GSL, BLAS, ATLAS, etc. to make our lives easier. These won't change, but whatever we build on top of these often gets scrapped on a regular basis.
Re: (Score:1)
Does "agile" software development allow scrapping 100% of the code and radically changing the spec (and thereby everything else) about every 6 months just because of a new scientific publication?
Yes! Every iteration (month) you can throw the whole lot away and start again. You won't though, because there are always certain building blocks you can re-use.
If we take time to "structure" our code, before we know it, we have to redo it all over again. ... So, we really don't have incentives to "beautify" the code.
I see these arguments in all kinds of contexts. They are all just excuses for poor work. The funny thing is that doing things 'right' the first time generally ends up taking less time overall. I wonder how much time you waste fixing broken code, or mistaken logic? Getting two hacks to work together? Perhaps if you stopped cutting corners constantly, you wou
Re: (Score:2)
Yes! Every iteration (month) you can throw the whole lot away and start again. You won't though, because there are always certain building blocks you can re-use.
No! Agile is about iteration and isn't really suited to stringing a bunch of prototypes together. I can drive a nail with my fist, but that isn't a good idea.
Re: (Score:1)
I call BS --- I work in a medical imaging lab. In the past two years we've had three CS "professional programmers" added to the staff. They all came in with "methodologies that will save the day" spouting the crap the parent poster is spouting .... they are still around but let's say they are much more pragmatic. And hmmm after about 6 months they all agreed that it is indeed a unique situation.
Re: (Score:2)
Quite often you are writing single-use specialized programs and you accept some limitations. I recall one wave tank experiment where, if you entered the wrong parameters, you could cause the computer controlling the experiment to create such a powerful wave that it would have broken the tank and flooded the lab.
I remember one guy whose program to c
Re: (Score:2)
Yes it does. That is exactly the point of it. A good team will always be able to craft its libraries/frameworks for a project in such a way that even after a full rewrite they still have useful code left to be reused.
angel'o'sphere
Re: (Score:2)
by the time I explain a cutting edge algorithm idea to a (BS level) programmer and teach them enough math to implement it correctly
There are two things wrong with that statement. Firstly, you're trying to get rapid results out of a BS level programmer (i.e., a total greenhorn with no real experience) and secondly, they probably don't understand that much math to begin with either (i.e., did you specify that when hiring them?) If you'd done your hiring more sanely, you'd have someone who could actually support you properly. Yes, they'd cost more than just a tyro but that'd be money well spent. Remember, you're not getting someone who's
Re: (Score:2)
Didn't you just describe why agile came about? Because we, as software professionals, realize that specifications are not set in stone and the system should be easy to adapt and modify for future requirements.
There is a big difference between "set in stone" and "unconstrained". Put another way, "XP is aimed at customers who don't know what they want. [softwarereality.com]"
Linux (Score:2)
The article description sounds like a perfect description of the state of all the Linux distros, all the Linux desktop managers, and all the Linux word processors. That is, there is a proliferation of not quite compatible products that do 80% of the job well.
So I guess the article is saying we should take this shining example from computer engineering and use it to reform how scientific packages are developed.
Wow. Glass houses much?
Re: (Score:2)
I completely agree with the not being proficient in programming languages. And they should be required to take some security classes if they're going to be writing any significant code that runs as a CGI or acts as a service. For some reason they don't like it when I refuse to run their shell script CGIs. ... but I'd argue that they don't 'shape [it] ... so that they can modify the codes easily' ... Unless 'easily' means an attempt at a find & replace in 40+ places when they should've used a function
Re: (Score:3, Informative)
I do medical imaging as my day job. The parent understates the "spec" problem -- it's just as much a testing problem. The typical spec I work against is "create a tool that distinguishes this disease state from some other disease state and from healthy normals with optimal power". Optimal power is, of course, only defined by the results you get or against other software (probably that measures different facets of disease). Moreover, the spec gets driven by log10 increases in image numbers --- that i
Re: (Score:2)
specs can change dramatically in most cases
Moreover, the very act of scientific progress is questioning and experimenting with the assumptions in the spec.
Re: (Score:2)
We've developed a lot of in-house code here at SLAC. Often we have had better success with the prototype code developed by scientists than with the rigorously written code from software engineers. This is because at a research lab the requirements are constantly changing (if we knew what we were doing it wouldn't be research), and the design cycle of specify / write / test / debug / deploy is too slow. Having the people directly involved with the experiment writing the code in real time gets better results faster
Re: (Score:3)
I think the research environment is fundamentally different from a commercial environment. In many software projects the requirements are continually changing. This is not a result of poor planning by the people requesting the software, but rather the desire to take best advantage of new scientific information as it becomes available. The resulting informal code development is very efficient for the project, but produces code that is difficult to transport to other projects.
Your situation is not different to many commercial environments. In fact, this is one of the largest problems in software development (notice I use the word development, not engineering). There are ways to write quality, flexible, extendible, maintainable programs in these environments, but it is much harder. I'm not talking out of my arse here, I've been in this game for many years now, and have seen approaches that work, and ones that fail. If the resultant program is truly "use once, then throw away", th
Re: (Score:2)
Not only are most researchers not proficient in programming languages, they also shape their codes more like prototypes so that they can modify the codes easily as the science progresses.
News flash! Everyone who is writing software that is "new" is "prototyping". This is not a new problem, this is why we have design patterns and TDD.
Conventional programmers will be frustrated with this approach since they want every single spec set in stone
I take it you know very few programmers, maybe at IBM? Managers want "specs", many of us developers want to work on something interesting. If there were all these lovely specs set out in stone, it wouldn't be much of a challenge now would it?
...will never happen in a research setting since research progresses very rapidly and specs can change dramatically in most cases
Hey! That sounds just li
Re: (Score:2)
Well, I only prototype if I need/want to. Even in an "agil" environment in every iteration we deliver a "customer ready" end product. Not a prototype.
Software Development is Engineering, hence the word Development. If you can not apply software engineer
What the hell are you talking about? (Score:2)
Git and Mercurial (hg) are not sites, they're programs. They have nothing to do with code review. You could say that they do promote "efficient code exchange", but so does any other VCS. Are you seriously trying to tell us that these big labs are not using version control while developing their systems?
Re: (Score:3)
Are you seriously trying to tell us that these big labs are not using version control while developing their systems?
That's a lot more common than any sane programmer would suspect.
Re: (Score:1)
advent of git, hg, and related sites
i.e. GitHub, Bitbucket, Gitorious, etc. IMHO there is something fundamentally different about the kind of sharing on newer dVCS as compared to traditional (cvs, svn) VCS. Not to mention the benefits of in-line code review (on GitHub and others).
Same happens in the private world (Score:2)
It's probably a good thing that there's at least two different groups working on the same thing. Competition creates incentives for those within it to write better code so that it's more widely adopted and they get more funding. Why do we have Chrome and Firefox?
This happens in private companies too. I heard a story about a private company that hired two different offshore contractors to write the same software independently of one another -- they were on a tight deadline and had actually read the Mythic
Intel and Microsoft (Score:3)
There is a legend that this is what happens at Intel and Microsoft. It used to be said that every odd-numbered Intel was not much of an improvement. It's still true since Windows 1.0 that every other release of Windows has sucked. It was perfectly predictable that Vista would tank. (No, I don't hate Microsoft. Even people that love Microsoft can see this has become a "law".)
In both cases the supposed explanation is that there are two different teams working at the same time. The better one gets the fi
Terms of grant must specify coding standards (Score:2)
Re: (Score:1)
I wouldn't say that scientists are not good software engineers. The main problem is that they aren't paid to be software engineers. To get funding, scientists have to publish in peer reviewed conferences/journals. In order to do that it is enough to get the programs into a rough shape. Spending an extra year on polishing the software is just not going to happen as this is not easily publishable in a journal.
This sucks, but until this is changed, don't expect great software engineering out of software releas
Re: (Score:2)
This is already part of NSF:
http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928 [nsf.gov]
(Although it's called "Data Management," it also applies to software generated in the course of research.)
Not so fast. (Score:1)
As a Ph.D. candidate who writes scientific software at a large research university in the US under NIH grant funds, I can say that simply adding more developers to a scientific software project is an unrealistic solution to the problem. Having been to a conference of developers for a well-known chemistry software package (which will remain nameless) I have seen firsthand how seemingly good intentions can quickly turn into an epic battle of conquest and control of the software. Add in the huge egos and arrog
Re: (Score:2)
As a Ph.D. candidate who writes scientific software at a large research university in the US under NIH grant funds, I can say that simply adding more developers to a scientific software project is an unrealistic solution to the problem. Having been to a conference of developers for a well-known chemistry software package (which will remain nameless) I have seen firsthand how seemingly good intentions can quickly turn into an epic battle of conquest and control of the software. Add in the huge egos and arrogance of these scientists (some very famous in their field), and you wind up with software that no one wants to develop due to problems that have nothing to do with funding or lack of qualified developers. This is probably one of the main reasons new scientific software is created in the first place.
There are examples of problems in every solution. Zotero [zotero.org] is a great example of F/OSS working beautifully, exemplifying researcher collaboration to develop a research collaboration tool. With some standardization and better communication between the F/OSS community and the research industry, I think we can open the door to more of Zotero and less of your chemistry software example.
Govt. doesn't "get" open source (Score:2)
A surprising number of researchers work around this restriction and keep the software proprietary (or at least secret) by contracting the software out and purchasing outside services.
Even when the software is public domain, there is no uniform requirement to make it openly available. Often you have to write to the principal investigator and after delay an
Re: (Score:2)
Actually, no it's not.
There's a few issues ... the first of which is called 'Dual Use' ... basically, there's software that can be used for military purposes, which will *never* be put into the public domain.
Then there's stuff that might be able to be licensed, and there's a whole gaggle of lawyers who we have to get our stuff cleared through, so we can get stuff certified that our work has no value, and the government isn't going to make any money off of it.
And then there's the security concerns ... your st
Re: (Score:2)
Software paid for by the government is supposed to be free in the public domain.
Not really. "Paid for" is a very inspecific term. Software that is developed as the result of a government-funded grant at a non-government institution is not "supposed" to be public domain. (Individual grant agencies may make such stipulations, though.)
On the other hand, software that is the work product of government employees is supposed to be public domain. Some agencies aren't particularly helpful about distributing the source (although if only one person who cares is able to obtain it, since it's publ
Re: (Score:1)
Software paid for by the government is supposed to be free in the public domain.
And this has been codified in which act of Congress?
Re: (Score:2)
https://journal.thedacs.com/stn_view.php?stn_id=56&article_id=180 [thedacs.com]
Re: (Score:1)
Nowhere in there does it say that any software paid for by the government is in the public domain. In fact, the phrase "public domain" doesn't even appear in that article. Your link only specifies the circumstances under which something can be released as OSS, which is not the same as what you claimed.
This article summarizes when the U.S. federal government or its contractors may publicly release, as open source software (OSS), software developed with government funds.
Clearly you didn't even read the link you posted. Now, can you provide the actual law that says that software paid for by the government has to be in the public domain?
Re: (Score:1)
Also, since I actually do work at a company that contracts for the government: almost none of the software we create can be released by the government as open source or public domain, as the company reserves the copyright. This is also very typical for a lot of government contracting. So your entire premise is absolutely false. Not to mention your false conflation of "public domain" with "open source".
Re: (Score:2)
The handy reference chart that I linked to states that the default and usual contract clause (FAR 52.227-14 or DFARS 252.227-7014) is for the government to reserve copyright for works created at public expense. This has been my experience. In my original post I also stated that many government contractors get around this by "contributing" some of their own funding or IP to the contract and thus establish an exception to t
Re: (Score:1)
What's also funny is that your link even cites statutory law that completely contradicts you:
It is true that 10 U.S.C. 2320(a)(2)(F) states that “a contractor or subcontractor (or a prospective contractor or subcontractor) may not be required, as a condition of being responsive to a solicitation or as a condition for the award of a contract, to sell or otherwise relinquish to the United States any rights in technical data [except in certain cases, and may not be required to ] refrain from offering to use, or from using, an item or process to which the contractor is entitled to restrict rights in data”
Science wants novelty, not quality (Score:1)
The trouble is that you don't get grants for software development. You get them for original research, i.e. novelty. All you need to publish a paper is a hackish implementation that works once. After that's done, there's no reward for improving your code and iterating further. If you're trying to stay competitive, you move on to the next thing.
Developing good software is for industry to do. Unlike academia, industry can get massive rewards for making a well-implemented toolkit. No academically-developed sof
Re: (Score:2)
First, as an academic group, it is important to make your software usable for other groups. It brings collaborators to you - researchers who want to do something new, that your software almost supports. It's faster for them to work with you than start from scratch - more for your expertise in the field than for your coding abilities.
Second, industrial software isn't open source, and for niche markets is often of terrible quality at a very high price. Open-source is s
Because researchers aren't programmers (Score:4, Insightful)
I don't think this is a bad thing, myself. Most of this code is single-use only, being written for a specific purpose (or a specific thesis paper), and will never be used again. Not to mention they're taking enough time to get their degrees as it is; I don't think it's reasonable to ask them to become expert software engineers as well. OP claims that taxpayer dollars are being wasted, but think how much waste there'd be if every researcher had to get a CS degree before they started in their own field, too.
Re: (Score:2)
You think that's bad, you should see the software written by engineers that's used to perform many important engineering tasks!
Convert research into useful (Score:4, Insightful)
For example, you may find Kalman filters, genetic algorithms, neural networks, GPU implementations, etc. all able to solve a particular problem. For real-world software you really don't care about all that, you just want the ONE that works best in your application. Of course then there will be papers on "extensible frameworks" with "plugins" that can handle any of those implementations... Again, for real software you pick the one that works "best" for your definition of best and go with that. To make this happen, you need to get an ego-less (read non-PhD) software team to pull it all together.
The problem is not ... (Score:2)
... wasted effort, but the allocation of the money and the people involved.
It is not all that hard to create very good, powerful and even big applications, but it becomes a hell of a lot harder if you throw tons of money and people at it.
And yes, I have worked in university physiology and they have horrendous software as well, but there are simply no alternatives and very few real programmers working in the field, so no one who could fill the hole knows about it.
Solution : Make it findable (Score:2)
Although I admit, some of it's the 'not invented here' problem, one of the big reasons there isn't better collaboration is that most scientists don't know that someone's working on something similar.
I deal with software that works on FITS files. The two main fields that use FITS -- medical imaging and astronomy. Do you think the two collaborate? Hell no.
And even if you do find out about some great new tool ... it's after it's been released. Which might be two or more years after they started development,
Standardize on efficient data representations (Score:2)
Center grant and modularity (Score:2)
1. A center grant. These are understandably difficult to get, and typically require some venerable central Dumbledore (preferably with a Nobel), but they get around the insightful "Science wants novelty, not quality" comment. That is, a center grant is designed to allow money and publications to flow for development without overt novelty.
2. Keep the software modular, well-documented, and open. Publish everything about every interface, file
Simple problem (Score:2)
It's a simple problem: research software is written either by scientists who don't program or by programmers who don't understand the science. So either you end up with powerful and technically correct software that has an interface that is completely cryptic, confusing and generally unusable, or you get nice glossy-looking software that doesn't do what it's supposed to do.
It's really hard to find people who can do both.
Re: (Score:2)
All of which is just expanding on my point. Scientists don't program and don't care about software engineering and best practices; as you say, they don't have time for it. Also, a lot of what they create isn't intended to live for a long time; it's often put together as fast as possible for use in a single experiment (or a small set of them). The problem comes when those one-off programs mutate into larger programs that somebody then decides they can make money off by selling to other scientists.
Re: (Score:2)
Yeah, right. Scientific software is not at all like a word processor. In a word processor, you can tell immediately if it is behaving wrong: you know what it is supposed to do. But in some scientific computation, it's just the reverse. It tells you that the answer is 3.184, and it is not immediately obvious whether the answer is right or wrong. That's the difference.
"To weed out the crap" as you put it, you need to understand the computation in detail, design some test cases that are relevant to
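To make the "known answer" idea above concrete, here is a minimal Python sketch. It is not taken from any package mentioned in this thread; the function names `trapezoid` and `check_against_known_answer` are hypothetical, and the choice of a trapezoid-rule integrator is only an assumption for illustration. The point is that a bare number like 3.184 cannot be judged on sight, but a case with an exact answer, plus the expected convergence rate, can.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def check_against_known_answer():
    """Compare the numerical result with a case whose exact answer is known.

    The integral of sin(x) over [0, pi] is exactly 2, so the error can be
    measured directly instead of eyeballing the output.
    """
    exact = 2.0
    coarse = trapezoid(math.sin, 0.0, math.pi, 100)
    fine = trapezoid(math.sin, 0.0, math.pi, 200)
    err_coarse = abs(coarse - exact)
    err_fine = abs(fine - exact)
    # The trapezoid rule is second order, so halving the step size
    # should reduce the error by a factor of roughly 4.
    ratio = err_coarse / err_fine
    return err_coarse, err_fine, ratio


if __name__ == "__main__":
    err_coarse, err_fine, ratio = check_against_known_answer()
    print(f"error (n=100): {err_coarse:.3e}")
    print(f"error (n=200): {err_fine:.3e}")
    print(f"error ratio:   {ratio:.2f}")
```

Checking the convergence ratio as well as the value catches a class of bugs (off-by-one endpoints, wrong step size) that can still produce a plausible-looking number.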
Sparklix take on scientific software (Score:1)
Apparently a lot of today's scientific software is developed by engineers who know nothing about the scientific areas they are targeting, trying to create yet another CRUD or CRM-like application with scientific flavor. What we attempted to do with Sparklix is to bring the researchers an experience which would be as close
In-house approval exemption (Score:2)
Medical imaging is a special case.
The FDA has an "in-house" exemption for things like software-based medical solutions (for example, the software that calculates the best way to deliver a radiation blast to your tumor, or the software that identifies tumors in MRI results, or whatever). In essence, to share software you've developed, you have to go through a lengthy and expensive approval process. Once you've written something, no matter how nice it is, there's a huge threshold of liability, expense, and has
What can slashdot do? (Score:1)
It's kind of useless to post this on Slashdot honestly. What good is it going to do? If you have an idea for how to solve this and are a researcher then talk to the NIH. Otherwise this is just a lot of hot air.
If the software is mature enough to be widely useful, then a company should try to commercialize it under an SBIR or an STTR or something. If it's not mature then it should stay what it is - a research prototype. Most research prototypes end up being useless in my experience and it's not worth t
Get a grant to create guidelines and clearinghouse (Score:2)
Drop whatever you are doing right now and start writing a grant proposal. Here is what you will propose to do:
Re: (Score:2)
We have X different packages because they do what they were written to do very well.
But a significant fraction of that X are programs written to work on one version of one specific dataset. There can even be sane reasons for this; some datasets change format between versions. Genetics data is uniformly terrible this way. I worked on a project last year to take one of these hyper-specific packages and turn it into something that another person (i.e., anyone other than the PhD candidate who wrote it) would consider using at all; it was a huge amount of work from a talented team of about 15 s
Much ado (Score:2)
In fact there are several well-designed user-extensible medical image processing frameworks available already. ImageJ, MIPAV, and ITK were funded by the NIH and fill the very void suggested by the OP. Many more mature medical imaging tools that serve a variety of niches are freely available, many of which include free source code.
Frankly, I think the OP's main thesis is fundamentally wrong. Medical imaging research is about inventing or improving IP techniques and algorithms, not implementing and distrib
submitter is a commercial programmer (Score:2)