20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report (winbeta.org) 349
An anonymous reader writes from a report via WinBeta: A new report from scientists Mark Ziemann, Yotam Eren, and Assam El-Osta says that 20% of scientific papers on genes contain gene name conversion errors caused by Excel. In the abstract of the article, titled "Gene name errors are widespread in the scientific literature," the scientists explain: "The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions."
It's easy to see why Excel might have problems with certain gene names when you see the "gene symbols" that the scientists use as examples: "For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to '2-Sep' and '1-Mar', respectively. Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession '2310009E13' to '2.31E+13'). Since that report, we have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. 'SEPT2' converted to '2006/09/02'). This suggests that gene name errors continue to be a problem in supplementary files accompanying articles. Inadvertent gene symbol conversion is problematic because these supplementary files are an important resource in the genomics community that are frequently reused. Our aim here is to raise awareness of the problem." You can view the scientific paper in its entirety here.
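A quick way to see why these strings are ambiguous: Python's standard library (used here only as a stand-in for any type-guessing importer, not Excel's actual heuristic) happily parses both forms:

```python
from datetime import datetime

# Excel reads 'SEPT2' as the date 2-Sep; strptime shows how
# naturally that pattern parses as day-month
print(datetime.strptime("2-Sep", "%d-%b").month)  # 9

# and the RIKEN accession '2310009E13' is syntactically valid
# scientific notation, so it parses cleanly as a float
print(float("2310009E13"))  # 2.310009e+19
```

Once a value has been parsed like this, only the number survives; the original character string is gone.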
Wait, what? (Score:5, Insightful)
Re:Wait, what? (Score:5, Insightful)
I don't know how much number crunching was actually involved here. I suspect the problem comes from using a spreadsheet as a database.
Because databases are, you know, hard.
And "errands" creep in all over the place. ;)
Re: (Score:2)
That's my take on it. Spreadsheets are wonderful things... to a point, and then, when their developers and maintainers cross that line and start trying to use spreadsheets as querying engines, it can all get very ugly. Even if you get it to work, I have yet to see the spreadsheet software (and I was using them as far back as Multiplan) that didn't turn into a maintenance nightmare where one false step could lead to errors, or much worse, gibberish.
Re: (Score:2)
Tell that to the CFO at my last employer.
Disclaimer - I recommended changes to nearly all of this:
This was in the Windows 95 days. AUD$97 million budget, data fed to a master Excel file from linked files in each department. Of course it was IT's fault when Excel crashed and corrupted his master file, despite having told him that Excel was not up to the job. He ordered a new laptop with more memory.
There was definitely a sense of schadenfreude some years later when he was "named" in an auditor's report. For t
Re: (Score:2)
I actually worked doing government auditing in a department I shall decline to name. And as much as I railed and wrote STERN LETTERS and whatnot, nothing would shake the bureaucrats' love of their bloody spreadsheets.
The fact that the *only* asset list in the whole damn department of 10K+ employees around the country was kept in a single Excel file that was collated from other Excel files by a VBScript written by a secretary was not seen as a problem by anyone except me and the IT dept (who were also system
Re:Wait, what? (Score:5, Funny)
Citation needed.
Re: (Score:3)
I use them a lot for doing budgets and cash flow forecasts. That's the kind of job they were originally designed for, and within that milieu they can't really be beat. But going much beyond, where you have to use more and more query-like functionality is where they start to break.
Re: (Score:2)
Re: (Score:3)
Spreadsheets are really useful tools but have limitations. One thing I did enjoy was using that spreadsheet layout for coding; it seems much more functional. One sheet with programming cells, another bunch of cells on the same sheet for output, various continuous variable checks distributed wherever those checkpoints are required, multiple sheets with similar outputs in similar locations, and notes all over the place. You could really turn a spreadsheet into a great custom coding interface, then once it
Re: (Score:3, Insightful)
It's a shame that when defining open document formats they didn't bother to define an open macro format too. We have Excel using VBA macros, and OpenOffice using Python and BASIC. Google Docs uses Javascript. Wouldn't macro portability be great?
It was user error, not a spreadsheet problem ... (Score:3)
I don't know how much number crunching was actually involved here. I suspect the problem comes from using a spreadsheet as a database. Because databases are, you know, hard.
It was user error, not a spreadsheet problem. Prefacing the names with a ' would have identified them as a string not to be interpreted as numeric or date. If the researchers couldn't manage this do you really think they could have used a database?
That said, yes, spreadsheets are overused and abused.
Re: (Score:2)
I think the real source of the problem is many scientists' insistence on naming everything using (what they think is) a clever acronym. It's become a virtual plague in most technical disciplines over the past two or three decades. I think it's directly related to the desire by scientists to somehow become famous beyond their little niches.
It's gotten so bad that, much of the time, the spelled-out names don't really even make sense.
Re: (Score:2)
I think you somehow managed to bolster the GP's point through some weirdly-inverted way. Acronyms are a form of compression to make communication easier. Cutesy acronyms whose long forms are twisted to fit the acronym do not ease communication.
Re: (Score:3, Interesting)
No, this would be a disaster. The acronyms have the advantage that they are, relatively speaking, semantics-free. If we turned them into longhand, then they would describe the gene in some way. Which means that the descriptions would go out of date as knowledge advanced, or would have to be changed. It's a recipe for instability.
It is very easy to laugh at biologists and think that you know how to manage data better than they do. In some cases, you may be right, but in this case it is not so. Identifiers are there to i
Re:It was user error, not a spreadsheet problem .. (Score:5, Insightful)
It could rather be a conversion error. For example, if you have the original in a CSV file (possibly output from one program) where strings have no leading apostrophe, and then load it into Excel or LibreOffice, it will (by default) turn everything it can into a numeric format. One needs to be aware of that and ask that the column be converted to text -- which is easy to overlook if you have a column that's mostly unambiguous, but somewhere far below is a single date-like name.
I've gotten hit by this many times with CSVs coming out of our school's learning management system, with long numeric student IDs that get turned into scientific notation in the spreadsheet application. In some sense that's easier to catch, because it will hit a whole column of data at once; but even so it's distressing how often I need to backtrack to resolve that.
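The point about CSV is worth spelling out: the format itself carries no type information, so every field is text until an importer decides otherwise. A stdlib sketch with made-up data:

```python
import csv
import io

raw = "gene,reads\nSEPT2,120\nMARCH1,45\n"  # made-up supplementary data
rows = list(csv.reader(io.StringIO(raw)))

# the csv module hands back plain strings; any date or number
# conversion is a choice made later by the importing application
print(rows[1])  # ['SEPT2', '120']
```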
Re: (Score:2)
Re: (Score:2)
One needs to be aware of that and ask that the column be converted to text -- which is easy to overlook if you have a column that's mostly non-ambiguous, but somewhere far below is a single date-like name.
My process for importing into a spreadsheet includes always making sure to check the data type for each column. If you do that, then it is very difficult to overlook.
The basic problem here is not that the tools function as designed, which is actually very useful. I use spreadsheets fairly regularly for mundane list-mangling tasks. The problem is people not knowing how to use the tools. You see this everywhere you go in academia. Literally every position on a college campus typically now requires familiarity
Re: (Score:3)
No, it is a conversion problem.
True, if you make the column text in the import .csv "wizard", it won't convert.
But if Excel does convert text to a date or to a number, you lose the original text and have to import again to get it back.
Numbers displayed in scientific notation (or any other format) and dates displayed however Excel is set up to display them, are stored internally as just numbers, and the original text is not saved anywhere.
Re: (Score:3)
Re: (Score:2)
Re: (Score:3)
Which is too late if the damage has already been done. And if the first 2000 lines look ok, no one is going to do that because, hey, the auto import worked!
Re: (Score:3)
The problem is that not all tools require a ' to stop this behaviour. And, in fact, adding this may well break those tools.
Ultimately, this is not a new problem. It was first noted about a decade ago in yeast (which uses a lot of very date like gene names). It's a bit depressing it's still happening.
I'm unconvinced that this can be classed as a user error, though. Excel is using a heuristic to determine the data type of a field (probably on a per cell, not per column basis). And that heuristic is failin
Re:It was user error, not a spreadsheet problem .. (Score:5, Insightful)
Why can't the default behavior be that the data is just not modified, at all, unless I tell Excel I want it to be?
Re: (Score:2)
The single quote won't help when pasting in multiple cells of unformatted data, however. One should always change the column's data type from "General" to something else, such as "Text", before pasting in unformatted data.
Re: (Score:2)
Why can't the default behavior be that the data is just not modified, at all, unless I tell Excel I want it to be?
The default behavior is to treat the field as whatever you've told the spreadsheet that it is. By default, every cell is set up for numeric data types. The same is true on a CSV import. In either case there is a simple way to prevent it happening; select a text format for cells in the sheet, or select a text type for the column while importing. The problem is misuse of tools, not a problem with the tool. You wouldn't complain that a band saw is capable of removing fingers, would you?
self-response addendum (Score:3)
Penn Jillette on Donald Trump, Hillary Clinton, And Why He's All in on Gary Johnson [youtube.com] — 2 August 2016
I watched this video yesterday. There's a fabulous exchange from 24:30 to 30:00 on truth and naivety.
Re: (Score:2)
Here is why for those without the background.
Typed variables in a database completely eliminate this issue.
If the users had used a database, even one of the MS ones (which newbies can deal with within a week), that would have solved it; that was the poster above's point, which you were unable to grasp.
And how would that have helped when publishers requested their supplementary data in Excel format?
They are scientists, and more likely than not they did use some tool for scientific number crunching (R or similar) that kept and processed data in typed variables, or even kept the raw data in an actual database. But you can't ship your Oracle server to reviewers by email, and in the absence of a document-like database format, people are turning to Excel.
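For what it's worth, even the zero-setup database that ships in Python's standard library keeps a declared TEXT column as text; a minimal sketch (the table layout and values are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE genes (symbol TEXT, reads INTEGER)")
con.executemany("INSERT INTO genes VALUES (?, ?)",
                [("SEPT2", 120), ("MARCH1", 45)])

# the declared type sticks: nothing reinterprets 'SEPT2' as a date
print([row[0] for row in con.execute("SELECT symbol FROM genes ORDER BY rowid")])
# ['SEPT2', 'MARCH1']
```

And an SQLite file is itself a single mailable document, which addresses at least part of the "you can't email a database" complaint.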
Re: (Score:2)
Did they do that? Obviously you do not know and are making things up. Why bother to lie over something so trivial?
Re: (Score:2)
Re: (Score:2)
I suspect the problem comes from using a spreadsheet as a database.
Or leaving the job of entering data to an assistant who doesn't quite understand the terminologies and abbreviations. Databases are not hard, as we all know, but creating a proper entry form to one is perhaps more work than seems worth it for an ad hoc job. A spreadsheet would do a better job, if it was possible to create custom formats for cells (which would then understand the data better); come to think of it, it probably is, I just haven't done it, so I don't know.
Re: (Score:2)
I don't know how much number crunching was actually involved here. I suspect the problem comes from using a spreadsheet as a database.
Because databases are, you know, hard.
Yeah. Great.
But WHY are they so hard? Because usually, a "database" is something that is installed and stored on a network server, and you can't mail a network server to your colleague, customer or publisher. So even if you KNOW that a database would be the right tool for the job at hand, you end up with your data in a spreadsheet. Either to send it to someone or by receiving it from someone else.
The only alternative that would offer searching, filtering, sorting (in general: querying) features that you need
Lazy (Score:2)
I think the thing that really surprises me is that all my professors told me it was hard to get published, when making sure your data was correctly entered into whatever spreadsheet program you used for number crunching (and creating graphs) was one of the dead basics of working there. Yet 1 in 5 papers has notable failures here? And nobody noticed before publishing them? What kinds of major errands have gotten in, then, if basic spot checks are getting failed?
It is hard-ish, at least in a number of fields, unless you are either a big name or doing something of popular interest at the moment.
Ultimately, though, this kind of error comes down to laziness more than vetting for quality--because a very minor editing mistake should not affect paper publication, but should be caught before publication. All it takes to spot the error is a careful reading of the paper by any one of the professor, the grad student, a reviewer, an assistant editor, or an editor. And while hopef
Re: (Score:2)
I agree with your surprise, but reviewers often look at how an article fits with their own pet theories or political allegiances. They only try to destroy it if it contradicts them. And many journals allow you to propose a list of reviewers. The result is that quite a few articles just slip through.
It's often easier to kill a paper because its methods are sloppy or because not all steps have been thoroughly tested than to actually verify the data and the way it's been processed. The latter can take weeks or months
Re:Wait, what? (Score:4, Funny)
What kinds of major errands have gotten in...
Grocery shopping, filling the gas tank and picking up the dry cleaning.
Re: (Score:2)
Sheer numbers. There are literally millions of them. Manual entry into a database was checked and okay. The export from the database's proprietary, obscure storage format to common-as-dirt Excel was already too big to spot the fault in.
Re: (Score:2)
Maybe the original data was fine, exported from some other application and then imported into Excel for publication, at which point it was corrupted. CSV files don't have formatting or data type information, for example.
TFA seems to be saying that while the results are valid, Excel is a crappy format to distribute the data in.
Submitted without comment (Score:2)
Re: (Score:2)
LaTeX (Score:5, Insightful)
Why in God's name are you using a Microsoft product for scientific documents?
Re: (Score:2, Insightful)
LaTeX and/or TeX take time to learn. Time that can be used to play with genes.
Excel is a GUI, crunches numbers, makes pretty graphs and keeps all your data in nice grids, and is easy to learn (until you hit the details like these format errors). (La)TeX is great if you want to write scientific papers that are nicely laid out and can typeset mathematical formulas (Knuth's itch). However, you'll most likely still use Excel to do the crunching/graphing/saving.
How many Excel sheets have you seen where it's on
Re:LaTeX (Score:4, Funny)
still use Excel to do the crunching
Good god, I hope not. When I saw the title for this article I thought for sure it was referring to errors caused by the aggregation of questionable digits resulting from machine precision floating point operations, not something as simple as type conversions. Excel has been the bane of my existence for years because testers keep trying to use it to verify results from a data processing framework I wrote where the operations for some use cases involve 20+ digit decimals. No matter how many times I explain to them the concepts of machine vs. arbitrary precision, decimal precision vs. accuracy, rational vs. decimal representations of numbers, etc. the spurious 'rounding error / does not match the XLS' bug reports just keep coming. Drives me nuts. The idea that scientists may be making the same mistakes with important research is kind of scary.
Then again, I am usually shocked by the amount of error considered tolerable in the scientific / EE applications of the framework. The real anal retentives are the financial use cases, which tend to include 'penny allocation' algorithms for distributing fractions of pennies left as remainders from dollar amounts in the 10s of millions, and they absolutely will file a critical severity issue over a .00000000001 discrepancy.
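A minimal, hypothetical sketch of the kind of penny-allocation scheme described above, using the common largest-remainder convention; the function name and weights are illustrative, not any particular firm's method:

```python
def allocate(total_cents, weights):
    """Split total_cents across weights so the shares sum exactly to the total."""
    total_weight = sum(weights)
    # integer division first: never hand out more than we have
    shares = [total_cents * w // total_weight for w in weights]
    remainder = total_cents - sum(shares)
    # distribute the leftover pennies to the largest fractional remainders
    order = sorted(range(len(weights)),
                   key=lambda i: (total_cents * weights[i]) % total_weight,
                   reverse=True)
    for i in order[:remainder]:
        shares[i] += 1
    return shares

print(allocate(100, [1, 1, 1]))  # [34, 33, 33]
```

Because everything stays in integers, the shares always reconcile to the total, which is exactly the property the financial testers are filing critical issues over.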
Re: (Score:2)
In finance, you're not supposed to use any floating point math.
Re: LaTeX (Score:3)
Re: (Score:2)
In accounting, when you add up the numbers, the result must be exactly zero, not 0.0000000001.
It is not for saving pennies, it is for verification purposes. If there is a penny missing, it means that there is a mistake somewhere. It may be a small rounding error, but it may also be several million dollar mistakes (or fraud...) adding up to one penny.
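The exact-zero point is easy to demonstrate: binary floating point cannot represent most decimal fractions exactly, which is why ledgers are summed in integer cents:

```python
# three entries that ought to cancel to zero
floats = [0.1, 0.2, -0.3]
print(sum(floats) == 0)  # False: binary floats leave a rounding residue

cents = [10, 20, -30]    # the same entries in integer cents
print(sum(cents) == 0)   # True: integers sum exactly
```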
Re: (Score:2)
> LaTeX and/or TeX take time to learn.
I'm afraid that they're also quite useless for cut & paste transfer of data among documents. Their GUI tools are also _extremely_ limited.
Re: (Score:2)
Re: (Score:3)
As I said upthread, it's probably not a data-entry problem. More like a file conversion problem (importing text/CSV to the spreadsheet application).
Re:LaTeX (Score:5, Funny)
LaTeX is not free of problems either. They are just different. If you care, you take the time to fix them, if you don't you don't fix them. Simple as that.
https://pbs.twimg.com/media/Ci... [twimg.com]
Re: (Score:2)
Re: (Score:3)
Non sequitur. Calculations need to be made somewhere in the workflow process, and LaTeX does not help with that.
Re: (Score:2)
Because a proper tool costs money and learning to use it takes time.
Research costs money, and the people that can do the research at lower cost get to do more research. Microsoft Office is "free" to most because it effectively comes with the computer. Getting the right software can cost thousands per seat.
Re: (Score:2)
Why in God's name are you using a Microsoft product for scientific documents?
Because there's nothing wrong with doing so provided you know its limits and its faults.
And because these problems of user error transcend all software including those specifically designed for scientific purposes.
And because writing a document in LaTeX has little to do with crunching numbers and doing lookup tables in a spreadsheet. But hey you got your anti-Microsoft rant of the day out so success right?
Re: (Score:2)
Re: (Score:2)
Probably not a data-entry problem. More like a file conversion problem (importing text/CSV to the spreadsheet application).
Re: (Score:2)
It could be either.
If the cell is General format rather than Text, Excel can interpret what is typed in as a date or numeric format if it matches certain patterns.
When using the import wizard to import text files, you can choose how Excel interprets each column (General, Text, Date or Skip) on the third step of the wizard. If you click Finish on the second step, Excel will default to General and perform conversions.
Re: (Score:2)
> Probably not a data-entry problem. More like a file conversion problem (importing text/CSV to the spreadsheet application).
Especially via cut & paste.
Re: (Score:2)
So while "simpress" does the powerpoint slide construction task you are probably better off doing a "save as PDF" from it instead of using it as a presentation tool. If you want video effects make a video file with something purpose made - powerpoint and simpress suck at both video style presentations and static documents paged thr
Re: (Score:2)
That's great for people that have access to Matlab from the college they attend or from their employer.
If you can prove you are a student the $500 price tag is a bit much. I don't recall how much I spent on the student version of Matlab I bought many years ago but being a student version there were imposed limits on the size of arrays it would hold. As such the one time I needed it for a project it proved worthless since it could not hold my data set. Guess what I used instead? Excel.
The "pro" version o
Not strictly Excel's fault (Score:5, Informative)
Those conversions look like cases where the column type during import was left at "General" instead of being set to "Text" as it should have been, telling Excel to try and infer the actual type from the format of the column's contents. It's an awkward situation where the user should be telling Excel what the data type for each column is, but it's not strictly Excel's fault for doing what the user told it to do. IMO Excel should be either changed to not have a default type and to not allow an import until the user's selected a type for each column, or it should throw up an error if it infers different data types for a column for different rows.
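The outlier pass suggested here is cheap to run yourself before or after an import; a rough sketch, with a deliberately incomplete, illustrative list of month-like gene prefixes:

```python
import re

# illustrative, incomplete list of month-like prefixes that
# a date-guessing importer might mangle
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MARCH|APR|MAY|JUN|JUL|AUG|SEPT|OCT|NOV|DEC)\d+$", re.I)
SCI_LIKE = re.compile(r"^\d+E\d+$", re.I)  # e.g. RIKEN accessions

def at_risk(symbol):
    """True if a type-guessing importer might mangle this identifier."""
    return bool(DATE_LIKE.match(symbol) or SCI_LIKE.match(symbol))

print([s for s in ["SEPT2", "TP53", "MARCH1", "2310009E13"] if at_risk(s)])
# ['SEPT2', 'MARCH1', '2310009E13']
```

Running something like this over a supplementary file before submission would catch the cases the paper describes.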
Re: (Score:2)
Re: (Score:3)
That's easy to say but people need to understand what they are looking for. As a Specialist in the Army I happened to be in the ops office while one of the sergeants was working on a spreadsheet that handled some sort of inventory. He knew I knew something about computers and so he asked me to come over to look at the funny formatting that Excel was doing on him. What I saw was a number that could have been a date, price, part number, or something else. I started asking the sergeant what he was trying t
Re: (Score:3)
Select top left cell, format as table, make sure "my data has headers" is checked.
Click just below the header row to select an entire column of data (excluding the header row). Format as you please, validate as you please. You can even right click, insert a new column, then create a formula that verifies the data and outputs nothing when it's good and "OHSHIT" when it fails verification. Then you can non-destructively filter your table on "OHSHIT".
If you're using Excel at least learn to use it. This sh
Re: (Score:2, Interesting)
Just because the stupid default behavior can be changed doesn't excuse it from being stupid in the first place.
(See also: ads built into Windows 10)
Re: (Score:2)
While I'm no Microsoft fan, and the Excel default behavior could certainly be better, I can't see my way to making Microsoft take the hit for this. (LibreOffice can be similarly abused, for that matter.)
The issue is multifold:
1) Using a tool not intended for the purpose; spreadsheets may be easy and convenient but everything is not a nail for the spreadsheet hammer.
2) Lack of understanding about how the chosen tool works.
3) Failure to do simple proofing and verification.
This one is down to the researchers,
Re: (Score:2)
1b) Lack of an appropriate tool to wrap up stuff that should be in a database (or has been extracted from a database) for display and distribution.
There is a reason why people use the spreadsheet hammer: it is still better suited than the word-processor screwdriver, the plain-text chainsaw, or the pdf/png belt sander.
Re: (Score:2)
"I can change the column when/if I need to"
Actually, I have major problems trying to convert text to a number format once it's in the spreadsheet program. (I use LibreOffice, but I assume Excel functionality is the same.) If I take a column that's text-formatted, and click Format > Cell > Number, then all of the numeric stuff gets a single quote prefixed, sabotaging the attempt to treat it as a number. Your proposal would generate a massive user outcry for that and other reasons.
Re: (Score:2)
Re: (Score:2)
I think that feature has been built into Excel since Office 95. And it still is, somewhere.
And each time I need it I am not able to find it for the life of me.
Further proof (Score:3)
That nobody reads all that shit. The only people reading scientific papers are the people writing scientific papers.
Excel is (almost) a great grid oriented editor (Score:2, Informative)
Many, many people use Excel as a grid oriented editor to capture a mix of text and numeric data. It's massive overkill, and the helpful features like automatic date conversion DO get in the way.
But it's ubiquitously available (either in MS form or various and sundry clones)...so it gets used.
No surprise - same errors in finance & ops (Score:3)
In the year 2016, a disturbing amount of human activity is run through Excel instead of proper databases.
A similar study [dartmouth.edu] from 2009 tested for errors in various operational spreadsheets and concluded, "Our results confirm the general belief among those who have studied spreadsheets that errors are commonplace." The Financial Times commented on [ft.com] the prevalence of spreadsheet errors in business, saying it's probably a function of training and organizational culture.
I've heard from a few salespeople in the software industry that their biggest competitor in the SMB space isn't $BigCRMCorp, but Excel spreadsheets that have accreted over the years.
In other news... (Score:2)
...scientists are too fucking stupid to use Excel properly. Format your cells correctly. It's not Excel's fault. It's no different than any other occupation that needs to use the tool correctly.
Re: (Score:2)
"...scientists are too fucking stupid to use Excel properly."
So, who, other than you of course, is Excel's target audience?
I can tell you, it's not me.
I'm only willing to fight with Excel/Open Office when I want to do plots. And that's only because gnuplot is even more obtuse than the spreadsheets.
Re: (Score:2)
So, who, other than you of course, is Excel's target audience?
Excel's target audience is people who use Excel. Whether those people know how to use it is the issue. Word edits documents just fine too, but that doesn't stop you getting someone manually typing a list of numbers, putting new-lines in instead of paragraph separators, and then everyone wondering why the hell Word's formatting systems proceed to screw up the entire document.
We in the world put a large number of people in front of a large number of systems without providing any of the basic training required
Absolutely not limited to scientific publications (Score:2)
Spreadsheets should be used to present data in the form of a view. Use databases to store data and spreadsheets to create views and statistics of it.
Re: (Score:2)
then you only need a way to attach that database to that email with the excel file, too....
But yes, that would be the proper way. But we would still need a lightweight database-as-simple-document format. Heck, that even COULD BE Excel, with a special type of data-worksheet in a document that enforces data structure and does not allow formatting. For all the use cases where you don't need the performance or multi-user or transactions/data integrity/replication of an actual database server.
The task of mailing ou
Don't knock it (Score:2)
...mutations propelled life.
Not even in top 10 mistakes (Score:4, Funny)
It's just a tool... (Score:2)
And is limited to how well you use it. As the article noted, OpenOffice and LibreOffice will do the same thing as well. They noted Sheets doesn't, but I can't get Google Sheets to handle dates consistently at all sometimes.
This is just an excellent example of what works for a large population of users can be a bad thing for a small set of users. After this paper, I expect the error rate to drop dramatically, given how easy the fix is.
So basically (Score:2)
That means genetic scientists can't be arsed to format the relevant columns as text?
Shitty autocorrect in shitty program ... (Score:2)
.. fucks up data handled in program. Film at eleven.
Honestly, this isn't really news. I consider quite a few pieces of contemporary software on the brink of unusable.
That also includes modern Word processors, Excel and the Facebook UI. (I use facebook sporadically with a spoof account)
Software that tries to think for me without communicating this, that tries to babysit me (remember Clippy?), is bound to be somewhere between extremely annoying and dangerous, depending on the situation you want to use it in.
Re: (Score:2)
Software that tries to think for me without communicating this, that tries to babysit me (remember Clippy?), is bound to be somewhere between extremely annoying and dangerous, depending on the situation you want to use it in.
"Hello! It seems you just imported a long list of names, but some dates and a few numbers slipped into what otherwise seems to be text." Would haven be the kind of "babysitting" that told the user that some values showed an "anomaly" (compared to the others) and would either prevent mistakes (table header slipped into data rows) or reminded the user to set the data type for a column.
One of the most annoying features in a great tool (Score:2)
I receive technical sheets of tools from clients, and if you don't know how to process them, Excel will trim leading zeros from manufacturer part numbers, abbreviate long numbers to scientific notation, convert sizes marked in fractions into dates... and in general turn the data into a meaningless mess.
And before you jump and yell at me that Excel isn't the right tool for data management - I know . . . There's no avoiding it, everyone out there use
Software trying to be too smart (Score:5, Insightful)
Type inference in Excel has wasted countless hours of my time trying to make sense of corruption caused by third parties using Excel. It has gotten to the point where we actively recommend people avoid Excel when handling any data they care about. I do fault Excel itself, because these errors are pervasive. They could have better structured the data imports, or made them less creative, or asked users for more feedback, or had the import do a pass over the entire dataset checking for outliers that may suggest a different type.
When a critical mass is "doing it wrong", it becomes pointless and counterproductive in the real world to continue pointing fingers at users. Tools are supposed to be useful, and if they tend not to be, then that's on them.
Have no fear!!! (Score:2)
Climatologists don't use Excel.
Re: (Score:2)
My mistake! They *do* use Excel.
http://regclim.coas.oregonstat... [oregonstate.edu]
Financial 'Industry' and generally too. (Score:3)
Test-driven Excel (Score:2)
I'm a relative newbie to Excel but the first thing I learned was never to trust any of the cells where any calculations are performed.
As my spreadsheets got more and more complex I quickly realised small errors in one worksheet could manifest themselves in really ugly - but very subtle - ways. One simple-looking calculation on one worksheet could blow out an entire model if there was even a small typo.
I suspect many people using Excel haven't learned this lesson yet. I was lucky that I noticed it myself
Auto conversion is the bane of scientists (Score:3)
It's actually 19.4% (Score:3)
But they used the wrong rounding mode.
Re:Excel can kiss my 5" wide anus! (Score:4, Funny)
Re: (Score:2)
It was done with Open Office and NotePad
A minor epiphany (Score:5, Insightful)
Without diminishing the other comments, it crossed my mind that the issue here is probably not whether Excel was used in the research. It's one of getting backup (supplementary) data into publishable form. That'd occur after the author(s) had written their paper using their normal toolset for their work and gotten the paper through review. At this point, they are supposed to package up their data in some format dictated by the journal they are publishing in. Apparently .xls is an acceptable format -- which is not irrational. The format is documented and widely supported.
Anyway, the authors are just cleaning up and getting on with their lives -- cleaning the glassware (if any), paying any bills, archiving their data and scripts, returning borrowed equipment, etc. They are going to convert their data to .xls using whatever quick and dirty tool they can find. I doubt they are going to type tens of thousands of genome codes in manually. They'll use some tool they got from a buddy or write something themselves in Perl or Python or whatever scripting language they know. And they'll check the output to make sure that Excel loads it and that it's about the right length and that the first page or so and the last page look reasonable. And off it goes.
I don't think most folks outside of IT (and probably most in IT) are all that aware of Excel's flaky and sometimes bizarre data conversions. And, assuming that there's an unambiguous one to one translation between gene codes and excel mangled gene codes, this probably isn't a big deal. Anyone using the archived data will scratch their heads, maybe ask around, figure out what's happened, fix the data, and get on with THEIR research.
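If the translation really is one-to-one (a big if; the mapping has to be verified per dataset), the head-scratching recovery step is just a lookup. A sketch with a hypothetical repair table:

```python
# hypothetical repair table; the real mapping must be verified per
# dataset, since a mangled value could correspond to more than one symbol
REPAIR = {"2-Sep": "SEPT2", "1-Mar": "MARCH1", "1-Dec": "DEC1"}

def unmangle(cell):
    """Map an Excel-mangled value back to the gene symbol; pass others through."""
    return REPAIR.get(cell, cell)

print([unmangle(c) for c in ["2-Sep", "TP53", "1-Mar"]])
# ['SEPT2', 'TP53', 'MARCH1']
```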
Re: (Score:3)
What's wrong with simply preformatting the Name column as text?
Re: (Score:3)
Re: (Score:2)
Re: Including this one? (Score:5, Funny)
Re: (Score:2)
From today's bash.org successor:
It says a lot about you if your Autocorrect turns Voltaire into Voltaren [wikipedia.org] or the other way round....
Re: (Score:2)
Re: (Score:2)
98% of the Excel usage I've seen is inappropriate.
But for 98% of THAT, the "appropriate" tool would have been a database.
Which usually requires a dedicated database server (we DO want to do it right after all this time), a DBMS including a team for operation and maintenance, complete knowledge of the database design beforehand (in research?), and admin resources to set up the database.
So we blew out thousands of $ and haven't stored a single line of data yet. But at least we did it right.
And we still don't have a useable frontend for data entry and reporting,