Why You Shouldn't Use Spreadsheets For Important Work

An anonymous reader writes "Computer science professor Daniel Lemire explains why spreadsheets shouldn't be used for important work, especially where dedicated software could do a better job. His post comes in response to evaluations of a new economics tome by Thomas Piketty, a book that is likely to be influential for years to come. Lemire writes, 'Unfortunately, like too many people, Piketty used spreadsheets instead of writing sane software. On the plus side, he published his code ... on the negative side, it appears that Piketty's code contains mistakes, fudging and other problems. ... Simply put, spreadsheets are good for quick and dirty work, but they are not designed for serious and reliable work. ... Spreadsheets make code review difficult. The code is hidden away in dozens if not hundreds of little cells ... If you are not reviewing your code carefully and if you make it difficult for others to review it, how do you expect it to be reliable?'"

  • by Anonymous Coward on Tuesday May 27, 2014 @07:02PM (#47103557)

    "I don't know how to use spread sheets properly."

    • by Anonymous Coward on Tuesday May 27, 2014 @07:03PM (#47103571)

      To be fair, neither do the vast majority of people who use spreadsheets for important work.

    • by Anonymous Coward on Tuesday May 27, 2014 @07:10PM (#47103629)

      Disagree. I think what he's really saying is "I've had to maintain and develop tools made by people that don't know how to use spreadsheets properly, and I'm fucking sick of it."

      • by lonecrow ( 931585 ) on Wednesday May 28, 2014 @11:36AM (#47109975)
        Spreadsheets are just a part of the Darwinism of applications. Some sharp fellow within an organization thinks it's important to start tracking some data point or another. Maybe it gets ignored and forgotten. Other times it grows as other people see its utility and start making requests to track related data points. Eventually you get a multi-worksheet or even multi-workbook spreadsheet masquerading as an application. At some point it becomes far too hard to maintain or understand, so they contract out to someone like me who moves it to a relational database with a web front end. Everyone is happy!

        This work forms a major part of my workload; don't fuck with it!

        Also, it is appropriate. It would be inefficient to develop a proper relational database application on the whim that some set of data points might be useful. Spreadsheets are a proving ground, an important stage in the life cycle of an application.
    • by Tyler Durden ( 136036 ) on Tuesday May 27, 2014 @07:19PM (#47103707)

      I know exactly how to use spreadsheets properly. Just don't.

      • by jythie ( 914043 ) on Tuesday May 27, 2014 @08:43PM (#47104227)
        To be fair, they have their place. They are excellent tools for creating a table of data with a chart that can be emailed to other people and read.

        Sadly, one of the big selling points for spreadsheets is their application. Pretty much any computer being used for work will have something that can read and display excel spreadsheets, you can send one to anyone and not have to worry about what they have installed. Then again you can get the same level of compatibility by outputting PDFs from matlab or something slightly saner like that....
        • by rtb61 ( 674572 )

          Spreadsheets are really easy to use properly. All you have to do is adjust your mind to the idea of creating two styles of spreadsheet: the working spreadsheet, well laid out and documented, to ensure the workings are understandable and checked, and a linked presentation spreadsheet where the data is taken from the working spreadsheet and presented prettily for nepotistic management, so even the dumbest spawn of management can, well, at least pretend to understand.

          Other things you can do is check formu

    • by AK Marc ( 707885 )
      I thought it was getting an award for being the 10,000,000th restating of GIGO.
      • It needs restating because people forget it all the time.

        • Some things stick (Score:5, Informative)

          by TapeCutter ( 624760 ) on Tuesday May 27, 2014 @07:54PM (#47103927) Journal
          I recall a survey of (non-trivial) corporate spreadsheets in the mid-90's. It went something like: 95% had a maths bug, in 80% of cases the bug made the sheet useless, and 50% of the spreadsheets were used to make (incorrect) financial decisions. The reason why corporations' coffers don't evaporate is that they use thousands of them, so the +/-ve effect on the money buffer has a central limit of zero. It's a much more precarious situation if you're using a single homespun spreadsheet to run a corner store.
          • by viperidaenz ( 2515578 ) on Tuesday May 27, 2014 @07:57PM (#47103941)

            Were the survey results collated on a spreadsheet?

          • by timeOday ( 582209 ) on Tuesday May 27, 2014 @09:36PM (#47104557)
            The question is whether having the logic squirreled away in code or a DB would have made it more correct, which is a big assumption!

            I really think Piketty deserves a lot of credit for releasing his "source" spreadsheets on such a substantive and controversial work. Most authors do not. If the critiques turn out to be substantial and extensive, I plan on waiting for a second edition with corrections before investing time in reading it.

    • by Anonymous Coward on Tuesday May 27, 2014 @08:25PM (#47104107)

      Most people have no idea how to use a relational database.

      • by Ol Olsoc ( 1175323 ) on Tuesday May 27, 2014 @09:48PM (#47104639)
        Just got back from a meeting where I was explaining relational databases. No one really got it, but the Excel expert thought I was full of shit.

        Well, I sort of am, but not for that reason.

        There is nothing worse than trying to get a spreadsheet person up and running on relational databases. They argue with you about every point, then they freak.

        • by meglon ( 1001833 ) on Wednesday May 28, 2014 @03:04AM (#47105955)

          There is nothing worse than trying to get a spreadsheet person up and running on relational databases. They argue with you about every point, then they freak.

          Sure there is... there's trying to get a db user to understand spreadsheets. How many times have i told you, the right tool for the right job (watch out for that low hanging pipe! ..... nevermind....).

    • by jd2112 ( 1535857 ) on Tuesday May 27, 2014 @08:36PM (#47104171)

      "I don't know how to use spread sheets properly."

      Or, I realize that just because I have a hammer not all problems are nails.

    • by jythie ( 914043 ) on Tuesday May 27, 2014 @08:40PM (#47104201)
      Eh, I think it can be legitimately argued that spreadsheets are a bad place to do complex things. Even people who are skilled at setting them up produce work that is difficult to examine and track. In many ways it is a technology that is still stuck in the 80s, even though they keep throwing in more and more complex functionality, but the method of storing and organizing the logic is dated in a bad (rather than proven) way.

      Even teaching students matlab would probably be an improvement, but excel is what they default to teaching anyone outside math and CS, building all the coursework around it.
    • by Giant Electronic Bra ( 1229876 ) on Tuesday May 27, 2014 @08:41PM (#47104209)

      No kidding. Also, it MAY not be that easy to review the code in a spreadsheet, but it is VERY VERY EASY to test it. If you want reliable spreadsheets it's PERFECTLY possible to test them to the Nth degree, far more so than with most other code. You have a place to put the tests, and a place to put the expected results; it's all rather devilishly simple actually. For that matter you can document the bejeezus out of them too.

      I think spreadsheets are like any sort of simple interpreted language. Idiots can easily blow their left foot off. Real software engineers can also do some very cool stuff. Most of the perl code I've seen is ugly as all hell and pretty worthless, but MY perl code is a thing of beauty that people maintain for years. It's all in how you use the tool.

    • by CriminalNerd ( 882826 ) on Tuesday May 27, 2014 @08:48PM (#47104247)

      "I never worked in a company with normal people."

      I'm guessing you haven't had the pleasure of working in the typical firm where the company's years-old ENTIRE lifetime of work and data is passed around e-mail as an 80MB Excel attachment.

    • by Coeurderoy ( 717228 ) on Wednesday May 28, 2014 @12:47AM (#47105383)

      No, what he is saying is that it is easy to "write" sloppy code for excel and hard to write good code.
      And even harder to review it.

      It's similar to the reason a) people moved away from basic, and b) basic evolved to be (duck, please no flame) almost usable (I still do not like it, but recognize that it is possible to write usable code in visual basic).

      If you want to criticize him, picking on Piketty is VERY political. "Excel" errors are galore in neocon publications, but of course the FT did not find anything not to love there, but saying that just maybe having a small group of people siphoning off all the cash from society is not sustainable forever does make them nervous and very desirous to find some scab to pick at...

      Nevertheless he is right, it would be very good if decision makers would be able to "read the numbers" and not just "massage the numbers".
      Something like R or ADaMSoft would drive you to test ideas on datasets and learn from them, whereas excel (or calc :)) has a tendency to get you to fiddle the numbers until the taxman, ahem, the reader sees what you would like them to see...

    • Sure. But the same is true about GOTOs. You CAN write reliable code using GOTOs, if you have self discipline. But in practice 99% of code with GOTOs is abused. Hence the idiom "GOTO is evil".

      Spread sheet is evil.
  • by BeerCat ( 685972 ) on Tuesday May 27, 2014 @07:06PM (#47103593) Homepage

    Spreadsheets are like a blank piece of paper with grid squares. Which means you can put anything down, tied together with some formulae, and it's brilliant.

    Which is also why it's complete pants - the "anything goes" really does mean that.

    (That, and it will tend to break when you most rely on it)

    • by plover ( 150551 ) on Tuesday May 27, 2014 @07:36PM (#47103835) Homepage Journal

      What people fail to realize is that spreadsheets are like any other form of programming, and therefore should be treated as such. Write tests. Break complex formulas down into named cells. Use references to carry concepts. Beware of globals. Keep small concepts small, simple, and modular. Write more tests.

      Does anybody do that with every spreadsheet they write? Doubtful. I know I only go to all that trouble myself when I have a boatload of inputs that have to get put together. I usually discover about part way in that the sheet is going to be complex enough to need tests. When I do, it's time to start refactoring it, and these are my general steps:

      1. Give cells and ranges meaningful names
      2. Break complex formulas down to several small formulas
      3. Add tests for the formulas
      4. Factor out duplicates

      Of all of these, giving cells and ranges names is the most important, because it makes the sheets readable. I can then usually understand the results well enough to know if my formulas are working, but a complex formula often needs an independent set of tests to prove the discontinuities in the functions are actually where I think they should be.
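
      The same four steps can be sketched outside a spreadsheet. This is only an illustration, not anything from the comment above: the shipping-fee rule, the constant names, and the function names are all hypothetical. It assumes a rule that would otherwise sit in one long nested IF() formula, splits it into small named pieces, and tests the points where the result jumps.

```python
# Hypothetical rule, invented for illustration: a shipping fee that a
# spreadsheet might compute with one long nested IF() formula.

# Step 1: meaningful names instead of bare cell references or magic numbers.
FREE_SHIPPING_THRESHOLD = 100.00
FLAT_FEE = 7.50
HEAVY_SURCHARGE = 4.00
HEAVY_WEIGHT_KG = 20.0


# Step 2: one small function per idea instead of one complex formula.
def qualifies_for_free_shipping(order_total):
    return order_total >= FREE_SHIPPING_THRESHOLD


def weight_surcharge(weight_kg):
    return HEAVY_SURCHARGE if weight_kg > HEAVY_WEIGHT_KG else 0.0


def shipping_fee(order_total, weight_kg):
    base = 0.0 if qualifies_for_free_shipping(order_total) else FLAT_FEE
    return base + weight_surcharge(weight_kg)


# Step 3: tests that pin down each discontinuity on both sides.
def run_tests():
    assert shipping_fee(99.99, 1.0) == 7.50    # just below the threshold
    assert shipping_fee(100.00, 1.0) == 0.0    # exactly at the threshold
    assert shipping_fee(50.00, 20.0) == 7.50   # at the weight limit, no surcharge
    assert shipping_fee(50.00, 20.1) == 11.50  # just over the weight limit
    return True


run_tests()
```

      Step 4, factoring out duplicates, falls out of the structure: each threshold is defined once and every caller shares it.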

  • by Rinikusu ( 28164 ) on Tuesday May 27, 2014 @07:09PM (#47103615)

    Dunno if that's a good or bad thing, though.

    I've had to take over maintenance of a few "excel" based applications. Never. Again.

    • by mjwx ( 966435 )

      Dunno if that's a good or bad thing, though.

      I've had to take over maintenance of a few "excel" based applications. Never. Again.

      That's Excel for you.

      I use a lot of scripts that are based on CSV files for input, output and storage of values. You want to know what I edit them in... Notepad. Because Excel fucks around with it too much and I'm sick of the "but this is not in our proprietary format" dialogue when closing it (it also refuses to save on exit unless I change it to .xlsx). However the biggest sin Excel does (to me) is removing leading zeros; that number has to fit an N-digit mask or it will fail.
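
      For comparison, a script that reads the CSV as plain text keeps the leading zeros, because nothing guesses at a numeric type. A minimal sketch using Python's standard csv module; the column names and sample values are made up for illustration.

```python
import csv
import io

# Made-up sample data: the IDs must keep their leading zeros.
SAMPLE = "account_id,amount\n000123,10.50\n004567,3.25\n"


def read_rows(text):
    """Parse CSV text; every field stays a string, so '000123' is untouched."""
    return list(csv.DictReader(io.StringIO(text)))


def write_rows(rows, fieldnames):
    """Write the rows back out as CSV text without altering any field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = read_rows(SAMPLE)
round_tripped = write_rows(rows, ["account_id", "amount"])
```

      Conversion to a number then happens only where the script asks for it, for example `float(row["amount"])`, and never on the ID column.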

      Excel has grown into a t

  • by Cyberax ( 705495 ) on Tuesday May 27, 2014 @07:10PM (#47103631)
    So what's the alternative? There are no good and easy to use software packages to create simple data-intensive apps. The closest alternative was VB6 and if I had to choose between it and Excel, I'd choose Excel any day of the week.
    • File Maker Pro.

      (trying not to laugh)

    • by simonbp ( 412489 )

      Python plus Numpy. Plus Pandas if working with large amounts of data.

      • For those things that a spreadsheet does quickly and well, you could waste hours screwing around with Numpy

      • by Cyberax ( 705495 )
        Nope. Numpy doesn't allow you to visually play with the data. You have to write code for everything.
        • by mbkennel ( 97636 )

          This problem, reproducible data analysis, has been solved before.

          Decent alternatives to spreadsheets (which are entirely opaque) are (a) Matlab, (b) Mathematica notebook, (c) iPython notebook+numpy+pandas, (d) SAS/SPSS/R
          • by Cyberax ( 705495 )
            Excel is reproducible. Microsoft worked very hard to make its floating point calculations work exactly the same way on all machines.

            SPSS is nice, but it is expensive as hell.
        • by sjames ( 1099 )

          That's what the GNUplot module is for.

    • by geekoid ( 135745 )

      Fortran. If you laugh, then you don't know much about advanced computing.

    • I know it's huge overkill, but I've had times where it was honestly easier to drop the data into PostgreSQL (MySQL, if you prefer) than edit it in Excel / Gnumeric / Open/LibreOffice's spreadsheet tool.

      There was one case where my friend needed to analyze a modest amount of data -- 70k rows, 30 columns or so -- and Excel would absolutely choke on her new laptop. Dropped it into Postgres on my anemic netbook and queries were lightning fast. No need to specify column types, either -- just load
    • Access.

      People will laugh. But in an office environment it's an excellent solution. But one can still write formulas directly in reports and forms, so code review isn't necessarily easier.

      • by jd2112 ( 1535857 )


        People will laugh. But in an office environment it's an excellent solution. But one can still write formulas directly in reports and forms, so code review isn't necessarily easier.

        For those who don't understand relational database concepts, Access can be a machine gun for shooting yourself in the foot. The types of errors that typically find their way into Excel spreadsheets can get magnified several times over by moving to Access.
        Those who do understand relational database concepts are probably putting their data in a real DBMS (MSSQL, Oracle, Postgres, MySQL, etc).

    • Perl. When things get too messy for a spreadsheet, I whip up a little perl. Easier to repeat the calculations for different data sets, as a bonus. Access to much richer libraries, and you can shell out to GNUPlot, Ploticus, Asymptote, or whatever. Or Python, if that's your cup of tea ... or Ruby, even R ... whatever scripting language floats your boat.
  • Of course it is, but we can't afford to do it right.
  • anger! always with the nomenclature distinctions...this is a stupid approach to a real problem

    a spreadsheet is a computer program

    that's it...

    to criticize the act of entering data and performing computations on that data using computer software is the height of ignorance

    I don't know if he's right or not, but this guy's real criticism, once you fight through his ignorance of the issue, is that in his view Piketty didn't show enough of how he got his figures...or more accurately, the TFA author had to

    • Maybe you should read it again?
      His real criticism is that spreadsheet software is horrible for any high end work, or with anything you want to share, and he is correct.

      "so he probably doesn't know how to use the interface of a spreadsheet very well, which makes the act of checking a formula tedious..."
      it is tedious, even if you are an expert and even if the user uses good practices.

      "P-hacking is the problem in social science/economics research, not using 'spreadsheets'"
      I don't think you know what P-Hacking is.

      • His real criticism is that spreadsheet software is horrible for any high end work, or with anything you want to share, and he is correct.

        you're wrong on both counts...that is not his 'real' criticism and even if it was he and you would still be wrong

        spreadsheets are ***computation software***

        if it can execute the operation needed for the research then it is acceptable...if not, then no


        it's a tool to analyze data...that's ****all any of these programs are, ever****

        the method of analysis is either proper

        • by tepples ( 727027 )

          if it can execute the operation needed for the research then it is acceptable...if not, then no

          I think geekoid is trying to say that even though spreadsheets can in theory "execute the operation needed for the research", practical limits inherent in the spreadsheet user interface make it difficult to verify that what the spreadsheet is calculating matches what you wanted to calculate. Consider this: An 8-bit microcomputer "can execute the operation needed for the research" but that doesn't make it the best tool.

          • Consider this: An 8-bit microcomputer "can execute the operation needed for the research" but that doesn't make it the best tool.

            thanks for the input but this is still the wrong analogy...

            it is not what TFA is saying, and it is incorrect in fact

            Piketty is being criticized in TFA because he used a spreadsheet, which has 'cells' which contain 'formulas' which are descriptions of mathematical operations on data

            TFA author is saying that, I quote again:

            The code is hidden away in dozens if not hundreds of little

        • by dbIII ( 701233 )

          spreadsheets are ***computation software***

          So when did Slashdot turn into a "the beige box is a hard drive because I say it is and fuck you elitist technical folk" site?

        • by Baloroth ( 2370816 ) on Tuesday May 27, 2014 @09:28PM (#47104507)

          if it can execute the operation needed for the research then it is acceptable...if not, then no

          You could probably write this computational code in a shell script, too. But it would still be a terrible idea. Why? Because it's the wrong tool for the job. Simple as that. It doesn't matter what you can and cannot do, it matters what you should do, and you shouldn't use spreadsheets for anything complicated. It's simply too easy to make stupid mistakes that are difficult to trace and correct (or even notice).

          you can't blame a spreadsheet for a poorly devised *can* blame a researcher for using an inappropriate statistical *cannot* criticize the method of analysis as long as it is physically capable of the computation

          TFA isn't blaming the spreadsheets, he's blaming the people who use them for using them. It's not acceptable to use a tool that works poorly and is highly susceptible to mistakes, and no one should listen to anyone who does so unless that person is damned good at that tool: yes, it is possible that someone is so fantastically good with spreadsheets they can use them for massive data analysis with no problems. They are, however, the exception, and I would generally be inclined to disbelieve the results from anyone who does large work with spreadsheets (simply because of the possibility for errors and the lack of concern for accuracy that using spreadsheets demonstrates). So, the conclusion is that you shouldn't use spreadsheets for important work. You absolutely can criticize an analysis if it uses a tool that is highly likely to introduce errors, and that's fundamentally the point (and it's underscored by the fact that that is precisely what happened in Piketty’s case).

    • I agree, a well made spreadsheet is far easier to follow than a proprietary program or even most study's results.

      If you have a custom formula in a spreadsheet, create it in the program's scripting language instead of copy/pasting to tons of cells. Create the spreadsheet in a repeatable layout that makes the sections and the flow of the data easy to understand.

      I do not see how that is any different than using a proprietary program. At least with a spreadsheet you can look directly at the code for errors. In
  • by LordLucless ( 582312 ) on Tuesday May 27, 2014 @07:19PM (#47103703)

    It's not "spreadsheets shouldn't be used for important work", it's "spreadsheets should not be used for work that's not suitable for spreadsheets". Tools for the job, and all that.

    • by tepples ( 727027 )
      Or perhaps it's "very little work happens to be both important and suitable for spreadsheets."
  • by muhula ( 621678 ) on Tuesday May 27, 2014 @07:20PM (#47103711)
    If the inability to code review spreadsheets was a real issue, it wouldn't be too hard to convert spreadsheet functions into a functional language. For non-programmers, a spreadsheet lowers the barrier to entry. This allows people to do something useful and productive who couldn't do so otherwise. That's a good thing.
  • by Jonathan Mann ( 3481921 ) on Tuesday May 27, 2014 @07:23PM (#47103739)
    Another major issue with spreadsheets is that they don't handle data typing issues very well. For example, if you try to add a list of numbers, and somewhere in the list you have a number encoded as text, instead of throwing an error, it won't be included in the sum. Errors should never pass silently. Unless explicitly silenced.
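
    What the stricter behaviour could look like, as a rough sketch: the helper below is hypothetical (`strict_sum` is not a built-in), and it simply refuses to add anything that is not a real number instead of skipping it.

```python
from numbers import Real


def strict_sum(values):
    """Sum numbers, raising TypeError on the first non-numeric item.

    Hypothetical helper for illustration: text such as "3" is rejected
    loudly rather than being silently left out of the total.
    """
    total = 0.0
    for index, value in enumerate(values):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(
                f"item {index} is {value!r} ({type(value).__name__}), not a number"
            )
        total += value
    return total
```

    A caller who really does want to skip text has to say so, for example by filtering the list first, which is the "explicitly silenced" case.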
  • by Virtucon ( 127420 ) on Tuesday May 27, 2014 @07:23PM (#47103741)

    You're doing it wrong.

  • by matbury ( 3458347 ) on Tuesday May 27, 2014 @07:31PM (#47103801) Homepage

    The fact that Piketty's work describes a damning indictment of the USA's most cherished concept - free market capitalism - means that thousands of neo-liberal economists will pore over every single digit and operator in his spreadsheets looking for anything to negate the findings. If they can't find anything, they'll attack him. When you hear of character attacks against Piketty or some other diversionary tactic, you'll know his data is correct.

    • by tomhath ( 637240 )
      Other economics papers that reached similar conclusions such as the well known Growth in a Time of Debt [] also were based on flawed spreadsheets. It makes one question the entire hypothesis when the best known works on the subject are based on incorrect (or just plain fabricated) data.
  • by swm ( 171547 ) * <> on Tuesday May 27, 2014 @07:43PM (#47103877) Homepage

    I figured this out twenty-mumble years ago.
    I was doing data analysis in spreadsheets, and realized that I had no way to audit them.
    The data and the analysis were all in the spreadsheet.

    As soon as I got a grip on my data, I changed over to C programs that I could test, and document, and validate, and run at any time to demonstrate that input X generated output Y.

  • by Diddlbiker ( 1022703 ) on Tuesday May 27, 2014 @07:45PM (#47103881)
    My father was a wise man, and a solid programmer. He liked Basic, because it was simple, and readable (in his environment the alternatives were mainly Assembler, Cobol, and RPG). Whenever people made fun of his love for Basic, and how it resulted in bad code, he always replied “there are no bad languages, just bad programmers.”

    The problem isn't the spreadsheet. The problem is people building ugly models in it. Do they seriously think that if those models were written in C, Java or Perl they would have been magnitudes better? I doubt it; you're just transplanting bad habits onto a different platform.

    Of course, if he'd used trained professionals to build his models in whatever language of choice the models would be better. If he'd used trained professionals to build his spreadsheet models they would have been better as well.
    • by Luckyo ( 1726890 )

      The "it's not the tool, it's the people" argument has one major flaw.

      Tools are built so that people can perform tasks they can't otherwise do. As a result, if a tool fails because it's not good enough for the task, at least part of the blame lies with the tool and its creator.

    • there are no bad languages, just bad programmers.

      There are, however, languages that make it far easier to write code that is less readable and harder to maintain. As a specific example, compare Fortran 77 with Fortran 90. I can write the latter without any need for numerical statement labels. I can write a straightforward "DO WHILE" loop in Fortran 90, while in Fortran 77, I'd have to use the dreaded GOTO to get the same effect. Aside from basic stuff like that, I can write formulas in Fortran 90 with whole

  • I think the title should be "Why You Shouldn't Use Spreadsheets for *Complicated* Work". Just because a job is important doesn't mean the calculation is complex and something that needs to be coded in, for example, matlab.

    If my job is to make a pie chart, I can't see why using Excel is a bad idea. On the other hand, if I am examining the variance of several thousand data points and then plotting the residuals from a gaussian fit, then yes, I can see why using something else would be a lot better. It has not

  • by turp182 ( 1020263 ) on Tuesday May 27, 2014 @08:15PM (#47104051) Journal

    There are no corporate secrets below, but I stumbled upon this formula in an actuarial spreadsheet (I'm a developer with an actuarial education).

    The only way this logic could be verified is by breaking the single formula into 20+ different cells with more simple calculations.

    And of course it is in several thousand cells, bringing any computer at all to its knees during calculation.

    A good example of how not to use Excel (but the actuaries don't have access to IT prototyping or core development).


  • by MacTO ( 1161105 ) on Tuesday May 27, 2014 @08:24PM (#47104097)

    Lemire is right, spreadsheets are terrible for complex models that need to be modified. He is right for precisely the reasons he outlined.

    That doesn't mean that spreadsheets are useless. If you have a standard form where you're only modifying values, rather than functions, spreadsheets are great. There is a low barrier to entry and they are good for communicating results. But as soon as you need to audit or modify functions, you are jumping all over the place and it is easy to make mistakes. Yes, there are ways to consolidate your code (at least in spreadsheets that support scripting), but you are going to take so much time learning how to use the advanced features of your spreadsheet that you may as well learn a dedicated programming language in those cases.

    And the reality is that it's pretty easy to learn how to use programming languages these days. Not as easy as using a spreadsheet, to be sure, but even the standard Python distribution can handle most of the vagaries of loading data into memory and storing it properly (i.e. you don't have to worry about parsing or data structures too much). By adding the appropriate modules you can do some decent visualization of data. In some cases the visualization will be better than spreadsheets, and in others spreadsheets will have the lead. And that's just Python, which I chose as an example because I'm familiar with it. The reality is that there are much more appropriate domain specific languages out there.
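
    To give a sense of how little code that takes, here is a small sketch using only the standard library. The column names and numbers are placeholders invented for the example, not real data.

```python
import csv
import io
import statistics

# Placeholder data standing in for a file on disk.
SAMPLE = "sample,reading\na,1.5\nb,2.5\nc,3.5\n"


def load_column(text, column):
    """Parse CSV text and return one column as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader]


readings = load_column(SAMPLE, "reading")
mean_reading = statistics.mean(readings)
spread = statistics.stdev(readings)  # sample standard deviation
```

    Every step is visible in one place and can be rerun on a new file, which is the auditing property the spreadsheet version lacks.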
