Python Science

Python Code Glitch May Have Caused Errors In Over 100 Published Studies (vice.com) 121

Over 100 published studies may have incorrect results thanks to a glitchy piece of Python code discovered by researchers at the University of Hawaii.

An anonymous reader quotes Motherboard: The glitch caused results of a common chemistry computation to vary depending on the operating system used, causing discrepancies among Mac, Windows, and Linux systems. The researchers published the revelation and a debugged version of the script, which amounts to roughly 1,000 lines of code, on Tuesday in the journal Organic Letters.

"This simple glitch in the original script calls into question the conclusions of a significant number of papers on a wide range of topics in a way that cannot be easily resolved from published information because the operating system is rarely mentioned," the new paper reads. "Authors who used these scripts should certainly double-check their results and any relevant conclusions using the modified scripts in the [supplementary information]."

Yuheng Luo, a graduate student at the University of Hawaii at Manoa, discovered the glitch this summer when he was verifying the results of research conducted by chemistry professor Philip Williams on cyanobacteria... Under supervision of University of Hawaii at Manoa assistant chemistry professor Rui Sun, Luo used a script written in Python that was published as part of a 2014 paper by Patrick Willoughby, Matthew Jansma, and Thomas Hoye in the journal Nature Protocols. The code computes chemical shift values for NMR, or nuclear magnetic resonance spectroscopy, a common technique used by chemists to determine the molecular make-up of a sample. Luo's results did not match up with the NMR values that Williams' group had previously calculated, and according to Sun, when his students ran the code on their computers, they realized that different operating systems were producing different results.

Sun then adjusted the code to fix the glitch, which had to do with how different operating systems sort files.

The researcher who wrote the flawed script told Motherboard that the new study was "a beautiful example of science working to advance the work we reported in 2014. They did a tremendous service to the community in figuring this out."

Sun described the original authors as "very gracious," saying they encouraged the publication of the findings.

  • Luo’s results did not match up with the NMR values that Williams’ group had previously calculated, and according to Sun, when his students ran the code on their computers, they realized that different operating systems were producing different results. Sun then adjusted the code to fix the glitch, which had to do with how different operating systems sort files.

    For example, if the code led Williams to wrongly identify the contents of his sample, chemists trying to recreate the molecule to test as a potential cancer drug would be chasing after the wrong compound, Williams said.

    • Thanks! The summary and article are crap.

      That's from this tweet [twitter.com]

      Holy crap. Huge bug uncovered in computational chemistry software because different operating systems sort files differently and the published scripts don't handle it well. If you do or rely on calculated NMR chemical shifts, this is a must-read.

      I'm still not sure what the ~1,000 lines of code was?

      The pdf [acs.org] is behind a paywall.

      • Looks like this pdf [acs.org] (Characterization of Leptazolines A-D, Polar Oxazolines from the Cyanobacterium Leptolyngbya sp., Reveals a Glitch with the "Willoughby-Hoye" Scripts for Calculating NMR Chemical Shifts) is available. I don't see the python script there other than this blurb:

        The new scripts were tested using Python V3 (Windows 10 using Anaconda 1.9.6
        running Python 3.7.1; Python 3.6.5 on Mac Mavericks; Python 3.7.1 on Mac Mojave)
        and Python V2 (Python 2.7.10 on Mac El Capitan 10.11.6; Python 2.

        • by weilawei ( 897823 ) on Saturday October 12, 2019 @02:53PM (#59300400)

          Anyone taking bets it was on case sensitivity?

          • One thing is clear...

            Scientists are shitty at statistics and at coding.

            • Everybody is shitty at statistics except mathematicians and actuarial accountants.

              The problem is, mathematicians are shitty at anything concrete, because details, and actuarial accountants are already too narrowly specialized, and highly paid, to do anything else.

            • Said a scientist?
              That dissing aside, a large majority of paid coders write bollocks code most of the time.

          • Sounds plausible... seems like they tried to read the files in order while the sequence of files was not the same on each system.

          • "Sun then adjusted the code to fix the glitch, which had to do with how different operating systems sort files."

            I took that to mean differences in how order of directory contents are returned. Case sensitivity would be unlikely to surprise even python developers, but I bet people are regularly surprised to find the order of files in a directory are unspecified and not what you want to think they'd be.

            • by gweihir ( 88907 )

              ... but I bet people are regularly surprised to find the order of files in a directory are unspecified and not what you want to think they'd be.

              Me too. The problem is that most coders have no clue how a filesystem actually works and, to make matters worse, are completely unaware of that shortcoming. Dunning-Kruger for coders.

              • by Aighearach ( 97333 ) on Saturday October 12, 2019 @06:27PM (#59301190)

                They start out completely unaware of how a filesystem actually works, and so they don't make any assumptions, and all is well.

                It isn't until they hit Mount Stupid and start thinking they really do understand filesystems that these mistakes happen.

                IME even if you read all manuals, and consult the manuals while doing something with files, the filesystem will still find a way to ignore you and place the files in an arbitrary order in the directory. The files have an order. And the filesystems have various ways of ordering the files. But the file utilities, both stand alone and in language libraries, tend to make claims or implications about the order that they can't actually enforce.

                The only thing that works is to stick to complete, Socrates-style ignorance; know only that filesystems write directory entries however they want, and that to be ordered, you have to order them yourself. That includes not only ignoring any ideas you have about how the filesystem orders things, but also refusing to believe that any library will, by default, order things the way you want. Because they will seem to be doing what they said. For a long time.

                I used to have an mp3 player that could only play files in directory order. Without using a special directory-editing program to fix them, there was just no way to choose an order. Even copying single files in order didn't work without running a sync after every copy.

            • by Anonymous Coward

              There's more to file ordering than case sensitivity. Collation encompasses not just case sensitivity, but how international characters sort versus their US-ASCII equivalents, where Kana characters fit into things, whether numeric names are sorted as numbers or plain old strings, etc..

              Since most operating systems support multiple concurrent (and different) file systems any algorithm that relies on "sort ordering" of file names is doomed to fail not just on different operating systems but on different file sy
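The "numbers versus plain old strings" point above can be sketched in Python. This is only an illustration: `natural_key` and the `conf*.out` names are hypothetical and do not come from the published scripts.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so 'conf10' sorts after 'conf2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd-indexed chunk is a run of digits.
    chunks = re.split(r"(\d+)", name)
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]

# Hypothetical file names, for illustration only.
names = ["conf10.out", "conf2.out", "conf1.out"]

plain = sorted(names)                     # plain string comparison
natural = sorted(names, key=natural_key)  # digit runs compared as numbers

print(plain)
print(natural)
```

Plain string comparison puts `conf10.out` ahead of `conf2.out`, while the key function compares the digit runs as integers. Whichever rule a script needs, the point is that it has to be chosen explicitly.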

            • "Sun then adjusted the code to fix the glitch, which had to do with how different operating systems sort files."

              I took that to mean differences in how order of directory contents are returned. Case sensitivity would be unlikely to surprise even python developers, but I bet people are regularly surprised to find the order of files in a directory are unspecified and not what you want to think they'd be.

              I've lived that dream. I deal with a large number of large files of measurement data from in-development silicon. They all need statistical analysis of various types. The end result of employing various schemes is that all the necessary context information (voltage, temp, chip-ID, iteration, etc) gets encoded in the filename and the scripts parse the filenames and build tables so you can process files by your choice of parameter and choice of order.

              Then output information can put the filename against the da

          • by PPH ( 736903 ) on Saturday October 12, 2019 @05:02PM (#59300928)

            Getting ready to blame MICROS~1 again.

          • More generally I would bet that this was a locale problem. Changing the locale will affect the sort order of pretty much all characters.

            For example, on Linux, the UTF8 locales ignore whitespace when sorting, while the C locale does not.

            (bash) export LANG=en_US.utf8
            (bash) printf "X 1\nX2\nX 3" | sort
            X 1
            X2
            X 3

            (bash) export LANG=C
            (bash) printf "X 1\nX2\nX 3" | sort
            X 1
            X 3
            X2

            In servers, the locale is usually set to C but on desktop machines it is usually set to en_US.utf8 (or whatever your preferred language c
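For comparison with the shell example above, here is a minimal sketch of what Python's own `sorted()` does with the same three strings. It compares `str` values by code point and does not consult `LANG`, so it behaves like the C-locale case; a locale-aware order would need a key such as `locale.strxfrm`, whose result depends on which locales are installed.

```python
# The same three strings as in the `sort` example above.
names = ["X 1", "X2", "X 3"]

# str comparison is by code point: the space (0x20) sorts before "2" (0x32),
# regardless of the LANG or LC_COLLATE settings of the process.
by_code_point = sorted(names)

print(by_code_point)
```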

          • Remind me not to bet against you. =P

            That would make the most sense.

          • You'd lose that bet. It was glob.glob(...) returning files in a non-reproducible way.

      • by sjames ( 1099 )

        That's the importance of reading docs carefully. The documentation for os.listdir says:

        The list is in arbitrary order, ...

        In other words, no promises about order.
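A minimal sketch of acting on that documentation note: impose the order yourself instead of trusting whatever `os.listdir` returns. The helper name `list_sorted` and the demo file names are made up for illustration.

```python
import os
import tempfile

def list_sorted(directory):
    """Return the entry names in `directory` in a fixed, explicit order."""
    # os.listdir() makes no promise about order, so sort the result here.
    return sorted(os.listdir(directory))

# Hypothetical demo files, created in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.out", "c.out", "a.out"):
        open(os.path.join(tmp, name), "w").close()
    names = list_sorted(tmp)

print(names)
```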

      • The REAL problem.. (Score:5, Insightful)

        by thesupraman ( 179040 ) on Sunday October 13, 2019 @04:11AM (#59302224)

        Actually, everyone is barking up the wrong tree I suspect.

        The REAL problem is their computations are not numerically stable!

        The order of file processing should not damn matter, and if it DOES, then something is wrong in their system.
        Any instability should be known, allowed for, and reported as a margin of error in the results.
        Obviously it was not, so the whole system is bogus.

        To force a certain order of files is NOT a fix. It is a hack to get the same randomly incorrect result consistently.
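As a general illustration of the order sensitivity this comment has in mind (other comments in this thread attribute the actual bug to mismatched file pairs rather than rounding), floating-point addition is not associative, so the same numbers can sum differently in a different order. The values below are contrived, and `naive_sum` is a hypothetical helper.

```python
import math

def naive_sum(values):
    """Add the values left to right with ordinary float addition."""
    total = 0.0
    for value in values:
        total += value
    return total

# Contrived values: 1.0 is lost when added to 1e16 before the cancellation.
first_order = [1e16, 1.0, -1e16]
second_order = [1e16, -1e16, 1.0]

naive_first = naive_sum(first_order)     # 0.0
naive_second = naive_sum(second_order)   # 1.0

# math.fsum tracks the lost low-order bits and is independent of order here.
exact_first = math.fsum(first_order)
exact_second = math.fsum(second_order)

print(naive_first, naive_second, exact_first, exact_second)
```

When a computation is sensitive in this way, sorting the inputs only makes the error reproducible; it does not remove it.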

    • Thanks for pointing this one out. I find it odd that you can code algorithms that depend on the order of the files you read. Now that I know, I will try to avoid that.
    • by gweihir ( 88907 )

      That is not a "glitch" in Python. That is the effect of incompetent coders that do not have the first clue about how systems work and that they need to enforce any state they depend on by themselves.

  • This resembles the bug in the early Pentiums from Intel that showed up in statistics textbooks of 1999. It was an error in the lookup tables for 11/17ths that made some graphs that should flow gracefully show a bump.

    • Try looking at the recommended computation for the APT test cutoff values vs false probability rate in SP800-90B.

      They recommend using Excel, hence IEEE 754, hence 53 bits of precision. This rapidly falls apart when you want false positive (i.e. declaring a fail over good data) probabilities of less than 1 in 2 to the 64.

      Then looking back into the tables in the spec, you can see that the figures are indeed wrong. But wrong in a different way. It took me forever to work it out - but it's this: The binomial qu

    • No, this is very different. Slashdot sensationalized the title. The problem was between chair and monitor. They used glob.glob() in their algorithms and assumed the result is sorted, when documentation clearly states the output depends on OS and there are no such guarantees.
  • "Python code" (Score:5, Insightful)

    by Superdarion ( 1286310 ) on Saturday October 12, 2019 @03:37PM (#59300580)

    The summary and the article are ridiculous. The real scoop is simple: some scientific programmer assumed that some file-listing/loading function (os.listdir, maybe?) always returned files in the same order, which it does not. It is not a bug in Python at all, as the behaviour is well described in the documentation. It was a stupid assumption made by some PhD student; in science, they almost always have terrible programming habits and very superficial knowledge of the language they're programming in.

    Truth is, when a group publishes their code with a paper, it is rarely used by anyone else, unless it's published and maintained as a toolbox. Taking the code off a scientific paper, which was used for exactly that one paper and nothing else, and using it with blind faith is never a good idea. This code is not audited or tested in any meaningful way. So why publish it? As the authors of the original paper and the paper that published the correction very directly stated, that's how science is supposed to work. Publishing your code is now done for the same reason that one publishes all the details of the experiment and all the math behind the models: so that the scientific community can evaluate the work and see if it holds up.

    • that's how science is supposed to work. Publishing your code is now done for the same reason that one publishes all the details of the experiment and all the math behind the models: so that the scientific community can evaluate the work and see if it holds up.

      Exactly. Open source software in general was modelled after the (centuries old) scientific method. If you use closed software in your publication, it's not really science, because nobody can check how your method works. Or as your math teacher always said, it's zero points if you don't show your work.

      I can't access the full paper now, and I'm stuck wondering how on Earth could a method like this rely on filename ordering...

      • If you use closed software in your publication, it's not really science, because nobody can check how your method works.

        I disagree somewhat: describing what you think the code does should be sufficient for most papers. All you need is sufficient detail for someone else to be able to reproduce what you think you did in your analysis. Forcing someone else to write their own code, instead of using yours, makes it easier to catch errors since they are unlikely to make the same mistakes that you did. The case in point is a good example of this.

        • by Uecker ( 1842596 )

          There are two problems with that:

          - Once things become more complicated, it is not feasible to rewrite everything every time. So only established labs who can build on their own software could reproduce more complicated methods which would be a problem in my opinion.
          - It is the reviewer's job to make sure that the methods are described in sufficient detail. But again, this is almost impossible to achieve for sophisticated methods and small details are often overlooked. If the source is available, the code can

      • by g01d4 ( 888748 )

        ...how on Earth could a method like this rely on filename ordering...

        Exactly. And if filename ordering was critical then why weren't there anomalous results that would have set off suspicion? Or were the authors just lucky in making an assumption that was matched by their OS?

    • Re: (Score:3, Insightful)

      by drinkypoo ( 153816 )

      The summary and the article are ridiculous. The real scoop is simple: some scientific programmer assumed that some file-listing/loading function (os.listdir, maybe?) always returned files in the same order, which it does not.

      It probably should, though. The primary use case for Python is to serve the people who are confused by semicolons. Why wouldn't they be confused about this, too?

      • by dargaud ( 518470 )
        Ha, yes, Python, the programming language where invisible characters are fundamental to what your program will do...
    • It is not a bug in Python at all, as it is well described in the documentation.

      It may be documented, but when all Python is shit code you kind of have to blame the language. How it's so popular in STEM I have no idea (oh wait, they teach that and Java to non-devs who need to code occasionally and are too lazy to learn non-shit languages on their own.)

      • by Jeremi ( 14640 )

        It may be documented, but when all Python is shit code you kind of have to blame the language.

        The thing is, any language used by lots of non-programmers is going to end up full of shit code, because that is the kind of code non-programmers write.

        I'm sure the average code-quality of e.g. Haskell or Erlang is significantly higher, but that would be because only really good programmers ever try to use those languages in the first place. Joey in the chem lab, OTOH, is going to use whatever is easy enough for him to get his chemistry project done, and these days that's likely to mean Python.

    • by gweihir ( 88907 )

      Indeed. Coder incompetence, plain and simple, nothing else.

  • For those interested, the fixed python script is available here: (and it's just 387 lines):
    https://pubs.acs.org/doi/suppl... [acs.org]

    • by pieleric ( 917714 ) on Saturday October 12, 2019 @04:13PM (#59300736) Homepage

      I couldn't find the original script, but looking at the corrected version, it's fairly obvious what happened.

      The script used many pairs of files to "compute something important". The files are named along the pattern nmr-xxx.out and freq-xxx.out.
      The authors used the standard Python function glob("*.out") to list all the filenames. They assumed that the first nmr- filename matched the first freq- filename. However, contrary to the bash command "ls *.out", glob() doesn't sort the output (it essentially returns files in the order they are stored on the hard disk, which is often the order in which they were written, but no promises are made).
      So pairs of files were "randomly" made, which led to incorrect computations/results... sometimes

      In defence of Python, this behaviour of glob() is warned in the first line of the documentation.
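One way to avoid the positional assumption described above is to pair the files by the shared part of their names. This is only a sketch: `pair_outputs` is a hypothetical helper, not the fix used in the corrected scripts (which sort the file list), and it assumes the nmr-xxx.out / freq-xxx.out naming pattern mentioned in this comment.

```python
def pair_outputs(filenames):
    """Pair nmr-<id>.out with freq-<id>.out by <id>, not by list position."""
    nmr, freq = {}, {}
    for name in filenames:
        if not name.endswith(".out"):
            continue
        stem = name[:-len(".out")]
        if stem.startswith("nmr-"):
            nmr[stem[len("nmr-"):]] = name
        elif stem.startswith("freq-"):
            freq[stem[len("freq-"):]] = name
    # Refuse to continue if any file lacks its partner.
    unpaired = set(nmr) ^ set(freq)
    if unpaired:
        raise ValueError(f"unpaired ids: {sorted(unpaired)}")
    return [(nmr[key], freq[key]) for key in sorted(nmr)]

# Hypothetical listings in two different arbitrary orders.
listing_a = ["nmr-b.out", "freq-a.out", "freq-b.out", "nmr-a.out"]
listing_b = ["freq-b.out", "nmr-a.out", "nmr-b.out", "freq-a.out"]

pairs_a = pair_outputs(listing_a)
pairs_b = pair_outputs(listing_b)
print(pairs_a)
```

Because the pairing is keyed on the name, both listings give the same pairs, and a missing partner raises an error instead of silently shifting every later pair.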

      • by gweihir ( 88907 )

        In defence of Python, this behaviour of glob() is warned in the first line of the documentation.

        And in addition, it is an entirely sane default. Sorting takes extra effort, and worse-than-linear effort at that. It is quite rightfully left to the coder to arrange the files in any order they need, or to use them as presented and avoid that O(n log(n)) additional effort. Now, "ls" is a different matter. It is completely correctly trying for some user friendliness at the expense of speed by default.

  • by tdelaney ( 458893 ) on Saturday October 12, 2019 @03:53PM (#59300632)

    https://pubs.acs.org/doi/10.10... [acs.org]
    https://pubs.acs.org/doi/suppl... [acs.org]

    Note: the zip archives are titled incorrectly - the one with the "raw data" is actually the python script.

    From the README_text.txt:

    Corrections/Modification to the Original Script: The key correction to the original scripts is the inclusion of the "list_of_files.sort()" that sorts the files prior to calculating the Boltzmann averages.

    def read_gaussian_outputfiles():
            list_of_files = []
            for file in glob.glob('*.out'):
                    list_of_files.append(file)
            list_of_files.sort()
            return list_of_files

    (although the last line is actually indented incorrectly in the readme - I've checked the source and it's as it should be).

    So the original script was relying on the order of files returned from glob.glob(), which is not and has never been defined by Python.

    https://docs.python.org/2/libr... [python.org]
    https://docs.python.org/3/libr... [python.org]

    The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

    (emphasis mine).

    So we're not just talking about different results on different operating systems, but potentially on every single machine or even every single run depending on filesystem implementation e.g. does it naturally return files in lexicographic order, creation timestamp order, some arbitrary but consistent order, or a totally random order (I don't know of a filesystem that does this, but it would be a good fuzzing technique).
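The fuzzing idea above can be approximated without a special filesystem: feed a function the same names in every possible order and check that the result never changes. A minimal sketch, with hypothetical helper and file names; the exhaustive loop is only practical for short lists, since the number of orderings grows factorially.

```python
from itertools import permutations

def is_order_invariant(func, names):
    """Return True if func gives the same result for every ordering of names."""
    expected = func(list(names))
    return all(func(list(order)) == expected for order in permutations(names))

# Hypothetical file names, standing in for whatever glob.glob() might return.
names = ["nmr-1.out", "freq-1.out", "nmr-2.out", "freq-2.out"]

sorted_is_safe = is_order_invariant(sorted, names)  # sorts, so order-independent
as_given_is_safe = is_order_invariant(list, names)  # keeps the incoming order

print(sorted_is_safe, as_given_is_safe)
```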

    • by Anonymous Coward

      Jesus Christ.

      Why not just "return sorted(glob.glob('*.out'))"
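A caveat for any one-liner of this kind: `list.sort()` sorts in place and returns `None`, so chaining `.sort()` onto the `glob.glob()` call would return `None` rather than the list. The built-in `sorted()` returns a new sorted list and is the form that works in a `return` statement. A small sketch with made-up names:

```python
# Hypothetical names, standing in for the result of glob.glob('*.out').
names = ["b.out", "a.out"]

# list.sort() works in place and returns None.
in_place_result = names.copy().sort()

# sorted() leaves its argument alone and returns a new sorted list.
new_list = sorted(names)

print(in_place_result, new_list)
```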

    • Actually, the readme is wrong - the whole function is:

      def read_gaussian_outputfiles():
          list_of_files = []
          for file in glob.glob('*.out'):
              list_of_files.append(file)
          if (len(list_of_files) == 0):
              for file in glob.glob('*.log'):
                  list_of_files.appe

      • Little benefit is gained for speed, yet confusion increases for all but expert programmers. There is nothing wrong with more explicit logical statements.

        Also, comments would help. What is the world like such that, if no .out files exist, use .log instead. And if "if not (list object)" is an obvious feature, and better than len, comment that too. I find it hard to believe len wouldn't check the same internal variable "not" checks.

        • Python truth values are so fundamental that it's expected that anyone using python should understand how they work. It is idiomatic to use:

          if sequence:

          rather than:

          if len(sequence) > 0:

          Python's short-circuiting behaviour however is less well known, and is now actively discouraged in favour of the ternary expression and other constructs.
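To make the idiom concrete, here is a sketch of the ".out first, otherwise .log" fallback discussed above, written with a truth-value test. `pick_outputs` is a hypothetical helper that takes the two candidate lists as arguments; it is not the function from the published script.

```python
def pick_outputs(out_files, log_files):
    """Prefer the .out files; fall back to the .log files if there are none."""
    # An empty list is falsy, so this is equivalent to len(out_files) > 0.
    if out_files:
        return sorted(out_files)
    return sorted(log_files)

# Hypothetical inputs, for illustration only.
with_out = pick_outputs(["b.out", "a.out"], ["x.log"])
without_out = pick_outputs([], ["y.log", "x.log"])

# The short-circuiting form mentioned above gives the same choice of list.
short_circuit = sorted([] or ["y.log", "x.log"])

print(with_out, without_out, short_circuit)
```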

  • Alternative title: Chemists are bad at programming, news at 11.

  • by l2718 ( 514756 ) on Saturday October 12, 2019 @08:28PM (#59301422)
    The last sentence in the paper reads:

    Ultimately, this example serves as a reminder of the principle Caveat emptor and that users should validate noncommercial software on their system prior to use on new applications.

    In other words, even authors who benefit from freely released source code seem to have unflagging faith in "commercial software". In fact, the only reason these authors were able to diagnose and fix the bug was that the code was freely available.

    Instead, scientists should verify all software that they use, commercial or not.

  • Kudos to the original authors for gracious acknowledgement of the correction.

    Might have been a good idea to have a software expert audit code published in scientific papers.

