Science

Human Genome Contaminated With Mycoplasma DNA

KentuckyFC writes "The published human genome is contaminated with DNA sequences from mycoplasma bacteria, according to bioinformatics researchers who blame an epidemic of mycoplasma contamination in molecular biology labs around the world. The researchers say they've also found mycoplasma DNA in two commercially available human DNA chips made by biotech companies for measuring levels of human gene expression. So anybody using these chips to measure human gene expression is also unknowingly measuring mycoplasma gene expression. The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards."
  • by Anonymous Coward

    ...clearly uses DRM.

  • Haven't heard that one before...

  • Would they wind up with Swamp Thing?
  • Data vs executable (Score:5, Insightful)

    by Dan East ( 318230 ) on Thursday June 23, 2011 @02:36PM (#36545842) Journal

    But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards.

    Why is the word "evolutionary" used here? We're talking about static data that is not "executed" - it does not reproduce, it is only copied verbatim. Invalid data that bypasses filters ("antivirus software") is simply that - corrupt, invalid data that does not belong, but at least there will be less of it after filtering. That doesn't make the data somehow more powerful or adaptive - the filter merely missed it. The key fact is the data does not get to modify itself in an iterative fashion in order to survive or improve.

    • by x6060 ( 672364 )
      It's not that easy, though. Bacteria, viruses, and your own DNA accumulate mutations. This can happen on a cell-by-cell level, and it is one of the causes of some cancers.
    • by Anonymous Coward

      This was my first thought when reading the summary: there is no feedback loop linking the data-filtering mechanisms to the evolution of the bacteria involved. The only exception, and this is incredibly out there, would be if everyone started killing the bacteria based solely on the amount of corrupt data they scan in. Then *maybe* the bacteria with smaller differences could make their way into the system while the rest die out, but that is incredibly unlikely given their o

    • Re: (Score:3, Informative)

      The topic is not about "vestigial" DNA.

      TFA talks about bacteria accidentally getting mixed in with human samples that are then sequenced. The bacterial DNA shows up alongside the human DNA and is documented as human.
    • by Ruke ( 857276 )
      Exactly; the "fitter," undetected DNA has no opportunity to reproduce and pass on its traits; we're simply culling members from a static population as they present themselves. You could argue that the population isn't exactly static: new genes are being sequenced and inserted into the database, so "fitter" DNA will squirm its way into the databases more frequently. However, we definitely won't be seeing any "evolutionary" arms race: the database entries have no effect on the biological populations o
    • Is it really static? You assume researchers aren't going to try to use this raw data to generate any actual end product. I wouldn't make that assumption.

      See Craig Venter's latest attempts at synthetic life, "Mycoplasma laboratorium".
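The argument running through this subthread (a filter applied to data that never copies or mutates itself cannot select for anything) can be illustrated with a toy simulation. This is a minimal sketch, not anything from TFA: each database entry is reduced to a hypothetical "evasiveness" score between 0 and 1, the screen is a hypothetical fixed threshold, and the two-offspring rule and `mutation_sd` value in the contrast case are arbitrary choices. The first function filters a static collection; the second lets survivors reproduce with small random mutations, which is the ingredient an arms race would actually need.

```python
import random
from statistics import mean


def static_filter(entries, threshold):
    """Remove entries the screen can detect (evasiveness below threshold).

    Survivors are returned exactly as they were: nothing copies or mutates
    them, so running the filter again removes nothing new.
    """
    return [e for e in entries if e >= threshold]


def evolving_population(entries, threshold, generations, rng, mutation_sd=0.05):
    """Hypothetical contrast case: survivors reproduce with small random changes.

    Only here does the screen act as selective pressure, because each new
    round is built from the entries that evaded it.
    """
    size = len(entries)
    pop = list(entries)
    for _ in range(generations):
        survivors = static_filter(pop, threshold)
        if not survivors:
            return []
        # Each survivor leaves two slightly mutated copies, clamped to [0, 1].
        offspring = [
            min(1.0, max(0.0, s + rng.gauss(0.0, mutation_sd)))
            for s in survivors
            for _ in range(2)
        ]
        # Keep the population size fixed by random sampling.
        pop = rng.sample(offspring, min(size, len(offspring)))
    return pop


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy evasiveness scores, evenly spread over [0, 1).
    database = [i / 100 for i in range(100)]

    once = static_filter(database, 0.5)
    twice = static_filter(once, 0.5)
    print(len(once), len(twice), once == twice)  # 50 50 True
    print(round(mean(database), 3), round(mean(once), 3))  # 0.495 0.745

    evolved = evolving_population(database, 0.5, generations=20, rng=rng)
    print(round(mean(evolved), 3))
```

Filtering the static collection a second time removes nothing, and the survivors are exactly the entries that were already there; only the reproducing population shifts toward higher scores from one round to the next.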
  • by teebob21 ( 947095 ) on Thursday June 23, 2011 @02:40PM (#36545876) Journal
    At first I was relieved that this was a bacterium infecting silicon. Now I'm concerned: When will Avast release an Antibacterial beta? I'm still running Windows, folks! I know I'm vulnerable to this!!!
    • by lennier ( 44736 )

      When will Avast release an Antibacterial beta?

      Well, since a computer virus just injects code into an already-existing hardware processor, I guess a computer bacteria would have to carry around its own little itsy-bitsy mini-PC on little ambulatory robot legs, eat power from sockets where they can find it, and reproduce by splitting down the middle into two extra widdle bran-new baby mini-PCs.

      Truly an insidious force. They'd infect the entire world through their sheer power of cuteness.

  • by kilraid ( 645166 )
    Filtering doesn't free up any resources for future generations of database contaminants to breed on. And the notion of evolution would also require the contaminants to change, which doesn't really happen. So by all means, filter. It will leave the harder-to-detect contaminants behind, but they won't become more numerous.
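kilraid's point can be checked with an even simpler toy model, again a sketch under stated assumptions rather than anything from TFA: every incoming batch carries the same ten hypothetical contaminants, scored by how hard they are to detect, and a hypothetical fixed threshold screens each batch before it is added. The undetected contaminants accumulate at a constant rate set by new submissions (here batch size × miss rate = 10 × 0.2 = 2 per batch), not by breeding inside the database.

```python
def screen(entries, threshold):
    """Return only the entries the screen misses (hardness >= threshold)."""
    return [h for h in entries if h >= threshold]


def grow_database(n_batches, batch, threshold):
    """Screen each incoming batch, add the misses, and record the running total.

    Entries already in the database never copy themselves, so the total only
    rises by the number of misses in each new batch.
    """
    database = []
    totals = []
    for _ in range(n_batches):
        database.extend(screen(batch, threshold))
        totals.append(len(database))
    return database, totals


if __name__ == "__main__":
    # Hypothetical hardness scores 0.0, 0.1, ..., 0.9 for ten contaminants per batch.
    batch = [k / 10 for k in range(10)]
    database, totals = grow_database(5, batch, threshold=0.8)
    print(totals)                             # [2, 4, 6, 8, 10]
    print(screen(database, 0.8) == database)  # True: rescreening removes nothing new
```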
  • by Anonymous Coward

    Is the human genome contaminated or is it the published sequence that is contaminated?

    If only the latter, fix the title.

  • by JonySuede ( 1908576 ) on Thursday June 23, 2011 @02:49PM (#36546046) Journal

    that part is nonsensical:

    The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards.

    static data don't evolve

    • by pz ( 113803 )

      that part is nonsensical:

      The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards.

      static data don't evolve

      The original poster was engaging in self-indulgent free association.

      • Is that the newest euphemism for being stupid? This *is* a Slashdot editor we're talking about.

    • by lennier ( 44736 )

      static data don't evolve

      Nothing in the real world is truly static over time. You think your /etc config files are static data? Ever done a series of in-place system upgrades?

      • static data don't evolve

        Nothing in the real world is truly static over time. You think your /etc config files are static data? Ever done a series of in-place system upgrades?

        I installed my router, and then applied the system immutable flag to my entire /etc directory. So, my /etc data has been static for 10 years! ...

        and has been hacked 42 times...

      • Well, the pictures of Lenna [wikipedia.org] taken years ago sure haven't changed. Some things can be considered static on a human scale.

  • by quax ( 19371 ) on Thursday June 23, 2011 @02:54PM (#36546102)

    My understanding is that nowadays new high-speed sequencing machines can process an entire human genome in a couple of months.

    So I would think that after a couple of independent runs one should be able to flush out the non-human DNA, assuming the same bacterial contamination isn't present every time?

    Obviously this is not a cheap endeavor, but given the considerable commercial interest in correct human genome data, it seems to me a worthwhile investment.

    I find it puzzling that the abstract of the article does not allude to this.

    • From the abstract

      "We ... suggest there is a need to clean up genomic databases but fear current tools will be inadequate to catch genes which have jumped the silicon barrier. "

      http://arxiv.org/abs/1106.4192 [arxiv.org]

      • by quax ( 19371 )

        I read this as cleaning up an already corrupted database. Hence my question: why not go back to the source? Preferably repeatedly and independently, to get better statistics for separating "noise" (i.e., Mycoplasma) from "signal" (i.e., the human genome).

        • Well, my first response is, feel free to try it.

          But remember that the source material is one individual's genetic material. I believe in the original study they repeated the chemistry many times to be sure the findings were consistent. Assuming you can get this individual to give you some DNA, why do you think it won't be contaminated as well? Remember that a large number of genes have not been associated with any function. Personally I think it is more important to figure out what the protein

          • by quax ( 19371 )

            I am in no position to judge the biological relevance of this 1% error.

            But I am also puzzled why the focus is on one individual's DNA. Wouldn't it make more sense to work with samples of several individuals in order to throw out the - presumably minute - individual variances? I would expect the latter not to be very helpful for medical research.

            • The variances are what makes us different, one from another. Medical research is very interested in why some people get diseases and others do not. The 1000 Genomes Project was announced in 2008 and finished its pilot study last year http://en.wikipedia.org/wiki/1000_Genomes_Project [wikipedia.org]
              • by quax ( 19371 )

                Fair enough, but wouldn't this be an even better argument for sampling from various individuals? Otherwise, how are you going to determine where the deltas are?
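Both of quax's suggestions come down to comparing several sequences position by position. This is a minimal sketch, assuming the sequences are already aligned and of equal length (real pipelines rely on dedicated alignment and variant-calling tools, which this does not attempt to reproduce); the function name and example sequences are hypothetical. The majority base in each column gives a consensus, and columns where the inputs disagree are the "deltas". For repeated runs on one sample a delta points to noise; across different individuals it marks a candidate variant. As quax notes, a contaminant present in every run would still end up in the consensus.

```python
from collections import Counter


def consensus_and_deltas(seqs):
    """Compare equal-length, pre-aligned sequences column by column.

    Returns (consensus, deltas): the most common base at each position, and
    the 0-based positions where not all sequences agree.
    """
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    consensus = []
    deltas = []
    for pos, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
        if len(counts) > 1:
            deltas.append(pos)
    return "".join(consensus), deltas


if __name__ == "__main__":
    # Three hypothetical runs of the same sample; run 3 has one miscalled base.
    runs = ["ACGTACGT", "ACGTACGT", "ACCTACGT"]
    print(consensus_and_deltas(runs))    # ('ACGTACGT', [2])

    # Three hypothetical individuals at the same short region.
    people = ["AAGT", "ATGT", "AAGC"]
    print(consensus_and_deltas(people))  # ('AAGT', [1, 3])
```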

  • by Vornzog ( 409419 ) on Thursday June 23, 2011 @02:55PM (#36546114)

    How in the world will setting filters on a database put bacteria in a lab halfway around the world at an evolutionary disadvantage? The bacteria will still grow, contaminate the sample, and get sequenced, but the sequence will be rejected. There is no feedback mechanism here, no selective pressure.

    Genome sequence assembly is pretty far removed from the milieu in which a bacterium must make its way. And inadvertently including bacterial sequences on a gene expression chip is sloppy science, but hardly news.

    Traditional computer viruses are the only things that truly 'reproduce' in silico. Memes are your next best option, but the 'net is just a carrier - they have to infect a human host to reproduce. Stay away from 4chan if you want to avoid infection...

    But bacteria? In silico? Where are we going with this strained analogy, anyway?

    • The "will be rejected" part, I think, is where the issue comes in. Rejected based on what? Comparison to a known good database is demonstrably suspect and is in fact the main point of TFA. If I can find 90% of the non-human DNA corruption in the database and delete it, that now cleaned database becomes the standard. The other 10% of non-human DNA that wasn't caught in the database is now even more vetted, more certified, and less easily detected and deleted by the same database scanning algorithm. Thus
      • The "will be rejected" part, I think, is where the issue comes in. Rejected based on what?

        Ummm, maybe on known viral/bacterial/mycoplasmal sequences? It's pretty much routine when you're assembling a genome, and it's not hard to screen a database retrospectively as new contaminating genomes are discovered.

        As for sequence data mutating and evolving in silico in genome databases (if that's what people are saying here; I can't be sure), well... That might be a good plot for a SciFi novel, but not one that would seem credible to any biologist.
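The screening described in the reply above, and the circularity worried about just before it, can both be shown with a toy k-mer comparison. This is a simplified stand-in rather than how any particular pipeline works (real contaminant screens typically align reads against curated sequence collections); the sequences are made up, with the "contaminant" deliberately AT-rich so it shares no 8-mers with the "human" one, and `k` and `min_shared` are arbitrary choices. Screening against an explicit contaminant library flags the contaminant read, and so does screening against a clean reference; screening against a reference that already contains the contaminant flags nothing.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(sequences, k):
    """Union of the k-mers of every sequence in a collection."""
    index = set()
    for s in sequences:
        index |= kmers(s, k)
    return index


def shared_fraction(read, index, k):
    """Fraction of the read's distinct k-mers that also appear in the index."""
    read_kmers = kmers(read, k)
    return len(read_kmers & index) / len(read_kmers) if read_kmers else 0.0


def flag_known_contaminants(reads, contaminant_library, k=8, min_shared=0.5):
    """Flag reads that resemble a sequence known to be a contaminant."""
    index = build_index(contaminant_library, k)
    return [r for r in reads if shared_fraction(r, index, k) >= min_shared]


def flag_unlike_reference(reads, reference, k=8, min_shared=0.5):
    """Flag reads that do NOT resemble the reference.

    If the reference already contains the contaminant, the contaminant passes.
    """
    index = build_index(reference, k)
    return [r for r in reads if shared_fraction(r, index, k) < min_shared]


if __name__ == "__main__":
    human_like = "ATGGCGTACGTTAGCCTAGGCTAACGTTAGC"  # made-up sequence
    myco_like = "TTTAAATATTTAAAATTATAATTTAAATATA"   # made-up, AT-rich
    reads = [human_like[2:22], myco_like[3:23]]
    myco_read = reads[1]

    print(flag_known_contaminants(reads, [myco_like]) == [myco_read])  # True
    print(flag_unlike_reference(reads, [human_like]) == [myco_read])   # True
    print(flag_unlike_reference(reads, [human_like, myco_like]))       # []
```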

  • by Anonymous Coward

    For genome 3.1.

  • So to what extent does this "epidemic of mycoplasma contamination" increase the potential for false positives in DNA matching tests, such as those used in criminal investigations or paternity cases? Does a given lab or lab-equipment manufacturer have a common strain of contamination that pushes the number of "always match" markers above the threshold defined for claiming a match?

    • This has little to no relevance for DNA matching tests. Those tests do not match specific sequences, they usually match lengths of repeats in repetitive elements - elements that are unlikely to have been drawn from mycoplasma (because they don't have them!) http://en.wikipedia.org/wiki/DNA_profiling [wikipedia.org]
      • OK, thanks. Here in the San Francisco Bay Area there has lately been scandal after scandal concerning sloppy forensics lab operation, theft of evidence, and police departments conspiring to hide histories of police officer misconduct from defense teams. This would have been just one more nail in the coffin.
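The repeat-length comparison described in this subthread can be sketched as follows. This is a toy under stated assumptions: forensic labs measure repeat lengths at standard loci with dedicated laboratory methods rather than by scanning sequence strings, and the locus names, motifs, and allele sequences here are hypothetical. Each profile records, per locus, the number of back-to-back motif copies on each of the two alleles; a contaminant sequence that lacks the repeat simply scores zero rather than producing a spurious match.

```python
def longest_tandem_run(seq, motif):
    """Largest number of back-to-back copies of motif found anywhere in seq."""
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


def str_profile(alleles_by_locus, motifs):
    """Map each locus to the sorted repeat counts of its two alleles."""
    return {
        locus: tuple(sorted(longest_tandem_run(a, motifs[locus]) for a in alleles))
        for locus, alleles in alleles_by_locus.items()
    }


if __name__ == "__main__":
    motifs = {"LOCUS_A": "TCAT", "LOCUS_B": "AGAT"}  # hypothetical loci
    sample = {
        "LOCUS_A": ("GG" + "TCAT" * 7 + "CC", "GG" + "TCAT" * 9 + "CC"),
        "LOCUS_B": ("T" + "AGAT" * 11 + "T", "T" + "AGAT" * 11 + "T"),
    }
    suspect = {
        "LOCUS_A": ("GG" + "TCAT" * 9 + "CC", "GG" + "TCAT" * 7 + "CC"),
        "LOCUS_B": ("T" + "AGAT" * 11 + "T", "T" + "AGAT" * 11 + "T"),
    }
    print(str_profile(sample, motifs))  # {'LOCUS_A': (7, 9), 'LOCUS_B': (11, 11)}
    print(str_profile(sample, motifs) == str_profile(suspect, motifs))  # True
    print(longest_tandem_run("TTTAAATATTTAAAATTATA", "TCAT"))  # 0
```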

  • This is EXACTLY what happened in the 70s-80s with Henrietta Lacks' IMMORTAL 'HeLa' cells. http://en.wikipedia.org/wiki/HeLa [wikipedia.org]
    Her cells were the first human cells to be grown continuously outside the human body.
    In fact they were so successful that, unbeknownst to scientists ALL OVER THE WORLD, her cells had TAKEN OVER all of the cells in their laboratories GLOBALLY.
    There is an amazing BBC documentary on this by Adam Curtis called "Modern Times: The Way of All Flesh"
    wiki quote " Contamination: Because of their adaptation to grow
  • In the case of little Jeffery, Mycoplasma, you ARE the father!

    • by lennier ( 44736 )

      In the case of little Jeffery, Mycoplasma, you ARE the father!

      Join me, and together we can rule the upper right nasal cavity!

      Noooo! I'll never exchange plasmids with you! E. coli, why didn't you tell me?

  • So it would seem Evolution is favoring hackers, that breed well...

    - Dan.

  • by Anonymous Coward on Thursday June 23, 2011 @03:35PM (#36546596)

    As a career microbiologist and bioinformatics geek, the complete and utter scientific inaccuracy of this summary made me want to cry.

    The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards

    Mycoplasma is a common contaminant of many human cell culture lines. It is often present in low counts, and is a relatively slow growing organism. This is a problem, because many of the immortal cell lines are passed serially, meaning that the mycoplasma propagates right along with it. Most labs that perform cell culture now do routine PCR testing for mycoplasma markers as a quality control measure.

    When it comes to sequencing, and in particular high-throughput next-generation sequencing (Illumina/454/SOLiD/PacBio/whatever), you are shotgun sequencing all of the DNA in a given sample extract. This means that if you had a bunch of human cells that happened to be contaminated with low counts of mycoplasma, those mycoplasma sequences would be present to some extent in your final sequencing project. Whether they would factor into the final assembly or just get thrown out depends on the quality control, the experience of the bioinformatics team, and the assembly software pipeline. I am willing to bet that most issues with mycoplasma contamination arose during the "formative" years of high-throughput sequencing, but they may have lingered in databases. These databases might in turn be used by commercial companies that build microarrays or other high-density tools, so it's feasible that some mycoplasma sequence carried over.

    Is this relevant? Probably not. On a microarray, it would most likely be wasted space (e.g., always negative during gene expression studies... unless the patient had a mycoplasma infection or something). Furthermore, a simple analysis of the sequence would help rule out sequences that were clearly prokaryotic.

    "In silico" does not mean what you think it means. In fact, this whole bit about in-silico replication and arms races is complete and utter nonsense. In-silico biology usually refers to biocomputing, e.g., analyzing, manipulating and simulating gene/protein sequences, expression, signalling cascades, and the like on a computer system. It does not apply to mycoplasma sequences running around all namby-pamby causing infections that would require some sort of antivirus software. What they might be alluding to is the fact that many shotgun sequencing libraries are run, as needed, through a vector screen, which is designed to pull out irrelevant sequences that may have been necessarily introduced during cloning or sequencing: plasmids, cosmids, whatever. These algorithms may need better tuning to rule out mycoplasma in human sequences, but there's no danger of these mycoplasma sequences replicating and taking over the world.

    Unless you happen to be William Gibson.

    • True. In the end, mycoplasma is just another contributing factor to the signal-to-noise ratio of your dataset. It's an illusion to expect noiseless measurements given the amount of data involved.
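The vector screen mentioned in the comment above can be approximated, very loosely, by exact-match trimming. This is a minimal sketch under stated assumptions: real tools such as NCBI's VecScreen search against a vector database with alignment rather than exact matching, and the vector sequence, insert, and `min_overlap` value below are made up. A full copy of the vector anywhere in the read cuts the read at that point; otherwise a partial copy hanging off the 3' end is trimmed only if it is at least `min_overlap` bases long, since very short matches could occur by chance.

```python
def trim_vector(read, vector, min_overlap=5):
    """Remove vector sequence from a read using exact matching only."""
    # Case 1: the whole vector occurs in the read; keep only what precedes it.
    hit = read.find(vector)
    if hit != -1:
        return read[:hit]
    # Case 2: the read ends partway into the vector. Find the longest read
    # suffix that equals a vector prefix, ignoring matches below min_overlap.
    longest = min(len(read), len(vector) - 1)
    for length in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(vector[:length]):
            return read[:-length]
    return read


if __name__ == "__main__":
    vector = "GTCAGGCTTACCA"  # made-up vector sequence
    insert = "ATGCATGCATGC"   # made-up insert
    print(trim_vector(insert + vector[:7], vector))       # ATGCATGCATGC
    print(trim_vector(insert + vector + "TTTT", vector))  # ATGCATGCATGC
    print(trim_vector(insert + "GTC", vector))            # ATGCATGCATGCGTC
```

The last read keeps its three trailing bases: they do match the start of the vector, but a 3-base match is below `min_overlap` and too short to trust.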
  • horrible language (Score:5, Informative)

    by Taibhsear ( 1286214 ) on Thursday June 23, 2011 @03:38PM (#36546642)

    This article was horribly written. They go between using terms with their literal meaning and using terms in metaphorical creative language but do not differentiate between the two using context at all. It's an incredibly confusing read. Actual ancestral human DNA is not contaminated with actual mycoplasma DNA sequences.

    Here's what I gather is going on:
    Researchers took a sample of human DNA and sequenced it; along the way, the sample was contaminated with DNA from mycoplasma (possibly from bacteria in the lab or on the researchers themselves). The resulting data is assumed to represent pure human DNA (which is incorrect). Other researchers then use this data set as a reference against which to compare human DNA samples they sequenced themselves, to test gene expression and so forth. So if their samples show gene expression from mycoplasma, they would incorrectly think it was normal human gene expression. What they did was use software to strip the mycoplasma DNA data from the original data set (which had both human and mycoplasma DNA sequences), so that only the actual human DNA data is used as a reference. The biological contamination came first, in the original sample that was tested; the contamination referred to elsewhere is computational data "contamination." This is the software they are calling antivirus software and a virtual immune system (it isn't antivirus software or anything like a biological immune system; it's DNA data filtering software).

    These people really need to think about what they're trying to say before puking up jargon salad on the readers' brains.

    • It seems as if the metaphors (e.g., "virus") that computational science has borrowed from biology have come around full-circle, with the result that concepts from different fields are getting conflated with one another in bizarre ways. The reasoning seems to be: If data (in the form of a computer program) can replicate and spread to other machines, then perhaps DNA sequence data in genomic databases can perform similar biological feats like mutation, evolution, and transmission. This seems inane enough tha

  • Mindless drivel (Score:4, Interesting)

    by Iron (III) Chloride ( 922186 ) on Thursday June 23, 2011 @03:54PM (#36546862)

    I don't want to be excessively harsh but the summary was seriously a bunch of drivel. In silico either means it's data on the computer, or that you are simulating a biological process computationally. But as other posters have mentioned, unless you are purposely simulating evolution, mycoplasma sequences in your human databases isn't going to cause any "arms race." Yes, it seriously screws with validity, but that's a completely different issue.

    This is a generalization, and no offense to fellow Slashdotters, but in my experience most of the computer scientists that I've met have a really crappy understanding of even basic biology. CS concepts don't directly translate to biology ones.

    • I don't want to be excessively harsh but the summary was seriously a bunch of drivel. In silico either means it's data on the computer, or that you are simulating a biological process computationally. But as other posters have mentioned, unless you are purposely simulating evolution, mycoplasma sequences in your human databases isn't going to cause any "arms race." Yes, it seriously screws with validity, but that's a completely different issue.

      You're still missing the point.

      Methods to screen out junk contam

      • First off, the fact that we are continuing to resequence individual human genomes through projects like the 1000 Genomes Project (and attempting de novo assemblies, so we're not just relying on the HGP reference genome), as well as articles telling us about such incidents, makes it unlikely in my view that significant contamination will persist as research continues.

        Putting that aside, I fail to see how the usage of invalid DNA sequences in biomedical research, leading to problems with disease tr

  • Myco means fungi, right?
    • by treeves ( 963993 )

      Yes. But this is about bacteria that happen to have the genus name Mycoplasma, not fungi.

  • I believe the evolutionary arms race was triggered a long time ago (possibly in a galaxy far, far away). /. is so full of tripe today it's making me question my patronage.
  • Were these the patented sequences?

  • When you assemble a genome, you assemble the sequences into chromosomes based on their overlap with other sequences. This contamination should not match up properly, or it should assemble into its own "chromosome".

    The whole "evolution" thing is the biggest sensationalist bullshit I've ever heard. Ignore it.

    As was mentioned in another comment, it seems like the summary is misleading on the "contamination" actually being in the genome sequence.
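The overlap idea in this comment can be illustrated with a toy greedy assembler. This is a minimal sketch, not how production assemblers work (they build overlap or de Bruijn graphs and cope with repeats and sequencing errors); the reads are made up, with the "contaminant" reads written entirely in A and T so they cannot overlap the "human" reads, and `min_len` is an arbitrary choice. The pair of contigs with the longest suffix-prefix overlap is merged repeatedly, and the contaminant reads end up in a contig of their own, which would still have to be recognised as non-human by some other means.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = [
        "ATGGCGTACG", "TACGTTAGCC",  # made-up "human" reads, overlapping by TACG
        "TTAATATTTA", "TTTAAAATAT",  # made-up AT-only "contaminant" reads, overlapping by TTTA
    ]
    print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCC', 'TTAATATTTAAAATAT']
```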

  • The terms for sequence data and expression data are not interchangeable. The U133 microarray is for RNA (yes, RNA) expression data. RNA microarrays quantify the fold change in expression between different subjects. DNA microarrays identify polymorphisms, repeats, or the like. While arrays like the U133 rely on sequence-level data to create the array, this is not the same as saying that the sequence-level data is contaminated. Bottom line, the fact that this is not the cover article for N
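The fold-change comparison this comment refers to can be sketched in a few lines. Assumptions, stated plainly: the probe names and intensities are made up, the values are treated as raw linear intensities (real array pipelines normalize and usually log-transform first), and the pseudocount is an arbitrary choice to avoid dividing by zero. A log2 fold change of 2 means four times the signal; a probe that reads zero in both samples, as a stray mycoplasma probe on an uninfected sample might, just comes out at 0.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-probe log2((treated + pseudocount) / (control + pseudocount))."""
    return {
        probe: math.log2((treated[probe] + pseudocount) / (control[probe] + pseudocount))
        for probe in control
    }


if __name__ == "__main__":
    control = {"probe_1": 99.0, "probe_2": 15.0, "myco_probe": 0.0}   # hypothetical
    treated = {"probe_1": 399.0, "probe_2": 15.0, "myco_probe": 0.0}  # hypothetical
    print(log2_fold_change(treated, control))
    # {'probe_1': 2.0, 'probe_2': 0.0, 'myco_probe': 0.0}
```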
  • So the arXiv article (http://arxiv.org/abs/1106.4192v1) cited in TFA is a little more accurate than the already-much-panned summary. But the authors of the arXiv article still use the term 'virtual infection', which is misleading at best.

    Basically there are a few (actually only two described in the paper) entries in one particular human genome database maintained at EMBL that appear to be mycoplasma-derived. Two out of 45,000 features on the Affymetrix Human U133 +2 oligonucleotide array, used to qu
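Taking the comment's figures at face value (two affected features out of about 45,000 on the array), the affected share of the array is small:

```latex
\frac{2}{45\,000} \approx 4.4 \times 10^{-5} \approx 0.0044\,\%
```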
  • Seems there are other possibly related forms of this beast: http://www.nap.edu/openbook.php?record_id=11765&page=181 [nap.edu]