China Supercomputing Science Hardware

Chinese Lab Speeds Through Genome Processing With GPUs

Eric Smalley writes "The world's largest genome sequencing center once needed four days to analyze data describing a human genome. Now it needs just six hours. The trick is servers built with graphics chips — the sort of processors that were originally designed to draw images on your personal computer. They're called graphics processing units, or GPUs — a term coined by chip giant Nvidia. This fall, BGI — a mega lab headquartered in Shenzhen, China — switched to servers that use GPUs built by Nvidia, and this slashed its genome analysis time by more than an order of magnitude."
This discussion has been archived. No new comments can be posted.

  • by mastershake82 ( 948396 ) on Sunday January 08, 2012 @03:57PM (#38631752)
    Sounds like these newfangled "GPUs" are gonna change the world.
    • by MollyB ( 162595 ) on Sunday January 08, 2012 @04:39PM (#38632054) Journal

      If one reads to page 2 of tfa, they only claim the technique works well in this instance. They go on:

      Even for computer-intensive aspects of analysis pipelines, GPUs aren’t necessarily the answer. “Not everything will accelerate well on a GPU, but enough will that this is a technology that cannot be ignored,” says Gollery. “The system of the future will not be some one-size-fits-all type of box, but rather a heterogeneous mix of CPUs, GPUs and FPGAs depending on the applications and the needs of the researcher.”

      and

      GPUs have cranked up the speed of genome sequencing analysis, but in the complicated and fast-moving field of genomics that doesn’t necessarily count as a breakthrough. “The game changing stuff,” says Trunnell, “is still on the horizon for this field.”

      So yes, the article is a bit breathless, but if utilizing GPUs helps cure my potentially impending genetic disorder, I'm all for it.

    • by Anonymous Coward

      I used to work with Texas Instruments TMS34010/32020/34082 processors in the 1990's. These were surface mounted onto a VGA graphics board, along with a number of TMS34082 vector processors and a few megabytes of memory (Hercules Graphics Station Card as an example). They had this really neat feature where you could cross-compile, download and execute programs on these boards as "extensions". You could do anything from encryption/decryption, image-processing to drawing lines and rendering triangles.

      Initially

  • by Anonymous Coward on Sunday January 08, 2012 @03:57PM (#38631756)

    I always wondered what GPUs are. Thanks Slashdot!

  • by Anonymous Coward on Sunday January 08, 2012 @04:06PM (#38631808)

    Explaining what a GPU is in a slashdot summary? Come on.

    This is similar to someone telling you a story about something funny happening to them while shopping at the store, pausing mid-story to inform you that a 'store' is a business where goods are displayed and exchanged for a papery substance called 'money'.

    • It might have some use. "store" is chiefly American English. British would prefer "shop", though they should definitely be able to understand what you are talking about. But in both BE and AE "store" would also mean "to keep".

    • by Anonymous Coward

      I don't mind the explanations in the submitter's summary too much: it's better than the jargon- and acronym-laden summaries that totally obfuscate some stories, and abstracts need to avoid jargon in order to pull in interested readers. I do, however, mind that the summary just plagiarizes the first few sentences of the Wired article. I'm also unhappy with the watered-down article; summaries and abstracts need to avoid jargon for clarity, but articles need to use the right words to convey their points

  • Submitter couldn't find a more technically-oriented one?
  • by Anonymous Coward
    It reads like some Reader's Digest piece. I can't believe timothy published it like that. :)
  • A reminder (Score:5, Insightful)

    by Mannfred ( 2543170 ) <mannfred@gmail.com> on Sunday January 08, 2012 @04:18PM (#38631904)
    It's hardly news that GPUs can be used to speed up parallel tasks/computations, but even so this article is a useful reminder of two things; 1) there are still many important processes that can be sped up by using GPUs, and 2) this can be achieved pretty much anywhere in the world.
    • The only reminder should be that processors designed for one type of math can do that math faster than processors designed for other types of math.

      I don't understand why companies don't realize that. Running graphics on a floating-point processor is like using a train to cross an ocean. Just because you can do it doesn't mean it is a good idea.

    • I wonder if AMD's approach of using more cores, where Nvidia uses faster cores, would change the time. I have no idea how genetic algorithms work. I do know simple hashes like bitcoins are best on AMD.
    • I always wondered why FPGA's aren't used for this kind of stuff, or if they already are. I would imagine they would even be faster because you can design a circuit specifically optimized for the problem. But now that I think about it, NVidia and AMD put considerable amounts of resources into making them super fast and cheap. I guess price/performance ratio would be pretty damn good on a GPU vs FPGA.
      • I always wondered why FPGA's aren't used for this kind of stuff, or if they already are. I would imagine they would even be faster because you can design a circuit specifically optimized for the problem.

        I think they are to some degree, but there is a major barrier to adopting them: they require specialized programming knowledge which you won't find in most genomics centers. GPUs are commodity technology and APIs like CUDA are easier to tackle (and more transferable to other fields) than FPGA programming.

        • Are any researchers really looking for skilled FPGA designers? It's a pretty dedicated skillset, but work on research would be more interesting than my current job. Also, it should be noted that the devices themselves (FPGAs), and the tools needed for the design flow (particularly synthesis tools) are expensive, and computationally intensive in and of themselves. Unfortunately we don't have the open source tools of the software world available to us.
      • The versatility of FPGAs comes at a steep price in die area, power consumption, and operating frequency. If your design goal is "We want to do this specific kind of math Real Fast.", and somebody already makes an ASIC that does that kind of math Real Fast, the ASIC is generally a lot more cost effective than using FPGAs.
    • According to Jackson Lab’s TeHennepe, the feat BGI and NVIDIA pulled off was porting key genome analysis tools to NVIDIA’s GPU architecture, a nontrivial accomplishment that the open source community and others have been working toward.

      Can anyone familiar with current efforts shed more light on this? Who is working on open source bioinformatics and how much work has been done?

    • What the article doesn't say:

      • BGI is late to the party when it comes to using GPUs to process genomic data.
      • "Processing" could mean just about anything.
  • A better article (Score:5, Informative)

    by arielCo ( 995647 ) on Sunday January 08, 2012 @04:31PM (#38631998)
    http://hpcwire.com/hpcwire/2011-12-15/bgi_speeds_genome_analysis_with_gpus.html [hpcwire.com]

    Excerpt:

    At BGI, he says, they are currently able to sequence 6 trillion base pairs per day and have a stored database totaling 20 PB.

    The data deluge problem stems from an imbalance between the DNA sequencing technology and computer technology. According to Dr. Wang, using second-generation sequencing machines, genomes can now be mapped 50,000 times faster than just a decade ago. The technology is on track to increase approximately 10-fold every 18 months. That is 5 times the rate of Moore's Law, and therein lies the problem.

    Obviously it would be impractical to upgrade one's computational infrastructure at that rate, so BGI has turned to NVIDIA GPUs to accelerate the analytics end of the workflow. The architecture of the GPU is particularly suitable for DNA data crunching, thanks to its many simple cores and its high memory bandwidth.
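
The growth figures in the excerpt can be worked through in a few lines. This is only a rough sketch: the 10-fold-per-18-months figure comes from the excerpt, while reading "Moore's Law" as a 2-fold increase every 18 months is an assumption, and the function names are made up for illustration. Under that reading, "5 times the rate" describes a single 18-month period, and the gap compounds after that.

```python
def growth_factor(fold_per_period: float, period_months: float, months: float) -> float:
    """Total growth after `months`, given `fold_per_period` growth every `period_months`."""
    return fold_per_period ** (months / period_months)

# From the excerpt: sequencing output grows roughly 10-fold every 18 months.
# Assumption (not from the excerpt): Moore's Law means 2-fold every 18 months.
SEQ_FOLD = 10.0
MOORE_FOLD = 2.0
PERIOD_MONTHS = 18.0

def gap_after(months: float) -> float:
    """How many times further sequencing output has grown than compute capacity."""
    seq = growth_factor(SEQ_FOLD, PERIOD_MONTHS, months)
    compute = growth_factor(MOORE_FOLD, PERIOD_MONTHS, months)
    return seq / compute

for months in (18, 36, 72):
    print(f"after {months} months the gap is about {gap_after(months):.0f}x")
```

The point of the sketch is that the shortfall is not a constant factor of 5: it multiplies by 5 every period, which is why buying more of the same hardware cannot keep up.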

    • Re:A better article (Score:5, Informative)

      by Samantha Wright ( 1324923 ) on Sunday January 08, 2012 @04:59PM (#38632190) Homepage Journal
      ...countering this stunning and exciting revelation is BGI's stunning and exciting reputation for producing stunningly and excitingly low-quality raw data from said stunning and exciting second-generation sequencing machines. This is a little like the biology equivalent of being told that your least-favourite Slashdot editor (please pick just one) has just gotten a brain implant so he can spam the front page with dupes, typo-ridden summaries, and fallacy-laden opinion pieces ten times an hour.
      • by MaizeMan ( 1076255 ) on Sunday January 08, 2012 @05:42PM (#38632486) Homepage
        Although at least in my field the problem is that no one ever thought to set lower limits on the quality of what you can call a genome. So now we get "genomes" made up of 100,000 contigs (many only a couple of hundred base pairs long) and even counting all of those, the total sequence might account for only 70% of the total size of the genome. But it's still a "genome" paper, which is still an instant ticket to Nature Genetics (or Nature Biotechnology if the assembly is REALLY bad).

        BGI is certainly one of the biggest offenders (Cucumber and Pigeonpea are both examples of the sort of terrible genomes-in-name-only BGI puts out) but I think the real problem is that Illumina sequence data is so cheap people keep trying to use it to sequence genomes, thinking if they throw enough raw data and enough mate-pair libraries at the problem it'll eventually make up for the fact that Illumina reads are so short. Illumina data is great for a lot of things. Calling SNPs, measuring gene expression, studying methylation patterns.

        But, at least for any genome with significant transposon content, it simply does not work.
        • Incidentally, ABI claims you can do de novo with SOLiD systems (which have read lengths of only ~20 bp!) but they say you need to get about 300x coverage just for a bacterial genome. That's not a lot of saved money when you work out all the numbers. It looks like we've nearly found a state function for dollars-per-high-quality-nucleotide.
          • you need to get about 300x coverage just for a bacterial genome

            OUCH. Wasn't the original high-quality human genome sequence (using Sanger technology) only about 10x? And doesn't having only 20bp per read basically rule out de novo sequencing of any eukaryote? Even for bacteria that sounds tricky without a closely-related reference sequence.

            • Yes, yes, and yes. To be fair, these are comparatively cheap and fast runs, but the numbers are still ridiculous, I agree. Hopefully third-generation sequencing technologies (not counting Pacific Bio's implausible promises of "3.1 billion flying pigs in 30 seconds flat!") will do better at pandering to us poor underfunded evolutionary biologists.
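
To put the coverage numbers in this sub-thread in perspective, here is a back-of-the-envelope sketch using the usual definition coverage = (number of reads x read length) / genome size. The 300x and ~20 bp figures are the ones quoted above; the 5 Mbp genome size is an assumption (roughly the size of an E. coli genome), and the function names are hypothetical.

```python
import math

def reads_needed(genome_bp: int, read_len_bp: int, coverage: float) -> int:
    """Reads required so that reads * read_len / genome_bp reaches `coverage`."""
    return math.ceil(coverage * genome_bp / read_len_bp)

def total_bases(genome_bp: int, coverage: float) -> float:
    """Raw bases that must be sequenced to reach `coverage`, whatever the read length."""
    return coverage * genome_bp

# Assumption: a 5 Mbp bacterial genome; this number is not from the thread.
BACTERIAL_GENOME_BP = 5_000_000

short_reads = reads_needed(BACTERIAL_GENOME_BP, 20, 300)  # ~20 bp reads at 300x
ratio = total_bases(BACTERIAL_GENOME_BP, 300) / total_bases(BACTERIAL_GENOME_BP, 10)

print(f"reads needed at 300x with 20 bp reads: {short_reads:,}")
print(f"raw bases at 300x versus 10x: {ratio:.0f} times as many")
```

Under these assumptions the short-read run needs tens of millions of reads for one small genome, and 300x coverage means sequencing 30 times as many raw bases as 10x coverage would, which is where the cost savings go.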
    • The problem with next generation sequencing is that it produces a lot of garbage as well. There is no free lunch. And that is why a lot is passed on to computers to handle that garbage. Also, a figure for computation hours per genome annotation does not make sense without reference to what exactly is being annotated and at what reliability.

  • I get that programmers are offloading certain tasks to the GPU because it can perform those specific tasks faster, but why is this even necessary? If GPUs are so good at it, then why can't there be a dedicated part of the CPU to perform these same computations in parallel streams the same way the GPU does?
    • by wbr1 ( 2538558 )
      Both modern Intel and AMD CPUs come in flavors that include a GPU core. I am currently running a laptop with an AMD E-450 that has a GPU core. Admittedly this core is stripped down, but it is there, and functional, and probably better than many higher end GPUs of 4-6 years ago. There are two other issues surrounding the use of GPUs for processing. One, competing APIs, and two, few programs make use of the availability. I believe some Adobe software (either Premiere or some Photoshop filters) are now wr
    • Genomic analysis involves extensive use of recursive techniques, which are well suited to parallel processing and combinatoric problems. GPUs are small independent components originally designed to handle large matrices of pixel elements very quickly for video display and refresh. Thus, when suitably programmed, for example using CUDA, they can work in parallel to compute solutions required to map problems of high combinatoric dimensionality onto a one dimensional space (sequence) very q

      • the Chinese are picking up on the technology and on genomic data mining far faster and with more intensity than is the broader US tech community.

        You're forgetting that the vast majority of companies actually developing this technology, and making it available to consumers, are based in the US (and Britain, to some degree). One recent article about the BGI that I read last year noted the irony of seeing several crates of sequencing machines stamped "MADE IN THE USA" waiting to be unloaded in Shenzhen. The

        • Except that genomics has as of yet proven minimally useful for drug development. Until they actually develop significant amounts of homegrown technology (which, to be fair, they are actually doing in the bioinformatics arena, as opposed to sequencing), I'm not convinced that they're that much of a threat.

          What if they simply avoid competing by patenting the sequence for Caucasians and then pulling an Apple and suing us out of existence? ;^)

        • Sounds a lot like wishful thinking to me. I agree that virtually all technology is global these days. It's just the rate of uptake that is astounding. About 35% of all US PhD students are Chinese and another 35% are Indian, while US grads are diminishing as a percentage. Major cutbacks in the UK now as well, but China is growing in double digits in most technology areas. I don't read Chinese myself, but the number of journals in the genomics area for Chinese readers is growing fast.

          Beg, borrow, steal, coll

          • In any event, the throughput of NVIDIA's Tesla product lines is quite impressive. They really are revolutionizing computational biology, where there are many NP-complete and NP-hard problems that can only be tackled with very fast processors (in parallel) and with heuristic rather than exact algorithms. Do you know if these are manufactured here or in Asia?

            I don't know where they're manufactured; my impression was that most of the really powerful chip-fabrication technology was still essentially based in

    • First question: Why do you want it from Intel, versus anybody else? They've always struck me as moderately evil - the Microsoft of the chip world, looking out for their own sales numbers and not much else.

      Second question: Which do you want in your chip; a fast CPU that can run your web browser and E-mail client, or a fast parallel computing unit that's good for gene sequencing, multimedia processing, etc? You can't have both. Well, you can, but both parts will be slower. The top-of-the-line chips you get th

    • by Xrikcus ( 207545 )

      There is: AVX. The difference is that to cope with the workloads GPUs are NOT good at, a lot of the CPU transistors are dedicated to things other than AVX units and registers so the peak is lower.

  • SIMD chips will always show computational gains on any class of problem that makes significant use of matrix multiplication or linear algebra. So graphics, crypto, etc.
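
A small illustration of why matrix multiplication maps well onto data-parallel hardware: every cell of the result is an independent dot product, so the cells can be computed in any order, or all at once. The sketch below is plain sequential Python with made-up function names; it shows only the structure of the work, not how a GPU or SIMD unit would actually be programmed.

```python
import random

def cell(a, b, i, j):
    """One output cell: the dot product of row i of `a` and column j of `b`."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul(a, b):
    """Naive matrix product, computing cells in row-major order."""
    return [[cell(a, b, i, j) for j in range(len(b[0]))] for i in range(len(a))]

def matmul_shuffled(a, b, seed=0):
    """Same product with cells computed in a shuffled order.

    No cell reads another cell's result, so the order cannot change the answer.
    """
    rows, cols = len(a), len(b[0])
    order = [(i, j) for i in range(rows) for j in range(cols)]
    random.Random(seed).shuffle(order)
    out = [[0] * cols for _ in range(rows)]
    for i, j in order:
        out[i][j] = cell(a, b, i, j)
    return out

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))
print(matmul_shuffled(a, b) == matmul(a, b))
```

Because no cell depends on any other, hardware with many simple arithmetic units can assign one cell, or one slice of a dot product, to each unit.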
  • And other assorted distro-computing tasks. Hell, my old x1800's stopped being supported for the current Folding software years ago.

    A nice list of distro computing projects [wikipedia.org].

    Another nice list of such projects [distribute...uting.info].
    • I have been volunteering for more than a decade now. I first started with United Devices. They stopped about 5 years ago. I started with single core computers. About 4 years ago I bought my first 4 core computer and last year I bought two 6 core computers. The 6 core computers produce twice as many results as the 4 core computers. The 4 core computers produce 6 times as many results as the single core. Therefore I think a single 6 core computer would pay for itself in electricity costs in less than
  • "GPUs . . . ? . . . I was informed that this project was powered by GNUs . . . ?"

    ". . . now where is that Apple MAC chip that generates the GPL number that allows the PC to connect to the Internet . . . ?"

  • I hate to say it, but I tend to be a bit skeptical about any research news coming out of China, since so much of it has been falsified in the past few years. So until some Chinese researcher shows me a six-assed monkey, my response to this news is going to be "Meh."
    • That seems to be the general attitude across the board in the US, but it seems unlikely to be warranted any more. They keep growing their economy at between 8-10% per year. Estimates are their GDP will overtake that of the US in about 2025, if not sooner. The days of resting on laurels will have been gone by then. In any event if we are that far ahead, it seems hard to get a sense of that on slashdot judging from the sophistication of most comments.

      Besides, I'd be curious to know what specific research ha

      • Besides, I'd be curious to know what specific research has been falsified?

        Here's a decent summary of the problem:
        http://www.nature.com/news/2010/100112/full/463142a.html [nature.com]

        It's not as if the US fossil fuels industry hasn't been doing the same here with respect to climate science.

        "Microsoft has put out some faulty software, so I'm gonna buy my next operating system from VaporWare Inc"

        They keep growing their economy at between 8-10% per year

        That's part of the problem:

        A new study from Wuhan University, for instance, estimates that the market for dubious science-publishing activities, such as ghostwriting papers on nonexistent research, was of the order of 1 billion renminbi (US$150 million) in 2009 - five times the amount in 2007.

  • Just think how cool it will be to have cards that can do this on the CPU bus.

  • by Cow Jones ( 615566 ) on Sunday January 08, 2012 @06:06PM (#38632656)
    ... this is what a Chinese lab looks like [dogster.com].
  • Imagine if the calculations were processed in an FPGA; it would be another order of magnitude faster :P
