Scientists Can Now Assemble Entire Genomes On Their Personal Computers In Minutes (phys.org) 44
Researchers have developed a technique for reconstructing whole genomes, including the human genome, on a personal computer. "This technique is about a hundred times faster than current state-of-the-art approaches and uses one-fifth the resources," reports Phys.Org. From the report: The study, published September 14 in the journal Cell Systems, allows for a more compact representation of genome data inspired by the way in which words, rather than letters, offer condensed building blocks for language models. [...] To approach genome assembly more efficiently than current techniques, which involve making pairwise comparisons between all possible pairs of reads, [researchers] turned to language models. Building from the concept of a de Bruijn graph, a simple, efficient data structure used for genome assembly, the researchers developed a minimizer-space de Bruijn graph (mdBG), which uses short sequences of nucleotides called minimizers instead of single nucleotides. "Our minimizer-space de Bruijn graphs store only a small fraction of the total nucleotides, while preserving the overall genome structure, enabling them to be orders of magnitude more efficient than classical de Bruijn graphs," says [one of the researchers].
The researchers applied their method to assemble real HiFi data (which has almost perfect single-molecule read accuracy) for Drosophila melanogaster fruit flies, as well as human genome data provided by Pacific Biosciences (PacBio). When they evaluated the resulting genomes, [researchers] found that their mdBG-based software required about 33 times less time and 8 times less random-access memory (RAM) computing hardware than other genome assemblers. Their software performed genome assembly for the HiFi human data 81 times faster with 18 times less memory usage than the Peregrine assembler and 338 times faster with 19 times less memory usage than the hifiasm assembler. Next, [researchers] used their method to construct an index for a collection of 661,406 bacterial genomes, the largest collection of its kind to date. They found that the novel technique could search the entire collection for antimicrobial resistance genes in 13 minutes -- a process that took 7 hours using standard sequence alignment.
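For readers wondering what a "minimizer-space" de Bruijn graph looks like in practice, here is a toy Python sketch of the general idea described in the summary. It is not the researchers' software. The function names, the hash-threshold rule for picking minimizers, and the parameters `k`, `l`, and `density` are all assumptions made for illustration: each read is reduced to the ordered list of its selected l-mers, and graph nodes are runs of k consecutive minimizers rather than runs of k nucleotides.

```python
import hashlib
from collections import defaultdict


def is_minimizer(lmer: str, density: float) -> bool:
    """Assumed selection rule: keep an l-mer when its 64-bit hash falls in
    the lowest `density` fraction of the hash space."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") < density * 2**64


def to_minimizer_space(read: str, l: int, density: float) -> list:
    """Rewrite a read as the ordered list of its selected l-mers."""
    return [
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if is_minimizer(read[i:i + l], density)
    ]


def build_mdbg(reads, k: int, l: int, density: float) -> dict:
    """Build a toy minimizer-space de Bruijn graph.

    Nodes are tuples of k consecutive minimizers. An edge goes from node a
    to node b when the last k-1 minimizers of a equal the first k-1 of b.
    """
    nodes = set()
    for read in reads:
        mins = to_minimizer_space(read, l, density)
        for i in range(len(mins) - k + 1):
            nodes.add(tuple(mins[i:i + k]))

    # Index nodes by their (k-1)-minimizer prefix so edge lookup is cheap.
    by_prefix = defaultdict(list)
    for node in nodes:
        by_prefix[node[:-1]].append(node)

    return {node: sorted(by_prefix.get(node[1:], [])) for node in nodes}


if __name__ == "__main__":
    # Hypothetical reads, chosen only to exercise the code.
    reads = ["ACGTACGTTGCAACGT", "CGTTGCAACGTACGGA"]
    graph = build_mdbg(reads, k=2, l=3, density=0.5)
    for node, successors in sorted(graph.items()):
        print(node, "->", successors)
```

With a low `density`, only a small fraction of l-mers survive, so the graph has far fewer nodes than a nucleotide-level de Bruijn graph over the same reads. That is the source of the savings the quoted researcher describes, though a real assembler also has to handle sequencing errors and map the result back to base-level sequence.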
Re: (Score:2)
First post!
Ha ha, the joke's on you. Covid is an RNA virus, not DNA.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
This is about software reconstructing whole genes from sequence fragments, not reading the base sequences of those fragments. So I don't think it matters if it starts with DNA or RNA data.
Re: (Score:3, Insightful)
Am I the only one who hates it when they say "[x] times less"?
If X is three times more than Y, then Y is three times less than X.
It may be a clumsy way to say it, but it is a common English expression.
I suggest that you reserve your hate for those who confuse "your" and "you're", use the phrase "begs the question", and use "literally" as an intensifier. Those are far worse offenses.
Re: (Score:1, Interesting)
You didn't even get that part right either. You were probably trying to say 3 times as much as y.
You literally don't know what you're talking about.
Re: (Score:2)
"You literally don't know what you're talking about."
It's hard: if you gain 100% and then lose it again, the loss is only 50%. How should people wrap their heads around that when 50% of them have an IQ under 100? :-)
Re: (Score:2)
Let's add "could care less", its/it's, there/their, etc.
its liked their not teachin anglish in skoolz animore.
Re: (Score:2)
Re: (Score:2)
I suggest that you reserve your hate for those who confuse "your" and "you're", use the phrase "begs the question", and use "literally" as an intensifier. Those are far worse offenses.
Agreed, with the caveat that there's no need to hate those who use the phrase "begs the question" correctly.
Re: (Score:2)
Re: (Score:2)
"N times less X" invokes multiplication and subtraction. It does not invoke division. "N times less" literally means "-(N-1) times as much", which seldom makes physical sense because time and RAM and other things measured like that usually cannot be negative.
"One Nth the X" is the correct way to say 1/N resource consumption.
Re: (Score:2)
Re: (Score:2)
I bet you also think "N times faster" means N times as fast, rather than what it actually means.
Re: (Score:2)
Re: (Score:2)
"you are the only one who hasn't learn in school what was the operation inverse to multiplication."
Verbs are hard too. :-)
Re: (Score:2)
Re: (Score:2)
What's really amazing (Score:4, Interesting)
What's really amazing is how boolean algebra, the language of the gates in the CPU, can support all of these abstractions and algorithms.
Trees, graphs, linked lists, stacks, queues, insertion sorts, relational algebra, matrix operations, to whatever this is, all expressed as boolean algebra. Amazing.
Abstractions on top of abstractions. Yet somehow it retains, manipulates and produces information. Freaky.
Re: (Score:2)
Well, it would be freakier if it couldn't.
Re: (Score:1)
Yeah, next thing you know we'll be injecting lightning into rocks and teaching them to think.
Re: (Score:3)
Re: (Score:2)
There is no workstation in the world that can meat specifications because that makes no sense. That's also not vegetarian/vegan.
Re: (Score:1)
Re: (Score:2)
Now we know (Score:3)
what the Great Filter is...
So... (Score:2)
Does it mean we're closer to real-life cute catgirls now?
The future is biology. (Score:2)
If you have any children who are feeling pressured to learn computer programming, this story is a reminder that programming is becoming archaic. Like the Industrial Revolution, the Golden Age of Digital Computing, software and the internet is over. They are in maintenance mode. There is some activity in development of new energy sources and managing climate, but that doesn't promise great opportunities.
The future is biology, which is right now beginning an explosion of new discoveries and making a real and
Silly me (Score:2)
Here I thought that someone must have ditched their crappy javascript implementation for a proper binary built for the native system.