Data Sorting World Record — 1 Terabyte, 1 Minute

An anonymous reader writes "Computer scientists from the University of California, San Diego have broken the 'terabyte barrier' — and a world record — when they sorted more than a trillion bytes of data in 60 seconds. During this 2010 'Sort Benchmark' competition, a sort of 'World Cup of data sorting,' the UCSD team also tied a world record for fastest data sorting rate, sifting through one trillion data records in 172 minutes — and did so using just a quarter of the computing resources of the other record holder."
  • by MichaelSmith ( 789609 ) on Tuesday July 27, 2010 @11:36PM (#33053356) Homepage Journal

    I had a 6502 system with BASIC in ROM and a machine code monitor. The idea was to copy a page (256 bytes) from the BASIC ROM into the video card's address space, which filled one quarter of the screen with random-looking characters. Then bubble sort the 256 bytes in place. It took about one second.

    For extra difficulty, do it again with the full 1K of video. That's harder on the 6502 because you have to go through vectors in RAM for the addresses, so reads and writes become a two-step operation, as does incrementing the address, and you have to test for carry. But the result was spectacular.
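
    In Python, the trick described above might look roughly like this (a minimal sketch; the buffer name and byte values are illustrative stand-ins for the copied ROM page and video memory, not the original 6502 code):

    ```python
    # Fill a 256-byte "screen" buffer with arbitrary values (standing in for
    # the page copied out of the BASIC ROM), then bubble sort it in place.
    import random

    screen = bytearray(random.randrange(256) for _ in range(256))

    def bubble_sort(buf):
        # Classic O(n^2) bubble sort: sweep the buffer repeatedly, swapping
        # adjacent out-of-order bytes, until a full pass makes no swaps.
        n = len(buf)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):
                if buf[j] > buf[j + 1]:
                    buf[j], buf[j + 1] = buf[j + 1], buf[j]
                    swapped = True
            if not swapped:
                break

    bubble_sort(screen)
    assert bytes(screen) == bytes(sorted(screen))
    ```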

  • Only 52 nodes (Score:5, Interesting)

    by Gr8Apes ( 679165 ) on Tuesday July 27, 2010 @11:45PM (#33053406)

    You've got to be kidding me. Each node was only two quad-core processors with sixteen 500 GB drives (big potential disk I/O per node), but this system doesn't even begin to scratch the very bottom of the Top 500 list.

    I just can't imagine that, if even the bottom rung of the Top 500 were slightly interested in this record, they wouldn't blow this team out of the water.

  • by Anonymous Coward on Wednesday July 28, 2010 @01:00AM (#33053690)

    I work in the OLAP realm. Trust me, it matters. Being able to run an ad hoc query across terabytes of data with near-real-time results is the holy grail of what we do. The industry has known for a while that parallel computing is the way to go, but only recently has the technology become cheap enough to consider deploying on a large scale. (Though Oracle will still happily take millions from you for Exadata if you want the expensive solution.)

  • One other area... (Score:2, Interesting)

    by SuperKendall ( 25149 ) on Wednesday July 28, 2010 @01:29AM (#33053752)

    Come to think of it, one area where it also matters currently is mobile development. If you aren't considering memory or processor usage, you can quickly end up with really bad performance; thinking hard about how to make use of what little you have really matters in that space too.

    So only desktop or smallish backend development can generally remain unconcerned with algorithmic performance these days...

    I had to work with large datasets in my previous life as a backend IT guy, but nothing at the levels you are talking about. Even then I thought carefully about how any given approach would affect performance.

  • by pdxp ( 1213906 ) on Wednesday July 28, 2010 @01:58AM (#33053776)
    ... to make it clear that you won't be doing it (TritonSort, thanks for leaving that out, kdawson) on your desktop at home:
    • 10 Gbps networking
    • 52 servers x 8 cores each = 416 cores
    • 24 GB RAM per server = 1,248 GB total
    • ext4 filesystem on each, presumably with hardware RAID

    I think this is cool, but.... how fast is it in a more practical situation?

    source [sortbenchmark.org]
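
    For a rough sense of scale, a back-of-envelope calculation from the figures above (my own arithmetic, not from the benchmark report; it counts the data once, while a real sort reads and writes it at least once each):

    ```python
    # Throughput implied by the headline result, using only the numbers quoted above.
    DATA_BYTES     = 1.0e12   # ~1 TB sorted
    WALL_SECONDS   = 60
    NODES          = 52
    DISKS_PER_NODE = 16

    aggregate = DATA_BYTES / WALL_SECONDS     # ~16.7 GB/s across the cluster
    per_node  = aggregate / NODES             # ~320 MB/s per node
    per_disk  = per_node / DISKS_PER_NODE     # ~20 MB/s per spindle

    print(f"aggregate: {aggregate / 1e9:.1f} GB/s")
    print(f"per node:  {per_node / 1e6:.0f} MB/s")
    print(f"per disk:  {per_disk / 1e6:.0f} MB/s")
    ```

    Per spindle that is well below what a 2010-era SATA disk can stream sequentially, which suggests the hard part is keeping all 832 disks and the 10 Gbps network busy at once, not raw per-disk speed.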

  • by melted ( 227442 ) on Wednesday July 28, 2010 @04:14AM (#33054128) Homepage

    Let's consider the 100 TB in 172 minutes run they also did. 52 nodes with 16 spindles per node is 832 spindles total, or about 120 GB of data per spindle. 120 GB can be read in 20 minutes and transferred in another 15 to the target spindles (assuming a uniform distribution of keys). You can then break it down into 2 GB chunks locally (again by key) as you reduce. Then you spend another hour and a half reading individual chunks, sorting them in memory, concatenating, and writing.

    Of course this only works well if the keys are uniformly distributed (which they often are) and if data is already on the spindles (which it often isn't).
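
    As a minimal single-machine sketch of the two-phase approach described above (cut the input into memory-sized runs, sort each run, then stream-merge them), assuming line-oriented records that each end with a newline; the distributed version additionally routes records to their target node and spindle by key range first:

    ```python
    # External sort sketch: phase 1 spills sorted runs to temp files,
    # phase 2 k-way merges the runs into the output. The ~64 MB run size
    # is an illustrative stand-in for the 2 GB chunks mentioned above.
    import heapq
    import os
    import tempfile

    def external_sort(input_path, output_path, run_bytes=64 * 1024 * 1024):
        tmpdir = tempfile.mkdtemp()
        run_paths = []

        # Phase 1: read about run_bytes of lines at a time, sort in memory, spill.
        with open(input_path) as src:
            while True:
                run = src.readlines(run_bytes)
                if not run:
                    break
                run.sort()
                path = os.path.join(tmpdir, f"run{len(run_paths)}.txt")
                with open(path, "w") as out:
                    out.writelines(run)
                run_paths.append(path)

        # Phase 2: stream-merge the sorted runs; heapq.merge never loads them whole.
        runs = [open(p) for p in run_paths]
        try:
            with open(output_path, "w") as out:
                out.writelines(heapq.merge(*runs))
        finally:
            for f in runs:
                f.close()
    ```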
