The new record was set at the 2010 “Sort Benchmark” competition, when a team of computer scientists from UC San Diego broke the terabyte barrier: sorting more than one terabyte of data in just 60 seconds.
...
The system is composed of 52 computer nodes; each node is a commodity server with two quad-core processors, 24 gigabytes of memory and sixteen 500 GB disks, all interconnected by a Cisco Nexus 5020 switch.
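Some back-of-the-envelope arithmetic (not from the article) shows why such a cluster has to be carefully balanced. The sketch below, in Python, assumes the one terabyte is spread evenly across the 52 nodes and simply divides the data volume by the time and node count.

```python
# Illustrative arithmetic only: implied throughput if 1 TB is sorted
# in 60 seconds, spread evenly over 52 nodes.
TB = 10**12            # sort benchmarks use decimal terabytes
data_bytes = 1 * TB
seconds = 60
nodes = 52

aggregate_rate = data_bytes / seconds       # bytes/s across the cluster
per_node_rate = aggregate_rate / nodes      # bytes/s per server

print(f"aggregate: {aggregate_rate / 1e9:.1f} GB/s")   # ~16.7 GB/s
print(f"per node:  {per_node_rate / 1e6:.0f} MB/s")    # ~320 MB/s

# Every record has to be read from disk at least once and written back
# at least once, so real per-node disk traffic is at least double this
# figure, which is why disk, network and memory must all keep pace.
```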
...
The team actually won two prizes; the second was for tying the world record in the “Indy Gray Sort,” which measures how many terabytes per minute a system can sort when sorting 100 terabytes of data. “We’ve set our research agenda around how to make this better…and also on how to make it more general,” said Alex Rasmussen, a PhD student and a team member.
...
“Generally, sorting is a great way to measure how fast you can read a lot of data off a set of disks, do some basic processing on it, shuffle it around a network and write it to another set of disks. Sorting puts a lot of stress on the entire input/output subsystem, from the hard drives and the networking hardware to the operating system and application software.”
While standard sorting methods work for most data, the major difference when dealing with data sets larger than 1,000 terabytes is that they are well beyond the memory capacity of the computers doing the sorting, so records must be streamed to and from disk rather than held in RAM. The team’s approach was to design a balanced system, in which computing resources such as memory, storage and network bandwidth are fully utilized and as few resources as possible are wasted.
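The article doesn’t describe the team’s implementation, but a minimal external merge sort sketch in Python illustrates the general idea of sorting data that exceeds memory: sort memory-sized chunks, spill each sorted run to disk, then stream-merge the runs so only a handful of records are in memory at once.

```python
# Minimal external merge sort sketch (illustrative only, not the team's
# design). Assumes a text file where every line ends with "\n".
import heapq
import itertools
import tempfile

def external_sort(input_path, output_path, chunk_lines=1_000_000):
    run_files = []
    with open(input_path) as src:
        while True:
            # Phase 1: read a chunk that fits in memory and sort it.
            chunk = list(itertools.islice(src, chunk_lines))
            if not chunk:
                break
            chunk.sort()
            run = tempfile.TemporaryFile(mode="w+")   # sorted run on disk
            run.writelines(chunk)
            run.seek(0)
            run_files.append(run)

    # Phase 2: k-way merge of the sorted runs; only one line per run
    # is buffered in memory at any time.
    with open(output_path, "w") as dst:
        dst.writelines(heapq.merge(*run_files))

    for run in run_files:
        run.close()
```

The two phases mirror the I/O pattern the researchers describe: every record is read off disk, processed, shuffled, and written back out, which is exactly where a balanced system pays off.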
http://thefutureofthings.com/news/10635/computer-scientists-break-terab…