Researchers at IBM say they have developed a new algorithm that can analyze terabytes’ worth of raw data in a matter of minutes. The algorithm could be used to more quickly predict weather, electricity usage and water pollution levels, for instance. It could analyze data gathered from sources such as sensors or smart meters, as well as break down data from global financial markets and assess individual and collective exposure to risk.
The mathematical algorithm, developed by IBM’s laboratories in Zurich, can sort, correlate and analyze millions of random data sets, a task that could otherwise take supercomputers days to complete, said Costas Bekas, a researcher at IBM.
The traditional approach to data analysis is to take multiple data sets and look at them individually, said Eleni Pratsini, manager of mathematical and computational sciences at the IBM research labs. The new algorithm instead compares data sets against each other, which could help enterprises identify larger trends in particular areas, such as risk reduction in financial portfolios.
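The article does not describe IBM's actual method, but the difference between the two approaches can be illustrated with a minimal sketch: per-set summary statistics say nothing about how data sets relate, while a pairwise correlation matrix can surface a hidden relationship between them. All names and the simulated data below are hypothetical.

```python
import numpy as np

# Hypothetical illustration, not IBM's algorithm: contrast analyzing
# data sets individually with comparing them against each other.

rng = np.random.default_rng(0)

# Simulated measurements: 5 data sets ("sensors"), 1,000 samples each.
n_sets, n_samples = 5, 1000
data = rng.standard_normal((n_sets, n_samples))
# Plant a hidden relationship: set 3 largely tracks set 1.
data[3] = 0.8 * data[1] + 0.2 * rng.standard_normal(n_samples)

# Individual view: per-set means and standard deviations reveal
# nothing about links between the sets.
means = data.mean(axis=1)
stds = data.std(axis=1)

# Cross-set view: pairwise Pearson correlations expose the
# relationship between set 1 and set 3 that the individual
# statistics miss.
corr = np.corrcoef(data)
print(corr[1, 3])  # close to 1.0 for the linked pair
```

The catch, and the point of the article, is that a full pairwise comparison grows quadratically with the number of data sets, which is why doing it at terabyte scale normally demands supercomputer time.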
The new algorithm combines data-calibration models with statistical analysis, allowing it to assess measurement models and uncover hidden relationships between data sets.
The algorithm can also reduce the cost burden on companies by analyzing data in a more energy-efficient way, Bekas said. The lab used a Blue Gene/P Solution system at the Forschungszentrum Jülich research center in Germany to validate 9TB of data in less than 20 minutes. Analyzing the same amount of data without the algorithm would have taken a day with the supercomputer operating at peak speeds, which would have added up to higher electricity bills, Bekas said.
Now that the algorithm has been shown to work, it may soon be used in IBM software applications.