Colossus: The Beast That Powers Google

When Google launched the Google File System (GFS), it was considered a revolution for the web. GFS let Google treat the hoard of machines in its data center as a single unit by coordinating their work. As Google added huge amounts of data to its servers, GFS distributed that data evenly across all the machines, and Google rebuilt its search index on top of it on a regular basis.

GFS proved so popular that other web giants such as Yahoo and Facebook soon built their own versions of it. Google released research papers detailing how GFS works, which soon led to an open source platform called Hadoop built along the same lines.

Move to Colossus
However, Google has kept evolving, and the company recently devised a way to significantly improve this foundation: a revamped file system called Colossus. More or less all of Google’s products now run on top of Colossus, from Gmail to Google Docs and YouTube.

So what made Google move from GFS to Colossus, and why is Colossus significant? GFS was better suited to batch operations, in which changes were first computed in the background across the whole data set and only later applied to the live system. Colossus is better suited to real-time operations. Google’s new search infrastructure, called ‘Caffeine’, runs on top of Colossus and lets Google update its search index in real time, rather than rebuilding it in the background and then applying it to the live system.
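To make the batch versus real-time distinction concrete, here is a minimal sketch in Python of a toy inverted index. It is not how Caffeine or Colossus work internally; the names `build_index` and `IncrementalIndex` are hypothetical and exist only for illustration. The batch function rebuilds the whole index from scratch, while the incremental class applies each document change directly to the live index.

```python
from collections import defaultdict


def build_index(docs):
    """Batch style: rebuild the whole inverted index from every document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


class IncrementalIndex:
    """Incremental style: apply each document change to the live index."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of document ids
        self.docs = {}                 # document id -> current text

    def upsert(self, doc_id, text):
        # Remove the postings that belong to the previous version, if any.
        for word in set(self.docs.get(doc_id, "").lower().split()):
            self.index[word].discard(doc_id)
            if not self.index[word]:
                del self.index[word]
        # Add postings for the new version; it is searchable immediately.
        self.docs[doc_id] = text
        for word in set(text.lower().split()):
            self.index[word].add(doc_id)

    def search(self, word):
        # Return a copy so callers cannot mutate the live index.
        return set(self.index.get(word.lower(), set()))
```

In the batch model, a changed page only becomes searchable after the next full call to `build_index`; in the incremental model, a single `upsert` touches only the words of that one document.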

Another very important feature of Colossus is that it has many master nodes, whereas GFS had only one. In GFS, if the master node went down, the whole system would go down temporarily. This is not the case in Colossus, where multiple master nodes operate at the same time.
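The effect of having more than one master can be sketched with a toy model in Python. This is an illustration under loud assumptions, not a description of Colossus internals: the `Master` and `Cluster` classes are hypothetical, and the sketch assumes every master holds a full copy of the file metadata, which the source does not state.

```python
class Master:
    """A toy master node that maps file paths to chunk server names."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.metadata = {}  # file path -> list of chunk server names


class Cluster:
    """A toy cluster with one or more masters (hypothetical design)."""

    def __init__(self, master_names):
        self.masters = [Master(name) for name in master_names]

    def register(self, path, chunk_servers):
        # Assumption: metadata is copied to every master.
        for master in self.masters:
            master.metadata[path] = list(chunk_servers)

    def lookup(self, path):
        # Ask the first master that is still alive.
        for master in self.masters:
            if master.alive:
                return master.metadata[path]
        raise RuntimeError("no master available")
```

With `Cluster(["m0"])`, marking the only master as not alive makes every `lookup` fail, which mirrors the single point of failure described above. With `Cluster(["m0", "m1", "m2"])`, losing `m0` leaves lookups working because another master answers.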

Naturally, others on the web know of Google’s transition, and they also know that Colossus is far more useful than GFS. As a result, a number of changes have already been made to the open-source Hadoop to make it look more like Colossus, and Hadoop developers are actively working to bring the concept of multiple master nodes to Hadoop. The framework’s adoption is also growing, and its users now include two more tech giants: Twitter and eBay.

So in a way, Google’s Colossus is driving innovation all across the web.

Courtesy: Wired


Salman Latif is a software engineer with a specific interest in social media, big data and real-world solutions using the two. Other than that, he is a bit of a gypsy. He also writes on his own blog. You can find him on Google+ and Twitter.
