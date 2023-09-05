Case Studies is a series explaining real-world software engineering simply. The goal is to teach without the unreadable density of most technical papers and blog posts.

Google is projected to have stored more than 10 exabytes of data on their servers, equivalent to 10,000,000 terabytes of data. This data includes current and historical map data, cached pages, Gmail, YouTube, search data, advertising data and media, Google Drive, and more.

How does the company store so much data and still process everything with so much speed?

Google uses Colossus, its in-house distributed file system, to store exabytes of data efficiently.

Google Colossus Explained

The Colossus Client Library

To perform operations on Colossus, like retrieving a file, you go through the Colossus client library.

The client library is like an SDK. It provides APIs and methods to call in order to perform operations on the data you want.

The client library is a layer over all of Colossus, abstracting away the hard work behind the scenes.
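
Colossus's real client API isn't public, so here is a toy Python sketch of the idea. Every name in it is hypothetical: ColossusClient, create_file, write_file, read_file, file_size, and the in-memory Curator and DFileServer stand-ins. It shows the facade pattern. The caller gets a few simple methods, and the library quietly sends metadata work to one component and file bytes to another.

```python
from dataclasses import dataclass, field


@dataclass
class Curator:
    """Hypothetical stand-in for a Curator: handles file metadata only."""
    metadata: dict = field(default_factory=dict)

    def create(self, path: str) -> None:
        if path in self.metadata:
            raise FileExistsError(path)
        self.metadata[path] = {"size": 0}

    def stat(self, path: str) -> dict:
        if path not in self.metadata:
            raise FileNotFoundError(path)
        return self.metadata[path]


@dataclass
class DFileServer:
    """Hypothetical stand-in for a D File Server: stores raw bytes."""
    blobs: dict = field(default_factory=dict)


class ColossusClient:
    """Toy client library: one simple API in front of several components."""

    def __init__(self) -> None:
        self._curator = Curator()
        self._dserver = DFileServer()

    def create_file(self, path: str) -> None:
        self._curator.create(path)  # metadata operation -> Curator

    def write_file(self, path: str, data: bytes) -> None:
        meta = self._curator.stat(path)  # the file must exist first
        self._dserver.blobs[path] = data  # data operation -> D File Server
        meta["size"] = len(data)  # keep the metadata in sync

    def read_file(self, path: str) -> bytes:
        return self._dserver.blobs[path]  # data operation -> D File Server

    def file_size(self, path: str) -> int:
        return self._curator.stat(path)["size"]  # metadata operation -> Curator


if __name__ == "__main__":
    client = ColossusClient()
    client.create_file("/youtube-videos/intro.mp4")
    client.write_file("/youtube-videos/intro.mp4", b"fake video bytes")
    print(client.file_size("/youtube-videos/intro.mp4"))  # 16
    print(client.read_file("/youtube-videos/intro.mp4"))  # b'fake video bytes'
```

Callers only ever see create_file, write_file, read_file, and file_size. Replacing the in-memory stand-ins with real networked services would leave that surface unchanged.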

Curators

Let’s say you want to create a file in Colossus. So you use the client library to call a createFile(params) method.

The client library talks to the Curators whenever it needs to create a file (or perform any other file metadata operation, like reading file statistics or deleting a file). Curators store file system metadata in Bigtable (Google’s scalable NoSQL database). Bigtable (real creative name) allows Colossus to scale over 100x larger than Google’s old distributed file system (GFS).

When a Curator receives a createFile call, it writes the new file’s metadata to Bigtable.
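
Here is a rough sketch of how a Curator might map metadata calls onto a Bigtable-style table. Colossus's actual metadata schema isn't public. Several parts of this sketch are assumptions: keying each row by the file's full path, the column names, and the SortedTable class, which is a tiny in-memory substitute and not the Bigtable API. The sketch does rely on one real Bigtable property: rows are stored in sorted order by row key. That makes "list everything under this directory" a cheap prefix scan.

```python
import bisect
import time


class SortedTable:
    """In-memory stand-in for a Bigtable-like table (NOT the Bigtable API):
    rows are kept sorted by row key so prefix scans are cheap."""

    def __init__(self):
        self._keys = []  # row keys in lexicographic order
        self._rows = {}  # row key -> dict of column values

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        if key in self._rows:
            del self._rows[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan_prefix(self, prefix):
        """Yield (key, columns) for every row whose key starts with prefix."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


class Curator:
    """Hypothetical Curator: turns file metadata operations into table rows."""

    def __init__(self, table):
        self.table = table

    def create_file(self, path):
        if self.table.get(path) is not None:
            raise FileExistsError(path)
        self.table.put(path, {"size": 0, "created_at": time.time()})

    def stat(self, path):
        row = self.table.get(path)
        if row is None:
            raise FileNotFoundError(path)
        return row

    def delete_file(self, path):
        self.stat(path)  # raises FileNotFoundError if the file is missing
        self.table.delete(path)

    def list_dir(self, directory):
        """All file paths under directory, including nested ones, in sorted order."""
        prefix = directory.rstrip("/") + "/"
        return [key for key, _ in self.table.scan_prefix(prefix)]


if __name__ == "__main__":
    curator = Curator(SortedTable())
    curator.create_file("/videos/b.mp4")
    curator.create_file("/videos/a.mp4")
    curator.create_file("/photos/x.jpg")
    print(curator.list_dir("/videos"))  # ['/videos/a.mp4', '/videos/b.mp4']
```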

D File Servers

However, reading from or writing to a file is a data operation. In that case, the client library communicates directly with the D File Servers.

The D File Servers are the network-attached disks that store data.

When the client library reads a file, the file’s contents never pass through the Curators. This reduces the amount of coordination and the number of “network hops” needed. Instead, the client library retrieves the data directly from the D File Servers.
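
Here is a minimal sketch of the direct read path. It assumes the client already knows which D File Server holds each piece of the file. Several details are illustrative assumptions and not how Colossus actually lays out data: the tiny CHUNK_SIZE, the round-robin striping, the chunk ID format, and the helper names write_striped and read_direct. The point is that bytes move between the client and the D File Servers with nothing in between, and several servers can be read in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


class DFileServer:
    """Hypothetical D File Server: a dictionary of chunk id -> bytes."""

    def __init__(self):
        self.chunks = {}

    def read_chunk(self, chunk_id):
        return self.chunks[chunk_id]


def write_striped(path, data, servers):
    """Split data into chunks, spread them round-robin over servers,
    and return the chunk locations (the metadata a client would need)."""
    locations = []
    for i in range(0, len(data), CHUNK_SIZE):
        index = i // CHUNK_SIZE
        chunk_id = f"{path}#{index}"
        server = servers[index % len(servers)]
        server.chunks[chunk_id] = data[i:i + CHUNK_SIZE]
        locations.append((server, chunk_id))
    return locations


def fetch(location):
    """Read one chunk straight from the D File Server that holds it."""
    server, chunk_id = location
    return server.read_chunk(chunk_id)


def read_direct(locations):
    """Fetch all chunks in parallel and reassemble them in order."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(fetch, locations))


if __name__ == "__main__":
    servers = [DFileServer() for _ in range(3)]
    locations = write_striped("/logs/today", b"hello colossus!", servers)
    print(len(locations))          # 4 chunks of up to 4 bytes each
    print(read_direct(locations))  # b'hello colossus!'
```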

The D File Servers are maintained by Custodians. Custodians are background storage managers that keep data durable and available, and they also handle efficiency tasks across the disks, like RAID reconstruction and disk space balancing.
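
Google hasn't published how Custodians decide what to move. This sketch shows one simple way disk space balancing could work: a greedy loop moves the smallest chunk from the most-utilized disk to the least-utilized one until their utilization is within a tolerance. The balance function, the disk dictionary layout, and the 5% tolerance are all hypothetical.

```python
def balance(disks, tolerance=0.05):
    """Greedy disk-space balancing sketch (hypothetical, not Colossus's algorithm).

    disks: dict of disk name -> {"capacity": int, "chunks": {chunk_id: size}}
    Moves chunks from the most-utilized disk to the least-utilized one until
    their utilization differs by at most `tolerance` or no move helps.
    Returns the list of (chunk_id, from_disk, to_disk) moves performed.
    """

    def used(name):
        return sum(disks[name]["chunks"].values())

    def utilization(name):
        return used(name) / disks[name]["capacity"]

    moves = []
    while True:
        fullest = max(disks, key=utilization)
        emptiest = min(disks, key=utilization)
        gap = utilization(fullest) - utilization(emptiest)
        if gap <= tolerance or not disks[fullest]["chunks"]:
            break
        # Try the smallest chunk: if it cannot help, no bigger chunk can.
        chunk_id, size = min(disks[fullest]["chunks"].items(), key=lambda kv: kv[1])
        fits = used(emptiest) + size <= disks[emptiest]["capacity"]
        new_gap = abs((used(fullest) - size) / disks[fullest]["capacity"]
                      - (used(emptiest) + size) / disks[emptiest]["capacity"])
        if not fits or new_gap >= gap:
            break  # no single move improves the balance
        disks[emptiest]["chunks"][chunk_id] = disks[fullest]["chunks"].pop(chunk_id)
        moves.append((chunk_id, fullest, emptiest))
    return moves


if __name__ == "__main__":
    disks = {
        "d1": {"capacity": 100, "chunks": {"a": 10, "b": 10, "c": 10, "d": 10}},
        "d2": {"capacity": 100, "chunks": {}},
    }
    print(balance(disks))  # [('a', 'd1', 'd2'), ('b', 'd1', 'd2')]
```

Trying only the smallest chunk is enough here. If that chunk does not fit or would not narrow the gap, no larger chunk would either.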

Putting it all together: Google Colossus Architecture

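Here is how the pieces fit: applications call the client library, which sends metadata operations to the Curators (who keep file system metadata in Bigtable) and reads and writes data directly on the D File Servers. Meanwhile, Custodians work in the background to keep the data on those disks durable, available, and efficiently stored.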

The Advantages of Colossus

Colossus’s main advantage is that, from the client library, it looks like you are accessing your own isolated file system. In reality, behind the scenes, data from many different users and applications is mixed together across shared clusters and disks.

Simplicity by Abstraction

This simplicity speeds up development internally. No software engineer wants to (or has to) wrangle with disk space, efficiency, or hardware.

If an engineer on YouTube wants to spin up a data space for all YouTube videos, they can just tell the client library to create a file system called youtube-videos.

Then, they can add, read, or delete YouTube videos from the youtube-videos file system with ease. All the hardware and scaling concerns are abstracted away.

A single Colossus cluster can scale to exabytes of storage and tens of thousands of machines, according to Google.

At Google scale, machines, disks, and hard drives are failing every minute, simply because of how hardware failure rates add up across so many devices.
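
The arithmetic behind failures being routine is simple: multiply the fleet size by the annual failure rate, then spread the result across the minutes in a year. The fleet size and 2% annual failure rate below are made-up illustrative inputs, not Google's real numbers. The formula also assumes failures are independent and evenly spread out.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def expected_failures_per_minute(num_devices: int, annual_failure_rate: float) -> float:
    """Expected device failures per minute, assuming independent failures
    spread evenly over the year."""
    return num_devices * annual_failure_rate / MINUTES_PER_YEAR


if __name__ == "__main__":
    # Illustrative inputs only (not Google's fleet size or failure rate).
    rate = expected_failures_per_minute(num_devices=10_000_000, annual_failure_rate=0.02)
    print(f"{rate:.2f} failures per minute")  # 0.38 failures per minute
```

With these inputs, that works out to about 0.38 disk failures per minute, or roughly 550 a day. That count covers disks only and leaves out machines and other hardware.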

Colossus knows this and works around it by providing fast background recovery.
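
Here is a simplified sketch of what background recovery involves. After a disk dies, the system finds every chunk that lost a copy and schedules new copies onto healthy disks. The source describes Custodians doing RAID reconstruction. To keep things short, this sketch swaps in plain replication with a hypothetical target of three replicas. The recover function and its data layout are illustrative only.

```python
from collections import Counter


def recover(placements, failed_disk, healthy_disks, target_replicas=3):
    """Drop failed_disk from every chunk's replica set, then plan copies that
    bring each chunk back to target_replicas (replication stands in for RAID).

    placements: dict chunk_id -> set of disk names holding a copy (updated in place)
    healthy_disks: list of disk names that can accept new copies
    Returns (jobs, lost): jobs is a list of (chunk_id, source, destination)
    copy operations; lost lists chunks with no surviving copy.
    """
    for disks in placements.values():
        disks.discard(failed_disk)
    # Current number of chunks per disk, used to spread new copies evenly.
    load = Counter(d for disks in placements.values() for d in disks)

    jobs, lost = [], []
    for chunk_id, disks in placements.items():
        if not disks:
            lost.append(chunk_id)  # nothing left to copy from
            continue
        while len(disks) < target_replicas:
            options = [d for d in healthy_disks if d not in disks]
            if not options:
                break  # not enough distinct disks to reach the target
            dest = min(options, key=lambda d: load[d])  # least-loaded disk first
            source = min(disks)  # any surviving copy works; min() is deterministic
            jobs.append((chunk_id, source, dest))
            disks.add(dest)
            load[dest] += 1
    return jobs, lost


if __name__ == "__main__":
    placements = {
        "c1": {"d1", "d2", "d3"},
        "c2": {"d1", "d4", "d5"},
        "c3": {"d2", "d3", "d4"},
    }
    jobs, lost = recover(placements, "d1", ["d2", "d3", "d4", "d5", "d6"])
    print(jobs)  # [('c1', 'd2', 'd6'), ('c2', 'd4', 'd6')]
    print(lost)  # []
```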

Hardware Efficiency

Colossus also makes storage highly efficient by putting each piece of data on the right kind of hardware.

Data can be accessed in different ways depending on the workload. Some jobs need fast access, like a dashboard that shows real-time data. Other jobs are less latency-sensitive, like a pipeline that reads data once a day to create a daily report.

Data that is accessed often is served from flash memory for fast, low-latency reads.

All other data is put on disks.

Google has been doing this long enough that it can predict fairly accurately how much flash it needs and buy just that much for its data centers.

To avoid wasting disk space, Colossus uses intelligent disk management to get as much value as possible from the available disks, keeping them as full and busy as it can.

As data gets older and is accessed less often, it is moved to larger-capacity, slower drives, while faster disks and flash memory hold “hotter” data.
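
Here is a toy version of this placement policy. Its inputs are a dashboard-style hot file, a daily-report file, and an old archive file. The thresholds, tier names, and the choose_tier and plan_moves helpers are hypothetical. Colossus's real placement logic isn't public.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
HOT_READS_PER_DAY = 100
WARM_READS_PER_DAY = 1
COLD_AGE_DAYS = 365


@dataclass
class FileStats:
    path: str
    reads_per_day: float
    age_days: int


def choose_tier(stats: FileStats) -> str:
    """Pick a storage tier from how often a file is read and how old it is."""
    if stats.reads_per_day >= HOT_READS_PER_DAY:
        return "flash"  # hot: low-latency serving
    if stats.reads_per_day >= WARM_READS_PER_DAY and stats.age_days < COLD_AGE_DAYS:
        return "fast-disk"  # warm: regular disks
    return "high-capacity-disk"  # cold: big, slower drives


def plan_moves(files, current_tier):
    """Return (path, from_tier, to_tier) for files whose tier should change."""
    moves = []
    for f in files:
        target = choose_tier(f)
        if current_tier.get(f.path) != target:
            moves.append((f.path, current_tier.get(f.path), target))
    return moves


if __name__ == "__main__":
    files = [
        FileStats("/dashboards/live", reads_per_day=5000, age_days=1),
        FileStats("/reports/daily", reads_per_day=2, age_days=30),
        FileStats("/archive/2015", reads_per_day=0.01, age_days=3000),
    ]
    for f in files:
        print(f.path, "->", choose_tier(f))
```

The policy only looks at two signals, read rate and age. That is enough to show the idea: data drifts toward cheaper, slower storage as it cools off.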

Colossus’s predecessor

Colossus succeeded GFS (Google File System), which Google described in a landmark distributed systems paper published in 2003.

GFS inspired many systems in use today, such as Apache Hadoop’s HDFS (Hadoop Distributed File System).

However, GFS had drawbacks that Colossus solved: its performance wasn’t predictable, its single master server was a bottleneck for metadata operations, and it couldn’t scale while maintaining speed.

Colossus isn’t perfect either. It probably isn’t the most efficient storage solution for huge numbers of very small files, and Google has other storage systems for more niche use cases.

Fun fact: In 2003, researchers estimated that all the words ever spoken by human beings would amount to just five exabytes (1 exabyte = 1 million terabytes = 1 billion gigabytes). Google stores more than twice that.

