Protocol Buffers (Protobufs) were created by Google in 2001 because it needed an efficient way to store structured data.

Today, they’re used by a variety of tech companies, like Apple and LinkedIn, because of the many benefits they have over a format like JSON. (That said, JSON is still used for plenty of things at all of these companies.)

What are Protocol Buffers?

Protocol Buffers are a compact binary format that allows for efficient serialization of structured data.

Using Protobuf starts with defining data structures (called messages) and services in a .proto file. You then compile that file with the Protocol Buffer compiler (protoc). Your application imports the generated code and uses it to read and write data in the structure defined in your .proto file.

The way I first thought of Protobufs was “JSON with superpowers”.

JSON only goes between a string and a parsed object.

A Protocol Buffer goes between an efficient, compact binary and a full-featured, optimized class with a comprehensive, well-thought-out API. Seriously, the API is really great and covers most practical use cases.

The proto file for serializing a Food object might look like:

```proto
syntax = "proto3";

// Each field has a name, type, and assigned number.
message Food {
  string name = 1;
  int32 calories = 2;
  optional string serving_size = 3;
}
```

Each field’s number must be unique among all fields in the message. Also, field numbers 19,000 to 19,999 are reserved for the Protobuf implementation.
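Field numbers also determine what goes on the wire. Each encoded field starts with a key that packs the field number together with a small “wire type” (0 for varints like int32, 2 for length-delimited data like strings) as (field_number << 3) | wire_type, written as a base-128 varint. The stdlib-only Python sketch below uses hypothetical helper names (it is not a Protobuf library API) to show why numbers 1 to 15 fit in one byte while larger numbers need more:

```python
# A minimal sketch of the Protobuf field key, not a real library API.
# Helper names (encode_varint, field_key) are hypothetical.

WIRETYPE_VARINT = 0             # int32, int64, bool, enum, ...
WIRETYPE_LENGTH_DELIMITED = 2   # string, bytes, embedded messages, ...

MAX_FIELD_NUMBER = 2**29 - 1    # 536,870,911
RESERVED = range(19000, 20000)  # reserved for the Protobuf implementation


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a varint: 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Build the key that precedes every field on the wire."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError("field number out of range")
    if field_number in RESERVED:
        raise ValueError("19,000 to 19,999 are reserved")
    return encode_varint((field_number << 3) | wire_type)


if __name__ == "__main__":
    for number in (1, 15, 16, 2047, 2048):
        key = field_key(number, WIRETYPE_VARINT)
        print(f"field {number:>4}: key {key.hex()} ({len(key)} byte(s))")
```

In practice, this is why the most frequently set fields are usually given numbers 1 through 15.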

Protobufs support a huge variety of types, like booleans, strings, repeated fields (arrays), maps, enums, and more. The schema can also be updated later without breaking deployed programs that were compiled against an older version of it, which makes for some beautiful backwards compatibility.
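That compatibility mostly falls out of the wire format. Every field is tagged with its number and wire type, so a reader can skip a field it doesn’t recognize without knowing what it means. The Python sketch below is a simplified, hypothetical decoder: it only handles varint and length-delimited fields. An “old” reader that knows only the three Food fields reads bytes from a “newer” writer that added a made-up field 4:

```python
# A minimal sketch of unknown-field skipping, not a real Protobuf decoder.
# Assumptions: only wire types 0 (varint) and 2 (length-delimited) are
# handled, and field 4 is a hypothetical field added in a newer schema.
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, new_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_known_fields(data: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode only the field numbers listed in `known`; skip everything else."""
    fields: dict[str, object] = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:  # length-delimited: length prefix, then bytes
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[known[number]] = value
        # Unknown numbers are dropped here; the wire type told us how far to skip.
    return fields


OLD_SCHEMA = {1: "name", 2: "calories", 3: "serving_size"}

# Bytes from a "newer" writer: name, calories, plus unknown field 4.
newer_bytes = (
    b"\x0a\x05" + b"apple"            # field 1, length-delimited, 5 bytes
    + b"\x10" + encode_varint(95)     # field 2, varint
    + b"\x20" + encode_varint(150)    # field 4, varint (unknown to old reader)
)

if __name__ == "__main__":
    print(read_known_fields(newer_bytes, OLD_SCHEMA))
```

Real Protobuf libraries generally keep such fields around as unknown fields rather than discarding them, so they can survive a round trip through older code.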

Why Protobufs?

1. Efficiency: Protobufs are compact, often several times smaller than the equivalent XML or JSON (Google’s original documentation cited 3 to 10 times smaller than XML). They are also faster to serialize and deserialize, which contributes to their high performance. For example, in a direct size comparison between JSON and Protobuf, Protobuf came out smaller for every single message.

2. Broad language and platform support: Protobuf code generation is available for many languages, including C++, Java, Python, Go, C#, Ruby, Objective-C, JavaScript, Swift, and PHP.

3. Backwards and forward compatibility: Protobuf lets you change your data structure without breaking older programs compiled against the “old” format, and those older programs can still read data written with the newer one.

4. Type safety: Working with Protocol Buffers generally means writing less code and leaving fewer chances for bugs to appear. You focus only on the data structure, and the Protobuf compiler takes care of the rest by generating typed classes with optimized serialization code.

5. Standardized API: Protobufs have a standardized, useful API with many features. This is thanks to Protobufs being strongly typed.

6. Sounds cooler: It’s important for programmers to be clear when communicating, but also sound as smart as possible. Protobufs sound cooler than “jay-sawn”. Who is Jason?
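To make the efficiency point concrete, here is a small Python sketch that hand-encodes one Food message (from the schema above) following the Protobuf wire format, then compares it with compact JSON. The sample values are made up, and this is an illustration rather than a benchmark. For this message the Protobuf bytes come to 19 versus 56 for JSON, mostly because JSON spells out every field name while Protobuf sends a one-byte key per field.

```python
# A hand-rolled encoder for the Food message, for size comparison only.
# Assumptions: made-up sample values, every field is always written, and
# calories is non-negative (negative int32 values take 10 bytes on the
# wire and are not handled here).
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(key: int, text: str) -> bytes:
    """Key byte, then the UTF-8 byte length as a varint, then the bytes."""
    data = text.encode("utf-8")
    return bytes([key]) + encode_varint(len(data)) + data


def encode_food(name: str, calories: int, serving_size: str) -> bytes:
    return (
        length_delimited(0x0A, name)               # field 1, wire type 2
        + bytes([0x10]) + encode_varint(calories)  # field 2, wire type 0
        + length_delimited(0x1A, serving_size)     # field 3, wire type 2
    )


food = {"name": "apple", "calories": 95, "serving_size": "1 medium"}
as_json = json.dumps(food, separators=(",", ":")).encode("utf-8")  # no spaces
as_proto = encode_food(**food)

if __name__ == "__main__":
    print("JSON:    ", len(as_json), "bytes")   # 56
    print("Protobuf:", len(as_proto), "bytes")  # 19
```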

To see that standardized API in action: assuming I have a Food object (from the schema above) named apple, I can just call apple.calories() to get its number of calories.

And apple.serving_size() returns “0.95 apples”, because who took a bite out of my apple, dammit!

My favorite part of the Protocol Buffer API is descriptors, which let you look up a message’s fields by both name and number. The Descriptor API is extremely detailed, and it’s what really set Protobufs apart for me.
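In the real libraries this lives on message descriptors. In Python, for instance, a generated class exposes Food.DESCRIPTOR.fields_by_name and Food.DESCRIPTOR.fields_by_number, and C++ offers Descriptor::FindFieldByName and Descriptor::FindFieldByNumber. To show the idea without depending on the library, the sketch below uses hypothetical stand-in classes (FieldInfo, MessageInfo, and a label_fields helper, none of which are library APIs) so that generic code can map raw field numbers back to names. That is the kind of thing descriptors make possible for logging or format conversion:

```python
# Toy stand-ins for Protobuf descriptors; NOT the real library classes.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    name: str
    number: int
    type_name: str


class MessageInfo:
    def __init__(self, full_name: str, fields: list[FieldInfo]) -> None:
        self.full_name = full_name
        # Two indexes over the same fields: lookup by name and by number.
        self.fields_by_name = {f.name: f for f in fields}
        self.fields_by_number = {f.number: f for f in fields}


FOOD = MessageInfo("Food", [
    FieldInfo("name", 1, "string"),
    FieldInfo("calories", 2, "int32"),
    FieldInfo("serving_size", 3, "string"),
])


def label_fields(info: MessageInfo, raw: dict[int, object]) -> dict[str, object]:
    """Turn {field_number: value} into {field_name: value} for any message.

    Numbers the descriptor doesn't know are kept under a placeholder name.
    """
    labeled: dict[str, object] = {}
    for number, value in raw.items():
        field = info.fields_by_number.get(number)
        labeled[field.name if field else f"unknown_{number}"] = value
    return labeled


if __name__ == "__main__":
    print(FOOD.fields_by_name["calories"].number)
    print(label_fields(FOOD, {1: "apple", 2: 95}))
```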

Usage of Protobufs in the Wild

Google uses Protobufs for communication between its internal systems, which handle billions of RPCs per second; at that scale, efficient serialization really matters. Imagine if your Google search didn’t load in 0.5 seconds. Heresy!

Apple uses Protobufs for a lot of their applications too, like Apple Notes.

LinkedIn switched from JSON to Protobufs, which led to up to a 60% improvement in latency for services with large payloads. They saw no degradations compared to JSON.

Pretty great, right? Now you can waste time scrolling through hundreds of self-congratulatory posts even faster! (I’m lying to myself. My LinkedIn feed is actually… somewhat enjoyable now 😳.)

Because Protobufs are typed and provide a comprehensive, standard API across all languages, working with them looks the same everywhere in the company. Basically, if I switch teams, I’m already familiar with the base data format and API, which removes an “onboarding step”.

Arguments against Protocol Buffers

A lot of design choices in Protocol Buffers stem from the scale and complexity Google operates at. As a result, some of Protobuf’s constraints can feel arbitrary or unclear.

For example, oneof fields can’t be repeated, and neither can map fields.

Scalar fields without explicit presence (the proto3 default for fields not marked optional) always return a value, even if they were never set. They are zero-initialized (0, false, or the empty string), which can lead to bugs if overlooked, and it makes it impossible to tell whether such a field was missing or was explicitly set to its default value. In my opinion, this is one of the most annoying shortcomings of Protocol Buffers, though it can be worked around in a pretty reasonable manner, for example with the optional keyword or wrapper types.
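The missing-versus-default problem is visible on the wire. In proto3, a scalar field declared without optional (implicit presence) is simply not written when it holds its default value, so “never set” and “set to 0” produce identical bytes. Declaring the field optional gives it explicit presence, and a value that was set is written even when it’s 0. The Python sketch below models only a calories field (number 2) with hypothetical helper names, using None to stand for “never set”:

```python
# A minimal sketch of implicit vs. explicit presence on the wire.
# Assumptions: only field 2 (calories, varint) is modeled, and only
# non-negative values are supported; helper names are hypothetical.
from __future__ import annotations

CALORIES_KEY = b"\x10"  # field 2, wire type 0 (varint): (2 << 3) | 0


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_calories_implicit(calories: int | None) -> bytes:
    """Like `int32 calories = 2;` in proto3: default values are not written."""
    value = 0 if calories is None else calories  # unset reads back as 0
    if value == 0:
        return b""  # nothing on the wire, whether unset or explicitly 0
    return CALORIES_KEY + encode_varint(value)


def encode_calories_explicit(calories: int | None) -> bytes:
    """Like `optional int32 calories = 2;`: written whenever set, even to 0."""
    if calories is None:
        return b""
    return CALORIES_KEY + encode_varint(calories)


if __name__ == "__main__":
    for value in (None, 0, 95):
        print(value,
              encode_calories_implicit(value).hex() or "(empty)",
              encode_calories_explicit(value).hex() or "(empty)")
```

On the reading side, generated code for an optional field also offers a presence check (HasField("calories") in Python, for example), which is exactly what the implicit-presence version can’t provide.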

Regardless, every single technology has shortcomings because every single technology is designed with tradeoffs, compromises, and competing constraints in mind.

If you know of a perfect technology, please comment below. I would like to see if you’re able to comment on Substack even while you’re dreaming.