Simplicity and Transparency
Posted by Patrick on Sunday, January 23, 2011

Swivel had a particularly nasty bug. It crashed while loading my application state, but only occasionally—sometimes once a day, sometimes every hour. This turned out to be one tough error to find.

I spent many hours going over my code with a fine-toothed comb, running memory debuggers, rewriting and refactoring anything suspect or complex, throwing in extra validations and checks. While my code quality improved, the bug remained elusive.

But this is not an article about the bug.

This is about overcoming one of the major hindrances in finding that bug.

Serialization is a hairy problem in C++. I looked at several ideas for solving it before I finally settled on MessagePack.

Unfortunately, it subtly introduced complexity and opacity into my code.

These are two warning signs a programmer should be on the lookout for. They are important code smells that can cause you a lot of pain down the road. Things may seem wonderful while everything works, but when things break, you pay dearly.

MessagePack is a very powerful library. It calls itself an “extremely efficient object serialization library for cross-language communication.” I liked how easily it hooked into my C++-heavy, object-oriented codebase. It seemed elegant.

But it is far more than I need.

I don't really need “extremely efficient” in this context. I'm writing an iPhone app with a few dozen objects, not enterprise-grade software. And I certainly don't need “cross-language communication.”

I need “object serialization” and I need it stupid simple.

I do a lot of work in dynamic languages, where serialization is usually as easy as passing the root of your object graph to a call that neatly packages it up as JSON, XML, YAML, or some other format, which a single call can later turn right back into an object graph. Dynamic languages tend to be good at this sort of thing because they can introspect their objects at runtime.

C++, eh… not so much.

So this is my holy grail: the one-line call on your object graph root.

Stupid simple serialization.

A quick disclaimer before someone points me at this excellent post about data-oriented design: my project was already object-oriented, and the design is simple. I like the idea of letting data drive design, but much of the benefit there is speed, although it can also make your code a little easier to reason about. If that's your thing, more power to you.

Swivel, though, is a fairly simple project with a relatively small, shallow object graph, and it already performs well. It isn't hurting for optimization.

And I happen to like my object-oriented design.

So let's say, for the sake of argument, you already have an object-oriented C++ project and you just want to save and load stuff real easy like. If you look at the various libraries out there, well, nothing feels quite like that one-liner.

In fact, in some cases you could be writing more serialization code than there is code in your class.

Every line of code you write introduces the opportunity for error.

MessagePack, while fairly elegant, did force me to write more code than I wanted. Most of my objects needed their own Save/Load methods to handle enums, serialize their superclass, or deal with more complex members. And once one class has a virtual Save method, it propagates through your whole hierarchy.

Ugh.

Simplicity is important. Serialization (or any given feature) should not clutter your codebase.

The other problem with MessagePack was opacity. It spits out a tightly packed binary blob that is nigh impenetrable. My debugger can tell me at which point in the deserialization process bad things happen, but I have no context. All I know is that at this exact point in the binary data, the library crashes because what it found is not quite what it expected to find.

Things would be so much easier if MessagePack could optionally produce and consume, say, JSON or XML. That would make things transparent.

That was strike two.

I need a library that:

  1. Is dead simple to use.
  2. Does not clutter my code.
  3. Serializes my stuff.
  4. Has a format I can inspect.

In my next post I will present a little library called Archivist.

Archivist is my attempt to solve this problem. It is stupid simple serialization, but under the hood it is very powerful.

There are more features I'd like to add, but it is functional. To give you an idea of my confidence in it, it is already in use in Swivel, which has just been submitted to the App Store for review.

Archivist works swimmingly. It has allowed me to simplify my project and literally delete large swaths of code in many of my classes. It is thoroughly unit tested. It requires no inheritance, and it feels very pleasant to use.

As for that elusive bug, well, Archivist simplified my code to the point where not one but two interesting and non-intuitive bugs became obvious.