
Radically Cross Platform: Serialization

(This is the third in the “Radically Cross Platform” series of posts; see previous posts about Xamarin and Memory Management.)
When in the early stages of developing my cross platform game/graphical app engine, my first task was reading as much as I could find from internet articles, blog posts, and forum discussions. I wanted to find stuff written by people who had been there before me. I wanted to know what worked, and what didn’t; what kinds of pitfalls to avoid. One surprising piece of advice I found was to use binary serialization to optimize a mobile app’s speed and memory footprint.
In the desktop world, I’m accustomed to dealing with text formats ranging from JSON (my preferred) to XML (also common of course), not to mention the occasional INI, CDF, or even YAML.

Human Readable

What all the above formats have in common is that they’re text-based and human-readable. A careful person can open them in a text editor, scan through to verify that they’re correct, and easily glean their meaning from their contents.
This human readability is a great thing for developer productivity, and you would need a really good reason to ditch it. Unfortunately, the slow CPUs and limited memory of mobile devices provide just such a reason. So how can we use fast, tiny binary serialization of our objects without hurting our productivity?

Use JSON for Design Files

One of the really freeing realizations about software development is that you can have hundreds of megabytes or even a few gigabytes of source code and design media for your app, all of which gets boiled down to the 20 MB or so binary deployed to the App Store. So why can’t you have a human-readable source format for your serialized objects that gets compiled to the binary format for distribution? Well, you can, of course.
My engine uses JSON for the source files, which are stored in separate files on the disk for easy perusal, split up at the most granular points that make sense. I use my custom design GUI to output and edit these files, but I can also edit the files in any text editor if there’s ever a reason to.
When I’m ready to generate the compiled resources, my designer spins through and builds the binary serialized versions of all the verbose textual data I’ve been manipulating at design time. It’s a little bit of work to set this up, but this way I have the best of both worlds.
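
To make the idea concrete, here is a minimal sketch of such a compile step. It is written in Python purely for illustration and is not the engine's actual code; the record layout, the field names, and the `compile_sprite` function are all hypothetical.

```python
import json
import struct

# Hypothetical layout (little-endian): u16 name length, UTF-8 name bytes,
# then x and y as float32 and the frame count as u16.
def compile_sprite(json_text: str) -> bytes:
    """Turn one JSON design record into its compact binary form."""
    design = json.loads(json_text)
    name = design["name"].encode("utf-8")
    return (
        struct.pack("<H", len(name))
        + name
        + struct.pack("<ffH", design["x"], design["y"], design["frames"])
    )

source = '{"name": "hero", "x": 12.5, "y": -3.0, "frames": 8}'
blob = compile_sprite(source)
print(len(source), "bytes of JSON ->", len(blob), "bytes of binary")
```

The JSON stays the master copy; the binary blob is a build artifact that can be regenerated at any time.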

Investigate Prewritten Tools

Google’s Protocol Buffers or “protobuf” standard carries a really good reputation as a binary format, and some of my colleagues have had good experiences with it. Here are some interesting benchmarks focused on C# binary serialization tools and highlighting protobuf performance.
For my own purposes I chose to just “bite the bullet” and implement the binary serialization myself. This gave me the most flexibility with my own memory management, obfuscation techniques, and handling of weird special cases. Besides: it’s not really that hard; it has the advantage of keeping an easily readable implementation in my source; and most importantly, it’s blazing fast.

Go Ahead and Be Strict

It’s true: binary [de]serialization is hard to debug. The best thing you can do is be strict and use “divide and conquer” techniques to narrow down where the bug lives. Use lots of checked lengths. For example, store the expected length of a complex binary object, then check that you’ve advanced exactly that far in the stream when you’re done reading it. The stream cursor should be at the original offset plus the known length after your code has finished deserializing. This almost always ensures that the point where your program crashes or stops on an assertion is in the vicinity of the offending code.
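
A checked length might look something like the following Python sketch. The 4-byte length prefix and the `read_checked`, `write_checked`, and `parse_point` helpers are assumptions made for illustration, not a description of any particular engine.

```python
import io
import struct

def read_checked(stream, parse):
    """Read a length-prefixed object and verify the cursor lands where expected."""
    (length,) = struct.unpack("<I", stream.read(4))
    start = stream.tell()
    value = parse(stream)
    consumed = stream.tell() - start
    # The cursor must sit at the original offset plus the stored length.
    if consumed != length:
        raise AssertionError(f"expected {length} bytes, consumed {consumed}")
    return value

def write_checked(payload: bytes) -> bytes:
    """Prefix a serialized payload with its length."""
    return struct.pack("<I", len(payload)) + payload

def parse_point(stream):
    """Hypothetical parser for a record of two float32 values."""
    return struct.unpack("<ff", stream.read(8))

good = io.BytesIO(write_checked(struct.pack("<ff", 1.0, 2.0)))
print(read_checked(good, parse_point))
```

If a parser reads too few or too many bytes, the failure is reported at the object that caused it rather than somewhere far downstream.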
Another basic sanity check is to add assertions that catch absurd values. For example, if you deserialize an integer that’s supposed to be the length of the following object, then you know that either a negative value or a value greater than (let’s say) 10 million must be an error. Somehow it seems better to raise your own assertion than to wait for the OS to notify you that you can’t allocate 2 GB of RAM in one chunk on your iPhone.
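
As a sketch of that kind of guard (Python, with a hypothetical `read_length` helper and the 10 million ceiling taken from the example above):

```python
import io
import struct

MAX_OBJECT_BYTES = 10_000_000  # illustrative ceiling; tune it per format

def read_length(stream) -> int:
    """Read a signed 32-bit length and reject values that cannot be valid."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise AssertionError("stream ended inside a length field")
    (length,) = struct.unpack("<i", raw)
    if length < 0 or length > MAX_OBJECT_BYTES:
        raise AssertionError(f"absurd length {length}; stream is likely misaligned")
    return length

print(read_length(io.BytesIO(struct.pack("<i", 4096))))
```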
Automated tests are really helpful to ensure that all of your serialized data gets appropriately round tripped and run through its paces for verification.
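
A round-trip test can be very small. The sketch below assumes a hypothetical record (a name plus a list of 32-bit scores) and hypothetical `encode`/`decode` functions; the point is only the shape of the test, which includes empty and boundary values.

```python
import struct

def encode(name: str, scores: list) -> bytes:
    """Serialize a hypothetical record: name, then a counted list of int32."""
    raw = name.encode("utf-8")
    return (
        struct.pack("<H", len(raw)) + raw
        + struct.pack("<H", len(scores))
        + struct.pack(f"<{len(scores)}i", *scores)
    )

def decode(blob: bytes):
    """Inverse of encode()."""
    (n,) = struct.unpack_from("<H", blob, 0)
    name = blob[2:2 + n].decode("utf-8")
    (count,) = struct.unpack_from("<H", blob, 2 + n)
    scores = list(struct.unpack_from(f"<{count}i", blob, 4 + n))
    return name, scores

def test_round_trip():
    cases = [
        ("", []),                           # empty values
        ("hero", [1, -2, 3]),               # ordinary values
        ("héros", [2**31 - 1, -(2**31)]),   # non-ASCII text, int32 limits
    ]
    for name, scores in cases:
        assert decode(encode(name, scores)) == (name, scores)

test_round_trip()
```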

Don’t Waste Memory

Just because you’re deploying a lean binary version of your objects doesn’t mean the whole implementation is memory efficient. Pay attention to how you deserialize: are you allocating byte arrays unnecessarily? Are there any other memory management patterns in your code that are creating a problem, such as memory leaks, or a failure to use object pooling to avoid garbage generation/collection?
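
One way to avoid needless byte-array allocations is to decode values in place from a single buffer instead of slicing out a copy per record. The Python sketch below shows the idea with a hypothetical fixed-size record; the details will differ in other languages, but the principle of reading at an offset rather than copying carries over.

```python
import struct

RECORD = struct.Struct("<Hff")  # hypothetical record: id, x, y

def iter_records(buffer, count: int, offset: int = 0):
    """Yield records decoded in place, without copying a slice per record."""
    view = memoryview(buffer)  # zero-copy window over the loaded data
    for i in range(count):
        yield RECORD.unpack_from(view, offset + i * RECORD.size)

data = b"".join(RECORD.pack(i, i * 1.5, -i * 1.5) for i in range(3))
for record in iter_records(data, 3):
    print(record)
```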

Don’t Open Too Many Files

I mentioned above that I store my design files in JSON format at a fairly granular level. So for a particular app, I might have on the order of a few hundred separate design files destined to be serialized into the proprietary binary format for deployment in the app.
That’s great for the design stage, but it would be a mistake to have a one-to-one relationship between these hundreds of design files and distinct files on disk published with the app. There is just too much overhead for a mobile app in opening a file from the SD card or other storage. A more appropriate solution would be to concatenate all the binary files, and separately store an index of offsets and lengths. In my own engine I created a flexible container file format supporting lookup by name, GZIP compression, de-duplication, grouping of resources by language clusters and so forth. My GUI design tool is able to generate this as a single file containing all binary content for an app, optimized for the selected target platform. But those extras are far beyond the scope of this post — it’s sufficient to note here that opening too many separate files will slow down your app unacceptably.
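
The simplest version of that idea, an index of names, offsets, and lengths followed by the concatenated blobs, can be sketched as follows. This is a Python illustration with a made-up layout and hypothetical function names, and it leaves out compression, de-duplication, and the other extras mentioned above.

```python
import io
import struct

def build_container(resources: dict) -> bytes:
    """Concatenate blobs behind an index of (name, offset, length) entries."""
    index = io.BytesIO()
    body = io.BytesIO()
    index.write(struct.pack("<I", len(resources)))
    for name, blob in resources.items():
        raw = name.encode("utf-8")
        index.write(struct.pack("<H", len(raw)) + raw)
        index.write(struct.pack("<II", body.tell(), len(blob)))
        body.write(blob)
    return index.getvalue() + body.getvalue()

def read_index(stream) -> dict:
    """Return {name: (absolute_offset, length)}; read once after opening."""
    (count,) = struct.unpack("<I", stream.read(4))
    entries = []
    for _ in range(count):
        (n,) = struct.unpack("<H", stream.read(2))
        name = stream.read(n).decode("utf-8")
        entries.append((name, *struct.unpack("<II", stream.read(8))))
    base = stream.tell()  # the body starts right after the index
    return {name: (base + off, length) for name, off, length in entries}

def load(stream, index: dict, name: str) -> bytes:
    """Seek to one resource inside the already-open container and read it."""
    offset, length = index[name]
    stream.seek(offset)
    return stream.read(length)

container = io.BytesIO(build_container({"hero": b"\x01\x02\x03", "map": b"\xff" * 5}))
index = read_index(container)
print(load(container, index, "map"))
```

The app opens one file, reads the index once, and afterwards every resource costs a seek and a read instead of a new file open.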

Digital Rights Management

If you publish a popular app, you can be sure that a smart person with nothing better to do will try to crack it. Sites like Cydia and any number of Android equivalents attest to the success of app crackers. The good news is that merely using binary serialization helps a little by obscuring your data, and it also gives you a few more tricks for obfuscating your app.
One easy additional technique is to encode your strings as something other than plain ASCII or UTF-8. You don’t have to use full-blown encryption, and I really wouldn’t suggest ROT13 either, but just use your imagination. There are many such little steps you can take to make it difficult for your resources to be reverse engineered. I would venture to say that very few people would have any interest in reverse engineering the monolithic blob generated by my game designer, without access to my source code and without any recognizable strings in the blob.
(Just to be clear, all the obfuscation described here happens in the binary encoding process, and doesn’t affect the JSON design files, which we still want to be easily readable and editable as the master source for all of it.)
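
As one illustration of the kind of lightweight string scrambling meant here, the Python sketch below XORs each UTF-8 byte with a repeating key and its position. The key, the scheme, and the function names are all hypothetical. This is not encryption and will not stop a determined attacker; it only keeps readable strings out of the blob.

```python
import struct

KEY = bytes([0x5A, 0xC3, 0x17, 0x9E])  # hypothetical key; invent your own scheme

def _mix(data: bytes) -> bytes:
    # XOR each byte with a key byte and its position.
    # Applying the same function twice restores the input.
    return bytes(b ^ KEY[i % len(KEY)] ^ (i & 0xFF) for i, b in enumerate(data))

def write_string(text: str) -> bytes:
    """Serialize a string as a u16 length followed by scrambled UTF-8 bytes."""
    raw = _mix(text.encode("utf-8"))
    return struct.pack("<H", len(raw)) + raw

def read_string(blob: bytes, offset: int = 0) -> str:
    """Inverse of write_string()."""
    (n,) = struct.unpack_from("<H", blob, offset)
    return _mix(blob[offset + 2 : offset + 2 + n]).decode("utf-8")

blob = write_string("Level 1: The Cavern")
print(blob)
print(read_string(blob))
```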

Closing Thoughts

There are so many separate components to create for a generic graphical app/game engine; I’m convinced that game programming is a microcosm of the wider world of computer science in general. No doubt serialization is one of the less glamorous aspects and feels like plumbing. But it is nonetheless important to get right in order to optimize your cross platform app performance across a wide variety of hardware levels.


