Stack-only JSON deserialization

A few days ago I released the StackOnlyJsonParser library. It's essentially a wrapper around the System.Text.Json tokenizer that allows for allocation-free JSON deserialization. What do I mean by allocation-free? That's an important question, considering that quite a few libraries claiming to achieve the same goal have been available for quite some time already. And all of them are kind of a hoax – including mine.

Strictly speaking, allocation-free data deserialization is not possible. Even if we were to pre-allocate all necessary data structures, the control logic would still need to put some local variables on the stack to manage the process. You simply cannot have your cookie and eat it too. Either the stack or the heap has to grow – even if only by a small amount.

Is that a big issue? Not necessarily. It's reasonable to assume that a program requires some memory in order to run. The question is whether that memory is being used efficiently enough.

In the case of most existing JSON libraries, when allocation-free processing is mentioned, it's usually understood as a very specific type of allocation-freeness. First of all, it refers only to heap allocations (after all, those are the expensive ones – unlike stack allocations, they are a bit more complex than just moving a stack pointer, and they also awaken the old beast called GC). Secondly, it excludes the allocations needed to create the data structures that hold the result of the deserialization process.

Does that make sense? I would say it does. After all, the outcome of the deserialization process has to be handed to the user for later processing. The user might want to store it, make some calculations based on it, or feed it into some custom processing pipeline. So it's completely normal to assume those allocations are required and to focus on eliminating the overhead allocations instead. A good example of those, in the case of JSON-formatted data, is the parsing of property names. We don't really need to construct fully-fledged string values from their byte representations. Instead, we can compare the byte data in place against the set of expected string literals. No allocations, pure arithmetic. Finding and eliminating such excessive allocations is a very noble goal.
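For illustration, this is roughly what such an in-place comparison looks like with the standard Utf8JsonReader (a minimal sketch, not code from the library):

```csharp
using System;
using System.Text;
using System.Text.Json;

// A minimal sketch: match a property name without ever materializing
// a string. ValueTextEquals compares the raw UTF-8 bytes of the token
// against the expected literal.
ReadOnlySpan<byte> json = Encoding.UTF8.GetBytes("{\"id\":42,\"name\":\"Bob\"}");
var reader = new Utf8JsonReader(json);

while (reader.Read())
{
    if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("id"))
    {
        reader.Read();                        // move to the property value
        Console.WriteLine(reader.GetInt32()); // 42, no string was ever created
    }
}
```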

Nevertheless, I’ve decided to take it one step further and attempt to eliminate all heap allocations whatsoever.

First, let's get those excessive allocations of the tokenization process out of the way. The truth is that I didn't have to lift a finger to get this done. It was simply handed to me in the form of the Utf8JsonReader type defined in the System.Text.Json assembly. It's an extremely well-designed, allocation-free, modern and fast JSON tokenizer. So, the work is done, right? Right!? The standard library provides a way of deserializing JSON data entirely on the stack. What else could we need?

Well, the problem is that the Utf8JsonReader is just a tokenizer. Save for a few extreme cases, having to manually parse the incoming data this way can be quite time-consuming, especially if the data model is complex. And I don't mean time in the context of the performance of the resulting algorithms, but the time needed to develop such solutions. After all, as developers we are quite lazy and want to write as little code as possible to achieve our goals.
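To see what I mean, here is everything the tokenizer gives you out of the box (a minimal sketch) – a stream of tokens, with everything above that level left as an exercise for the reader:

```csharp
using System;
using System.Text;
using System.Text.Json;

// The raw tokenizer view of a document: no objects, no fields,
// just a forward-only sequence of tokens.
ReadOnlySpan<byte> json = Encoding.UTF8.GetBytes("{\"a\":1,\"b\":[2,3]}");
var reader = new Utf8JsonReader(json);

while (reader.Read())
{
    Console.WriteLine(reader.TokenType); // StartObject, PropertyName, Number, ...
}
```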

Preferably, the code defined by the user of the library should be limited to the description of the data model itself. The deserialization code should be hidden away.
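In other words, the goal is for user code to look roughly like this. The attribute and helper type names below sketch the intended usage and assume the library is referenced; treat the exact spellings as illustrative:

```csharp
// Illustrative only: declare the model, let the library generate
// the deserialization code behind the scenes.
[StackOnlyJsonType]
public readonly ref partial struct Person
{
    public StackOnlyJsonString Name { get; }
    public int Age { get; }
    public StackOnlyJsonArray<Person> Children { get; }
}
```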

So let’s jump right into it.

As we are talking about C#, the first thing we have to decide is how we want to store the deserialized data. One option would be to use classes. We would instantiate a proper number of objects and later reuse them by incorporating some sort of object pool. Sounds like a solution, but it has a few downsides. First of all, even though we've eliminated the allocations from the deserialization process, we haven't limited the actual memory usage of the library. Quite the contrary: it will now permanently use at least the amount of memory required to store the biggest object that was ever deserialized. Another tricky part is the reusability of the objects. One has to be very careful not to overwrite state that is yet to be read. There are essentially two options here: either ask the user to manually return objects that are no longer in use, or assume that a new deserialization process always takes place only after the previous batch of data has been fully consumed. Both solutions are quite error-prone. Either the user will forget to return an object to the pool, losing the benefits of reduced heap allocations, or will persist its state only to later discover that it has been unexpectedly altered.
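For comparison, a hypothetical pool-based design would look something like this, with the failure modes baked right in:

```csharp
using System.Collections.Generic;

// Hypothetical pool-based design, shown only to illustrate its hazards.
class PersonDto
{
    public string? Name;
    public int Age;
}

class DtoPool
{
    private readonly Stack<PersonDto> _free = new();

    public PersonDto Rent() => _free.Count > 0 ? _free.Pop() : new PersonDto();

    // Hazard 1: forget to call Return and the pooling benefit silently vanishes.
    // Hazard 2: call Return too early and a later Rent will overwrite state
    // that some other part of the code still holds a reference to.
    public void Return(PersonDto dto) => _free.Push(dto);
}
```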

So if not classes, then our only other option is structs. And sure enough, they eliminate the problems mentioned before. We can easily store them on the stack, rendering the object pools unnecessary, and limit the risks associated with persisting reusable objects. Are they perfect then? Nope, they bring problems of their own. They can be quite expensive to copy if someone forgets to pass them as an "in" parameter. And that's just the beginning. The reason for an even greater problem will be revealed once we start talking about collections. So sit tight. As a sneak peek: instead of using standard structs, we will be using ref structs. This essentially means that we won't be able to persist the deserialized state at all. We will be forced to consume it right away, either by making some calculations based on that data or by copying it from the deserialized state into some manually pre-allocated heap memory. So why do I still think it's a better solution? Because those limitations strictly ensure that the library is going to be used as intended. With classes, there is a high risk that a user will forget to return an object to a pool or will overwrite state still used elsewhere. And even though those mistakes would be easy to make, they are very difficult to track down. With ref structs, on the contrary, it's impossible to make them, as the language itself prohibits it. Persisting the state of deserialized objects just becomes more verbose, but I think that's a small price to pay. And what about the expensive copying of structs by value? Well, if done wrongly it can indeed impact the performance, but not the correctness of the code (by the way, I hope that one day the C# team will add a way of disabling the "default copy constructors" for selected types).
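The compiler enforcement is easy to demonstrate (a minimal sketch):

```csharp
using System;

public ref struct Deserialized
{
    public ReadOnlySpan<byte> Raw; // the span is what forces the ref struct requirement
}

public class Cache
{
    // private Deserialized _saved; // error CS8345: a ref struct cannot be
    //                              // a field of a class, so this won't compile

    private byte[] _savedCopy = Array.Empty<byte>();

    public void Persist(in Deserialized value)
    {
        // The only way to keep the data around is an explicit copy into
        // heap memory that we allocated ourselves.
        _savedCopy = value.Raw.ToArray();
    }
}
```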

With that out of the way, let's focus on the deserialization process itself. For simple objects, and by simple I mean objects that consist only of fields of basic types, it's easy enough: simply use the tokenizer to iterate through the provided data and fill in all of the fields within the structure. Even if one of those fields is of a non-basic type, we can do the same thing, simply passing a reference to the tokenizer a level deeper and continuing the work from there. With one exception: collections. As collections are quite common, it's not an exception we can simply disregard. So how can we even attempt to deserialize collections on the stack?
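Conceptually, the per-type deserialization code boils down to a constructor like this (a sketch of the idea, not the library's actual generated output):

```csharp
using System.Text.Json;

public ref struct SimplePoint
{
    public int X;
    public int Y;

    // A sketch of per-type deserialization: assumes the reader stands at the
    // object's opening '{'. Read tokens until the object ends, dispatching on
    // the property name. A nested non-basic field would receive the same
    // "ref reader" and continue from the current position.
    public SimplePoint(ref Utf8JsonReader reader)
    {
        X = 0;
        Y = 0;

        while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
        {
            if (reader.TokenType != JsonTokenType.PropertyName)
                continue;

            if (reader.ValueTextEquals("x")) { reader.Read(); X = reader.GetInt32(); }
            else if (reader.ValueTextEquals("y")) { reader.Read(); Y = reader.GetInt32(); }
            else reader.Skip(); // unknown property: skip its value
        }
    }
}
```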

In C# 7.2 the stackalloc keyword was extended to work in safe code with Span&lt;T&gt;. No, just kidding, we won't be using stackalloc for that. Instead we will befriend the old and beloved technique of lazy loading. Whenever we encounter a collection, we will store the current position of the tokenizer for later, skip the whole block, and continue processing from there. This inconspicuous decision has some far-reaching consequences. Let's talk about them.
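In terms of the Utf8JsonReader API, the trick is as simple as this (a sketch; assigning the reader to a local takes a full value-copy snapshot of its position):

```csharp
using System.Text.Json;

static class LazyCollectionHandling
{
    // Sketch: when the parser hits a '[' it snapshots the reader (a plain
    // struct copy) and skips the whole block, so that the surrounding
    // object can be parsed to completion.
    public static Utf8JsonReader CaptureAndSkip(ref Utf8JsonReader reader)
    {
        Utf8JsonReader snapshot = reader; // value copy = bookmarked position
        reader.Skip();                    // jump past the collection
        return snapshot;                  // replayed later, on first iteration
    }
}
```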

First let’s look at the advantages. Once the user decides to iterate over a collection, we can deserialize its elements one at the time. This means that there is never more than one object of a given type allocated on the stack at one given time and the memory complexity of the deserialization process is equal to O(1), not matter how many elements are serialized in a given collection. Just the thing we wanted! We also always deserialize the elements of the collection into the same spot in the memory, which can greatly improve the performance because of the process being cache friendly.

And what about the cons? Well, there are quite a few of them. We don't get a "random access iterator" – we are limited to iterating through the list with no ability to retrieve an arbitrary element, and we can look at only one element at a time. The type that stores the collection has to be a ref struct, making it impossible to persist. Wait, what? How is that implied? It's simple. In order to support lazy loading of a collection, the containing type has to store a snapshot of the tokenizer that points to its beginning. That state includes a pointer to the byte array in the form of a ReadOnlySpan&lt;byte&gt;, and those can ONLY be stored on the stack. Therefore, the deserialized type also needs to be marked as a ref struct. So that's that. The last major disadvantage of lazy loading is its impact on performance.
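Iteration then works by replaying the snapshot, materializing one element at a time into the same stack slot. Again, this is a sketch of the idea rather than the library's exact code:

```csharp
using System.Text.Json;

// A sketch of a lazily loaded collection of integers. The snapshot of the
// reader (taken at the '[' token) forces this type to be a ref struct,
// because Utf8JsonReader wraps a ReadOnlySpan<byte>.
public ref struct LazyIntArray
{
    private readonly Utf8JsonReader _start;

    public LazyIntArray(ref Utf8JsonReader reader)
    {
        _start = reader; // remember where the collection begins
        reader.Skip();   // and let the caller continue past it
    }

    public Enumerator GetEnumerator() => new Enumerator(_start);

    public ref struct Enumerator
    {
        private Utf8JsonReader _reader;

        internal Enumerator(Utf8JsonReader start)
        {
            _reader = start;
            Current = 0;
        }

        public int Current { get; private set; }

        public bool MoveNext()
        {
            if (!_reader.Read() || _reader.TokenType == JsonTokenType.EndArray)
                return false;

            Current = _reader.GetInt32(); // every element reuses this same slot
            return true;
        }
    }
}
```

A foreach loop binds to this GetEnumerator/MoveNext/Current pattern structurally, which is convenient, since a ref struct could not implement IEnumerable anyway.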

Lazy loading of data is not necessarily slower than regular loading on its own, but unfortunately the JSON format doesn't help here. Whenever a collection is encountered, it has to be skipped. We cannot simply hand the half-baked state to the user, as we don't know in what order they will want to process the fields of the provided object. If we deal with data like {"A": 1, "B": [2, 3], "C": 4}, it might be that the user will want to access the field C before iterating over the collection B. So we have to skip the collection B in order to retrieve the value of C before we hand out the state. And the process of skipping fields is quite expensive. It essentially requires us to iterate through all tokens within the skipped part, tracking the current nesting level, to finally arrive at the closing token. A process that will have to be repeated once it comes to actually iterating over that collection. That's why, even though stack-only deserialization has a very nice memory complexity, it might be slower than more traditional approaches when it comes to raw performance (at least when the data set is small enough that memory locality doesn't play a big role yet).
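This is essentially what skipping has to do under the hood (a sketch, assuming the reader currently stands on a StartObject or StartArray token):

```csharp
using System.Text.Json;

static class JsonSkipping
{
    // What Skip() boils down to for {...} or [...]: walk every token of the
    // subtree, tracking the nesting depth, until the matching end token.
    // The cost is linear in the size of the skipped block.
    public static void SkipBlock(ref Utf8JsonReader reader)
    {
        int depth = reader.CurrentDepth;
        while (reader.Read() && reader.CurrentDepth > depth)
        {
            // nothing to do; every byte of the block still gets tokenized
        }
    }
}
```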

As a side note, it's worth mentioning that this is a shortcoming of the JSON format itself. In the past I wrote a similar tool for stack-only deserialization of proto messages, and the performance was greatly in favor of the stack-only approach over heap deserialization. Why? In the Protocol Buffers wire format, all variable-length entities are preceded by a number representing their length in bytes. That makes the process of skipping them so much faster.

But back to the topic. Let's turn our attention to the last allocating part of the code: strings.

First let’s talk about the solution and then I will explain why we don’t actually need it.

The solution is to create a special StackOnlyJsonString type that users can use instead of System.String. Once again, it's just a wrapper around the Utf8JsonReader struct. It provides a way of comparing the stored string data with a provided string value by doing a simple byte-wise comparison. Unfortunately, constructing those objects is quite expensive, as the Utf8JsonReader is a relatively big struct and copying its state is not free. To be honest, I created the StackOnlyJsonString mostly to prove that this library can be used for truly stack-only deserialization, but I don't actually recommend using it.
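The general shape of the idea is something like this (a simplified sketch; the real type's API may differ):

```csharp
using System.Text.Json;

// A simplified sketch of the idea behind StackOnlyJsonString: keep a copy
// of the reader positioned at the string token and compare in place,
// instead of materializing a System.String.
public ref struct StackOnlyStringSketch
{
    private Utf8JsonReader _reader; // big struct; copying it here isn't free

    // Assumes the reader is positioned at a string token.
    public StackOnlyStringSketch(ref Utf8JsonReader reader) => _reader = reader;

    // Byte-wise comparison against an expected literal, no allocation.
    public bool Equals(string expected) => _reader.ValueTextEquals(expected);

    // Escape hatch: materialize a real string when one is needed (allocates).
    public override string ToString() => _reader.GetString() ?? string.Empty;
}
```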

As I’ve mentioned, the StackOnlyJsonString is not needed. At least not in case of stack only deserialization. All string values are relatively short lived in this kind of processing. It’s different with heap deserialization. There deserialized strings will live at least as long as the collection that stores them is alive. They will most likely be removed only after the last piece of the deserialized data is fully processed. In case of stack only deserialization it’s different. References to those strings are (in most cases) held only for the duration of the life of an object storing it – and with lazy loading that’s really short. They get created and die right after that, leaving no trace of their existence. Their scope is limited to a single iteration in the collection loading process. So even though the  StackOnlyJsonString is a cool optimization, it’s more of a gimmick than an actual improvement. You can clearly see it by comparing those two graphs representing different aspects of the memory usage (the StackOnlyJsonParser is used with both the System.String and the StackOnlyJsonString as the underlying string representation):

Even though the difference in the total number of allocated bytes is big, the difference in the number of bytes in use at any given moment is rather insignificant.

And now we're coming to the end. So far we've been talking about the data model and the deserialization process, but one question still remains: how do we turn all of that into a library? Yup, that's a major one.

One solution would be to use some reflection-based magic tricks. To be honest, I didn't want to go that way. I decided to use the even bigger magic of automatic code generation.

In the past I wrote some C# code generators in Python. Back then it was the simplest way to integrate code generation into the build process of other projects. But those times are no more. C# 9 brings a new and shiny toy called source generators, which I decided to take for a test ride. Ahem, OK, who am I kidding? Source generators were the main reason why I decided to create this library in the first place…

The process is simple: the source generator finds all types within a project that are marked with a specific attribute and enriches them with special constructors later used for deserialization. A very cool feature. I will definitely use it more in future projects.
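For the curious, the skeleton of a C# 9 source generator looks like this (a minimal sketch of the mechanism, not the library's actual generator; the receiver's filter is deliberately crude):

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

[Generator]
public class DeserializerGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // Collect candidate type declarations while the compiler parses.
        context.RegisterForSyntaxNotifications(() => new AttributeReceiver());
    }

    public void Execute(GeneratorExecutionContext context)
    {
        var receiver = (AttributeReceiver)context.SyntaxReceiver!;
        foreach (var candidate in receiver.Candidates)
        {
            // A real generator would inspect the semantic model here and emit
            // a deserialization constructor for each marked type.
            var source = $"// generated code for {candidate.Identifier.Text}";
            context.AddSource($"{candidate.Identifier.Text}.g.cs",
                SourceText.From(source, Encoding.UTF8));
        }
    }

    private class AttributeReceiver : ISyntaxReceiver
    {
        public readonly List<StructDeclarationSyntax> Candidates = new();

        public void OnVisitSyntaxNode(SyntaxNode node)
        {
            // Crude filter: any struct declaration carrying at least one attribute.
            if (node is StructDeclarationSyntax s && s.AttributeLists.Count > 0)
                Candidates.Add(s);
        }
    }
}
```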

And yeah, that seems to be it. A deserialization process fully contained on the stack, with (optionally) no heap allocations. Something that would have been hard to achieve a few years ago is now implemented in just a few hundred lines of code. And that's all thanks to the major improvements we've been seeing recently in C# and .NET. Let's hope we keep up that momentum!

Thanks for reading!
