Text sourcing

So, Text sourcing.

I’ve started working on this project because I didn’t want my React.js to become too rusty. Lately I don’t get that many occasions to write js, so this seemed as a perfect way of polishing it. As it’s just a pet project I wasn’t even sure if I would bring myself to finish it. But well – I persevered – and in fact I had much fun creating this piece of software. I also believe that there are some parts to this project that are worth sharing. So fasten your seat belts and let’s dive into it.

First thing first. We should start with few word on what this project is all about. Text sourcing (or Time Writer as this is how I’ve initially named the repo and those things like to stick) is a project focused on creating an online collaborative text editor. Text editor that can be used by multiple people at the same time to edit the same document, even the same line of text. Of course the idea isn’t new. There are multiple solutions like that available on the market, which you and I are probably using on daily basis (like Google docs or MS Office online). In the depths of the internet one can also find projects like mine, created just for fun or by some researchers. I can’t claim than that my approach is very innovative nor that the way in which I’ve implemented it is the best from all of the possible alternatives. It’s for you to decide. What we can do for sure, is to treat this project as a case study of a broader problem. Problem that we will discuss today.

Just to have some reference, a working example of the app I’m about to describe can be found here: https://text-sourcing.tomasz-rewak.com/. It’s an open source project, so if you want to check out how all of this was implemented, everything is on my github repo.

So, latency. The single most destructive power in the roam of collaborative applications. How can we ensure that two actions taken by two users on two different machines located in two different countries will result in both users seeing the same outcome being a combination of those two operations? Maybe a more specific example is in place – an example that would touch on the topic of text editors. Lets say that a user A inserts, at the end of a document, a letter ‘a’. Few milliseconds later a user B inserts a letter ‘b’ at the same position without knowing that someone else was faster then him. Internet connection is simply not fast enough to propagate the change made by user A in time. We probably can all agree that the outcome of combining those two operations should be a document ending with these two letters: ‘ab’, as this is the actual order in which they were inserted. Unfortunately, we are not working on a single CPU here. We need a way to synchronize states of both documents over the wire. But it’s not a simple task. For example, if we were to send the entire content of the document from the user A to the user B after the first change is made how can the user B decide on how to merge those two text files? What should the user B send to the user A as a response? Not mentioning that sending the whole text file after each change would result in some serious bandwidth issues.

I’ve already seen a (vary popular) software that would solve this problem in the most naive way. It would just lock the whole paragraph for the time it’s edited by one of the users so that others cannot change it. Simple, robust, perfect. It only has to synchronize one short block text and only by broadcasting it for one user toward others as a read only blob. Perfect for enterprise solutions. But together let’s aim for something more, as this approach doesn’t solve the problem from out initial example of two users appending some text at the end of the file. It just makes it an invalid operation.

And that’s where event sourcing comes into play (CQRS fans, please calm your horses for now, we are just going to take a look here at the basics of event management, not the overall software architecture). If it’s a case that you see this term for the first time, don’t worry. It’s a very simple idea, though a very powerful one at the same time. The most basic definition that I can come up with is that based on an initial state of an object (like a text document) and a list of all operations performed on that object since creation we should be able to calculate its current state (under some conditions of course). The most important thing about those operations (events) is that if we have two of them we need a reliable way of determining which was performed first (for example by using a timestamp). In other words we have to be able to sort them. And that’s it. Initial state + ordered list of events = current state

So then, it’s the solution we’ve been looking for! All we have to do is to broadcast all of the events between all of the users. Then those users will sort those events locally and apply them on the initial state (an empty document)! Or is it not our solution? Well, maybe not yet, but indeed it’s the core idea behind it. There are still some questions to be answered. Most importantly: How should we define the state (what should it be composed of)? How to ensure that events can transform this state correctly even if their author doesn’t know what is the exact (global) state at the moment of events creation (I will explain what I mean by that in a moment)? Another thing is how to sort those events, so that all of the clients are looking at the same data (and no one is favored). Finally we also have to address some performance issues. Applying all of the events available in the entire history of a document, starting from a blank page, each time someone wants to make the simplest change, doesn’t seem as a great way of achieving high UI responsiveness.

Lets start with a problem of how to model the state. From now on let’s dive a little bit deeper into the details. My first thought, when I was initially faced with this problem, was to model the state as the content of a file. Simply: a text. It’s probably the most natural way of thinking about it, but unfortunately a little bit crooked. The problem lays here: how would we model events in such a system? Probably they would have to store an information about a position in the text they are to affect (like where to insert a new text fragment). Then it seems reasonable to assume that two consecutive events sent by two users at the same time might look like this:

[
	{
		"type": "insert",
		"position": 8,
		"text": "am ",
		"author": "A"
	},
	{
		"type": "insert",
		"position": 13,
		"text": "father.",
		"author": "B"
	}
]

And the text document would look like this:

Text:     | Luke, I your
----------| ^       ^    ^
Positions | 0       8    13

The intention here should be clear. The final text should say:

Text:     | Luke, I am your father.
----------| ^       ^    ^
Positions | 0       8    13

But if we attempt to apply those to events in the order that is specified above we are going to get a different result. Let’s see. First we insert the “am” fragment before the eighth character so we get this:

Text:     | Luke, I am your
----------| ^       ^    ^
Positions | 0       8    13

And now the “father” part:

Text:     | Luke, I am yofather.ur
----------| ^       ^    ^
Positions | 0       8    13

Crap. It doesn’t seem right. And problems like this become more and more severe as we try to add more functionalities. Just imagine handling text selection (with replace/delete operations) in such environment. We would be constantly at risk that another user would try to insert some text in the previous section of the document which would lead to us editing a fragment that we didn’t intent to edit in the first place. And what about undo/redo operations? It’s just too error prone. It’s also worth noting that order in which events are received is very important this case. If we were to perform the “father” event fist and then the “am” event we would get a correct string. If we are not aware of that, debugging such application might be a nightmare.

Can we somehow design our system, so that situations like this are handled correctly? Yest, it’s possible! We just have to redesign our state. The trick here will be to extend it with caret’s definition. That’s right, each user will store list of carets as a part of the state. And not only his/her carets but also carets of all the other users as well. For now lets assume that the caret definition consists only of two pieces of information: its position and its owner id.

So let’s go back to our example. Now the initial state would look more like this:

Text:     | Luke, I your
Carets:   |         A    B
----------| ^       ^    ^
Positions | 0       8    13

Events no longer need an information on the position of insertion. They can be simplified:

[
	{
		"type": "insert",
		"text": "am ",
		"author": "A"
	},
	{
		"type": "insert",
		"text": "father.",
		"author": "B"
	}
]

Whenever an event is applied on a state, it not only alters the text part of it but also changes the position of carets accordingly. To find a position of insertion the author of the event has to be matched with the owner of the caret. Lets take a look at what’s the outcome of the first operation in a system designed this way:

Text:     | Luke, I am your
Carets:   |            A    B
----------| ^       ^  ^    ^ 
Positions | 0       8  11   16

All of the carets after the position where the new fragment was inserted were moved automatically. They simply follow the text. If we were to implement this state transformation function in java script it would look something like this:

function reduceState(state, event) {
	for (let i = 0; i < state.carets; i++) {
		if (caret.owner !== event.author)
			continue;

		const position = state.carets[i].position;

		const firstTextPart = state.text.substring(0, position);
		const secondTextPart = state.text.substring(position);

		state = {
			text: `${firstTextPart}${event.text}${secondTextPart}`,
			carets: state.carets.map(
				caret =>
					caret.position < position
						? caret
						: {
							...caret,
							position: caret.position + text.length
						})
		}
	}
}

(For now let’s not argue about cleanness of this code. I’ve created it just as an example as I wanted to fit everything in one function.)

As you can see it’s fairly simply and in a single threaded application we can perform this operation atomically. Also, if you look now at the state we’ve got after applying the first event it’s clear that it’s perfectly aligned for receiving the second one. Even though the autor of the second event didn’t know at time of event creation that the position of insertion would change. He didn’t need to. All he had to do is to say that he/she wants to insert some text at the current position of his/her carets (yes, spoiler alert, the code above works also for multi-caret editors).

So, more formally, how does the state of a document look like? For the initial state of the example above it would be something like this:

{
	"text": "Luke, I your",
	"carets": [
		{
			"owner": "A",
			"position": "8"
		},		
		{
			"owner": "B",
			"position": "13"
		}
	]
}

With this approach carets are easy to manage and modify. Also, multiple users can have their carets placed at the same position. Storing information of the position of all carets might also come in handy if we want to visually let our user know what others are doing and which text fragments they are editing.

Of course one cannot say that it’s a perfect solution. First of all: we still have to start by adding our caret at a specific location. For some it might seem as if we were just moving the the problem somewhere else. But that’s not entirely true. First of all, during text edition, we rarely click on the screen to change the position of a caret. More commonly it moves when we write or navigate with arrows. And these are the operations that we should optimize the flow for. And of course a caret creation is not a destructive operation. If we would end up adding it at a position that becomes invalid after some other events are received, we can always click again or (even better) move there with arrow keys. Text is not affected.

But if we think that it’s not enough we can always take this one step further. Something that I personally didn’t want to do, but it’s feasible. We could include in the caret creation message some kind of an information that would indicate what its author thought the current state was. We can then try to reproduce this state on other machines and attempt to deduce where the position specified by the author of that event would actually end up if the order of events was correct. But it’s too much work and ain’t nobody got time for that.

But I’ve mentioned an important thing that I’d like to go back to. Navigation with arrows. Is this somehow different from other events? No, it’s not. It doesn’t need any special treatment. We just have to use a different function for transforming the state. In this case the text part of the state will not be affected. Actually, only the author’s carets will have to be updated. The drawback here is that we have to store those events with all others. It might make the history of file changes quite long. Also reverting changes (which I will cover soon) will include reverting the position of the carets. Maybe it’s actually something that we want, but still we have to keep in mind that it might add some extra steps to the state generation process.

So what exactly happens when we receive a new event? First of all we have to take look at its timestamp. By comparing it with timestamps of other events in our history log we can insert it in a right place.

New event:
21:36 -> insert...

Stored history:
21:30 -> insert...
21:33 -> delete...
21:35 -> move caret...
21:38 -> insert...
21:39 -> insert...

Updated history:
21:30 -> insert...
21:33 -> delete...
21:35 -> move caret...
21:36 -> insert... << inserted state
21:38 -> insert...
21:39 -> insert...

Of course the real timestamps are measured with a precision of 1 millisecond. But it’s just an example. Also, in practice, distance between consecutive events is less the few hundred millis. For events further apart from each other then 1 second the risk of encountering an ordering conflict is minimal.

Apparently adding a new event to the queue in a quick operation. And how about calculating the current state? As I’ve mentioned before one way to do this is to apply all of the events (in given order) on the initial state of the document (by default an empty file). But it might be very time consuming. A simple optimization would be to remember a calculated state next to the event which was used to produce it. After inserting a new event all we would have to do is to recalculate few latest states. Not just a single state, as all states after the newly inserted event have to be invalidated.

New event:
21:36 -> insert...

Stored history:
21:30 -> insert...     + {state}
21:33 -> delete...     + {state}
21:35 -> move caret... + {state}
21:38 -> insert...     + {state}
21:39 -> insert...     + {state}

Updated history:
21:30 -> insert...     + {state}
21:33 -> delete...     + {state}
21:35 -> move caret... + {state}
21:36 -> insert...
21:38 -> insert...     + {invalid state}
21:39 -> insert...     + {invalid state}

Updated states:
21:30 -> insert...     + {state}
21:33 -> delete...     + {state}
21:35 -> move caret... + {state}
21:36 -> insert...     + {newly calculated state}
21:38 -> insert...     + {newly calculated state}
21:39 -> insert...     + {newly calculated state}

I know what you are thinking. It might seem a little bit wasteful. Do we really need to store a state next to each event? Well, not necessarily. For example we can store just one in ten consecutive states. Then, after inserting a new event in the middle of the queue we wouldn’t take the state from the previous event (as it might not be there), but from the first of the previous event that has a valid state. And beginning with it we can start applying following events. In the worst case scenario we will just have to perform few more operations, but we’ve gained on memory usage. We can also purge parts of the history that are further in the past more strictly, while increasing the density of states associated with most recent events. After all it’s more probable that for new calculations we will need to use a state from 10 milliseconds ago then a one from 2 hours ago.

And now to the more interesting part: how to send an event. After all we don’t want those documents to be read only, we want to make changes to them as well. This process is a little bit more complicated. It’s mostly due to the fact that I wanted to use an architecture with a single, centralized source of truth – the server. It allows new users to join easily by providing the current state and history. It also acts as a central hub by distributing changes made by some users to all others. Another purpose of the server here is to make sure that no one is cheating (that is: sending fake timestamps or events in the name of other users). And of course the server is also a place where our data can be easily persisted for later access. I am aware that creating a decentralized system that achieves all of this is possible as well, but I’ve decided on the simpler approach.

If the server should be responsible for assigning timestamps, how can a user that sends an event know when will it be received? Well, at first it has to take a wild guess. He might also try to assign the current time as the timestamp, but it is not recommended (more about this in a moment).

So, after adding an event to the local storage (with some fake timestamp) it can be sent to the server. Server then updates the author and the timestamp fields with proper values and sends it to all other connected users. In the meantime the initial creator receives a response confirming that the event was successfully processed and the new timestamp, which should replace the old one. What the user does next is he removes the previously inserted event from the history, invalidates states that follow it and inserts it once again as if it was just received from the server.

So the whole process would look something like this:

Stored history:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}

User update:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:40 -> insert_3     - new event added by the user

Calculating state:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:40 -> insert_3     + {state}

Server propagates some changes made by some other user:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:40 -> insert_3     + {state}
21:41 -> insert_4     - new event added by some other user
21:43 -> insert_5     - new event added by some other user

Calculating state:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:40 -> insert_3     + {state}
21:41 -> insert_4     + {state}
21:43 -> insert_5     + {state}

Server responds with the the real timestamp of user's event:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:40 -> insert_3     + {state} <- real timestamp is: 21:42
21:41 -> insert_4     + {state}
21:43 -> insert_5     + {state}

Removing event and invalidating states:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:41 -> insert_4     + {invalid state}
21:43 -> insert_5     + {invalid state}

Inserting the event back but with the new timesptamp:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:41 -> insert_4
21:42 -> insert_3
21:43 -> insert_5

Recalculating states:
21:38 -> insert_1     + {state}
21:39 -> insert_2     + {state}
21:41 -> insert_4     + {state}
21:42 -> insert_3     + {state}
21:43 -> insert_5     + {state}

And that’s about it in the topic of sending our own events to the server. One thing has to be mentioned, though. The process presented above actually rarely takes place. Why? If we are the only person making changes to the document at a specific moment, then our events always end up at the top of the stack and they don’t need to be moved. Or at least they won’t have to be moved if we will try to predict the time when they will be received by the server. Otherwise we might end up in a situation like this (as real timestamps will be received in the same order as the order in which events were sent):

21:38 -> insert_1     << waiting for real timestamp
21:39 -> insert_2     << waiting for real timestamp
21:41 -> insert_4     << waiting for real timestamp
21:42 -> insert_3     << waiting for real timestamp

21:38 -> insert_1     << real timestamp is 21:43
21:39 -> insert_2     << waiting for real timestamp
21:41 -> insert_4     << waiting for real timestamp
21:42 -> insert_3     << waiting for real timestamp

[moving 1 event, recalculating 4 states]

21:39 -> insert_2     << waiting for real timestamp
21:41 -> insert_4     << waiting for real timestamp
21:42 -> insert_3     << waiting for real timestamp
21:43 -> insert_1

21:39 -> insert_2     << real timestamp is 21:44
21:41 -> insert_4     << waiting for real timestamp
21:42 -> insert_3     << waiting for real timestamp
21:43 -> insert_1   

[moving 1 event, recalculating 3 states]  

21:41 -> insert_4     << waiting for real timestamp
21:42 -> insert_3     << waiting for real timestamp
21:43 -> insert_1   
21:34 -> insert_2

[moving 1 event, recalculating 2 states]

21:41 -> insert_4     << real timestamp is 21:45
21:42 -> insert_3     << waiting for real timestamp
21:43 -> insert_1   
21:34 -> insert_2

...

And this is a real issue if we are writing fast enough. The good news is that our prediction about the real timestamp doesn’t have to be very sophisticated. In my case I’m just adding one second to the time of event creation. Received real timestamp is usually much smaller then that and the excessive shuffling no longer takes place:

21:44 -> insert_1     << waiting for real timestamp
21:45 -> insert_2     << waiting for real timestamp
21:46 -> insert_4     << waiting for real timestamp
21:47 -> insert_3     << waiting for real timestamp

21:44 -> insert_1     << real timestamp is 21:43
21:45 -> insert_2     << waiting for real timestamp
21:46 -> insert_4     << waiting for real timestamp
21:47 -> insert_3     << waiting for real timestamp

[moving 0 events, recalculating 0 states]

21:43 -> insert_1
21:45 -> insert_2     << waiting for real timestamp
21:46 -> insert_3     << waiting for real timestamp
21:47 -> insert_4     << waiting for real timestamp

21:43 -> insert_1     
21:45 -> insert_2     << real timestamp is 21:44
21:46 -> insert_3     << waiting for real timestamp
21:47 -> insert_4     << waiting for real timestamp

[moving 0 events, recalculating 0 states]  

21:43 -> insert_1     
21:44 -> insert_2 
21:46 -> insert_3     << waiting for real timestamp
21:37 -> insert_4     << waiting for real timestamp

[moving 0 evens, recalculating 0 states]

21:43 -> insert_1 
21:44 -> insert_2
21:46 -> insert_3     << real timestamp is 21:45
21:37 -> insert_4     << waiting for real timestamp

...

Another important optimization here (actually applied above) is recognizing the fact that the event not necessarily has to change its position after the timestamp is updated – even if it’s not at the top of the stack. The condition here is that the new timestamp still has to be smaller the the timestamp of the next event and greater then the timestamp of the previous event. If this condition is met, then we just have to update the timestamp of that event accordingly, but we don’t have to recalculate any states.

And by the way, why wouldn’t we just wait for the server response before adding this new event to the history in the first place? One again it’s all about latency. When we write we want to see the outcome right away, without any delay. We don’t want to wait for the message to go to the server and back before we can see the change (even if this event might get reordered afterwards).

The last important piece of functionality is support for undo/redo operations. I don’t want to get into too much details on this one, as it would require me to prepare some more advanced diagrams. But still, there are few things that are worth mentioning. Undo/redo operations are also transmitted and stored in the history as normal events. What’s different is how they are processed. They don’t transform the state of the previous event but rather point which event is the previous one. If, during state calculation, we encounter an undo event, we increment a counter of “standard” events to be skipped, and vice versa for redo event. And why don’t we just remove events that have been undone from the history? Well, because the user might always want to redo them. That’s one thing, but also the order in which we currently store our events might change, in which case we might end up removing a wrong event. So this is the way to implement it, even if we don’t want to support the redo operation. Apart from that, inserting undo/redo events into the history works the same as it does for standard ones (with all that timestamp updating and reshuffling) .

Some more trickery is needed when we want to enable users only to undo their own events. We need to store separate counters of events to be undone for each author while searching for the firs available state we can base the current calculation upon. But the core idea remains the same.

Just to wrap everything up I think I should show you some examples. Beneath you can find different events that can by generated in my system:

// Adding a caret at a position by clicking on the text
{
	"type": "manage-carets",
	"operation": "add-caret",
	"placement": "position",
	"preserveExisting": false,
	"position": 188,
	"timestamp": 1547838921154,
	"author": "23_PZfkk3vBVv0xLAAAD"
}

// Inserting some text at the current caret position
{
	"type": "insert",
	"text": "tomasz",
	"timestamp": 1547839141633,
	"author": "I6QXDUlG-3DkEK2KAAAG"
}

// Moving the caret one line up and at the same time selecting the text
{
	"type": "navigate",
	"mode": "move-vertically",
	"direction": "up",
	"select": true,
	"timestamp": 1547839205634,
	"author": "I6QXDUlG-3DkEK2KAAAG"
}

// Deleting selected text
{
	"type": "delete",
	"mode": "backward",
	"fast": false,
	"timestamp": 1547839302498
}

Even more interesting might be an example of a state. There are few detail that I’ve omitted during this brief introduction.

{
	carets: [
		{
			beginPosition: 107,
			endPosition: 116,
			lastOperation: "selection",
			owner: "I6QXDUlG-3DkEK2KAAAG"
		},
		{
			beginPosition: 52,
			endPosition: 52,
			lastOperation: "creation",
			owner: "reijJO3OF9YFEbl1AAAI"
		},
		{
			beginPosition: 78,
			endPosition: 78,
			lastOperation: "creation",
			owner: "reijJO3OF9YFEbl1AAAI"
		},
		{
			beginPosition: 287,
			endPosition: 301,
			lastOperation: "creation",
			owner: undefined
		}
	],
	text: "import React, {Component} from 'react';expo..."
}

Primarily, caret’s do not consist of just one position, but two. Begin and end. Thanks to this we can model selection of a text fragment. If both positions are equal we are dealing with a standard caret. But from the code perspective it actually doesn’t matter. During processing, those positions are handled separately. Thanks to this, for example if someone will write in the middle of the text fragment we’ve selected, our selection will be expanded automatically.

Another thing here is that one of the carets has an undefined owner. It’s the user from which I’ve copied this state. He doesn’t need to know his id, the carets with undefined owner simply belong to him.

And that’s about it when it comes event sourcing used in this project. If you have any questions just feel free to ask.

Just to fill some blank spots: for sending events back and forth I’m using web socket connections. The .js framework I’ve picked for the UI part is React.js, as it’s concept of a single source of truth works perfectly with event sourcing principles.

I hope that this short report was useful to some and entertaining to others. It’s good to remember that techniques like this are not only limited to simple use cases like the one presented, but can be also used in video games and other more demanding environments. Now I can only encourage you to check out the repo, and see for yourself how this project works. Especially scripts in this directory should be worth reading, as they include the logic of state transformations.

Thanks for reading and see you after I’ve completed some new projects 😉

You may also like...

4 Responses

  1. Thanks for sharing your work. I appreciated your write-up.

  2. dan says:

    Incredibly good quality piece of article with nice graph and such a rich detail. I always wonder how this “sync” technology works. Thank you.

    I also want to learn the css styling technic that can achieve such a layout like your editor does, would you please give me some tips or resources or text book where I can learn from. Thanks!

    BTW, I(maybe?) found some typo in the article:

    “First thing first. We should start with few world [**word]? on what this project is all about.”

    “but can by also used in vide[**video]? games and other more demanding environments. “

    • Tomasz Rewak says:

      Thanks for pointing out these typos 😀 Actually in the second sentence there was even one more… (by [**be]).

      When it comes to styling it’s fairly simple. Each line is a separate div with display:flex and flex-direction:row attributes. Within those lines each fragment that needs to be displayed in a different color is wrapped in a with an appropriate class. I search for substrings that have to be styled differently using regexes and then split the text in those places (well, it’s a little bit more complicated as there is a specific order in which it has to be done that corresponds with the order of applying operators). You can take a look at the TimeWriter/time-writer-react/components/document/line.js file in the repo.

      The problem with this approach is of course that it doesn’t support text editing. So on top of that I have added an overlay that captures all of the mouse events and translates them into the caret position.

      And the rest of the layout is styled using a combination of display:flex (https://css-tricks.com/snippets/css/a-guide-to-flexbox/) and display:grid (https://css-tricks.com/snippets/css/complete-guide-grid/).

Leave a Reply to Aaron Greenlee Cancel reply

Your email address will not be published. Required fields are marked *