Monday, June 3, 2013

Sync


Background

Datomic's approach to updating peers uses a push model. Rather than have every read request route to the same server in order to get consistent data, data is stored immutably, and as soon as there is new information, all peers are notified. This completely eliminates polling any server. Thus, contrary to common presumption, when you ask the connection for the db value, there is no network communication involved: you are immediately given the local value of the db about which the connection was most recently informed.

Everyone sees a valid, consistent view. You can never see partial transactions, corruption/regression of timelines, causal anomalies etc. Datomic is always 'business rules' valid, and causally consistent.

Motivation

That does not mean that every peer sees the same thing simultaneously. Just as in the real world, it is never the case that everyone sees the same thing "at the same time" in a live distributed system.  There is no inherent shared truth, as you might convey a message to me about X at the speed of light but I can only perceive X at the speed of sound. Thus, I know X is coming, but I might have to wait for it.

This means that some peer A might commit a transaction and tell B about it before B is informed via the normal channels. This is an interesting case, as it has to do with perception and propagation delays. It is not a question of consistency, it is a question of communication synchronization.

It comes up when you would like to read-your-own-writes via other peers (e.g. when a client hits different peer servers via a load balancer), and when there is out-of-band communication of writes (A tells B about its write before the transactor does).

Tools

We've added a new sync API to help you manage these situations.

The first form of sync takes a basis point (T). It returns a future that will be fulfilled with a version of the db that includes point T. This does not cause any additional interaction with the transactor - the future will be filled by the normal communication on the update channels. But it saves you from having to poll for arrival. Most often, you will already have the requested T, and the future will complete immediately. This is the preferred method to use if you have any ability to convey the basis T, either in the message from A to B, or e.g. in cookies as a client hits different peers using a load balancer. You can easily get the basis T for any db value you have in hand.

The second form of sync takes no arguments, and works via 'ping' of the transactor. It promises not to return until all transactions that have been acknowledged by the transactor at the time sync was called have arrived at this peer. Thus if A has successfully committed a transaction and told B about it, and B then calls sync(), the database returned by sync will include A's transaction.

Conclusion

While these synchronization tools are powerful, make sure you use them only when necessary. The Datomic defaults were designed to leverage the inherent parallelism possible given immutable, accretion-only semantics and distributed storage. Notifications to peers are sent at the same time as the acknowledgement to the peer submitting the transaction, and thus are as 'simultaneous' as network communication can be. The sync tools need only be utilized to enforce cross-peer causal relationships.

No comments :

Post a Comment