Friday, October 11, 2013

The Transaction Report Queue

Summary: Functional databases such as Datomic eliminate the burden of deciding what information a downstream observer needs.  Just give 'em everything, lazily.

The Transaction Report Queue

In Datomic, you can monitor all transactions.  Any peer process in the system can request a transaction report queue of every transaction against a particular database.

TxReportWatcher is a simple example of this.  It watches a particular attribute in the database, and prints the entity id and value whenever a change to that attribute appears.  The essence of the code is only a few lines of Java:

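// q is datomic.Peer.q and TX_DATA is datomic.Connection.TX_DATA (static imports).
final BlockingQueue<Map> queue = conn.txReportQueue();

while (true) {
    // Block until the next transaction report arrives.
    final Map tx = queue.take();
    // Query only this transaction's data for the watched attribute.
    Collection<List<Object>> results = q(byAttribute, tx.get(TX_DATA), attrid);
    for (List<Object> result : results) {
        printList(result);
    }
}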

There are several things to note here:
  • The Datomic query function q is used to query the transaction data, showing that the full power of the database query language is available while handling transaction events.
  • The TX_DATA map key points to all the data added to the database by this particular transaction.
  • Everything is made of generic data structures accessible from any JVM language: queues, collections, maps, and lists.  (There is no ResultSet API.)

Context Matters

How much information does a transaction observer need in order to take useful action?  An easy but naive answer is "just the TX_DATA".

But when you move past toy systems and blog examples, context matters.  For example, when a user places an order in a system, you might want to take different actions based on:
  • that user's order history
  • the current inventory status
  • time-limited promotions
  • that user's relation to other users

It is impossible to anticipate every piece of context an observer might need.  But if the event does not carry enough information, the observer has to go back and ask for more.  There are many risks here.  The biggest is that asking introduces tighter coupling through coordination: the follow-up questions go to a database, and the answers must correspond to the moment the event occurred.  Unnecessary coordination is a cause of complexity (and an enemy of scalability).

Is there another way?  You bet!  If you have a functional database, where points in time are immutable values, then you can make the entire database available.  Datomic provides exactly this. In addition to the TX_DATA key, the DB_AFTER key points to the entire database as of the completion of the transaction.  And the DB_BEFORE key points to the entire database immediately before the transaction started.  Because both the "before" and "after" copies of the database are immutable values, no coordination with other processes is required, ever.
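
To make the before/after idea concrete without depending on the Datomic library, here is a plain-Java model.  It is only a sketch under assumptions: TxReportModel, Report, placeOrder, isFirstOrder, and ordersAfter are hypothetical names invented for illustration, and an immutable map stands in for a database value.  In real Datomic code the corresponding values come out of the report map under the DB_BEFORE and DB_AFTER keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java model of the idea; this is NOT the Datomic API.
// TxReportModel, Report, placeOrder, and the helpers are hypothetical names.
class TxReportModel {

    // One report: immutable snapshots from before and after a transaction.
    static final class Report {
        final Map<String, List<String>> dbBefore;
        final Map<String, List<String>> dbAfter;
        final String user;

        Report(Map<String, List<String>> dbBefore,
               Map<String, List<String>> dbAfter, String user) {
            this.dbBefore = dbBefore;
            this.dbAfter = dbAfter;
            this.user = user;
        }
    }

    // The "database" maps each user to an immutable list of ordered items.
    private Map<String, List<String>> current = Map.of();
    final BlockingQueue<Report> queue = new LinkedBlockingQueue<>();

    // Writer: builds a new immutable value and never mutates the old one.
    synchronized void placeOrder(String user, String item) {
        Map<String, List<String>> before = current;
        List<String> orders = new ArrayList<>(before.getOrDefault(user, List.of()));
        orders.add(item);
        Map<String, List<String>> next = new HashMap<>(before);
        next.put(user, List.copyOf(orders));
        current = Map.copyOf(next);
        queue.add(new Report(before, current, user));
    }

    // Observer: was this the user's first order? Answered from dbBefore alone.
    static boolean isFirstOrder(Report r) {
        return r.dbBefore.getOrDefault(r.user, List.of()).isEmpty();
    }

    // Observer: how many orders did the user have once this transaction completed?
    static int ordersAfter(Report r) {
        return r.dbAfter.getOrDefault(r.user, List.of()).size();
    }
}
```

Even if the observer handles the first report after a second order has already been placed, ordersAfter on that report still reflects only the first order, because the snapshots carried by a report never change.  That is the property that removes the need to coordinate with the writer.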

A Common Misconception

Developers often raise an objection to this approach:  "Oh, I see, this approach is limited to tiny databases that can fit in memory and be passed around."  Not at all.  Because they are immutable, Datomic databases can be lazily realized, pulling into memory only the parts that are needed.

Moreover, Datomic's indexes provide leverage over your data.  Queries do not have to realize the entire database; they touch only the data they need.  The indexes support "row", "column", "document", and "graph" access styles, so a wide variety of workloads are efficient.  Different peers will "cache over" their working sets automatically, without you having to plan in advance which machine needs which data.
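
The general technique behind lazy realization can be sketched in a few lines of plain Java.  This is a simplified model, not Datomic's storage or caching code: LazySegments, loader, loads, and demo are hypothetical names, and the "storage" is just a function that builds a segment on demand.  The point it illustrates is that immutable segments can be fetched on first use and cached with no invalidation logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Plain-Java model of lazy realization; NOT Datomic's storage or cache code.
// LazySegments, loader, and demo are hypothetical names.
class LazySegments {
    private final int segmentSize;
    private final IntFunction<String[]> loader;      // stands in for a read from storage
    private final Map<Integer, String[]> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // number of storage reads so far

    LazySegments(int segmentSize, IntFunction<String[]> loader) {
        this.segmentSize = segmentSize;
        this.loader = loader;
    }

    // Read one value, loading only the segment that contains it.
    // Segments never change, so a cached segment never needs invalidation.
    String get(int index) {
        int segment = index / segmentSize;
        String[] data = cache.computeIfAbsent(segment, s -> {
            loads.incrementAndGet();
            return loader.apply(s);
        });
        return data[index % segmentSize];
    }

    // Demo "storage": each segment holds the values "v<index>" for its index range.
    static LazySegments demo(int segmentSize) {
        return new LazySegments(segmentSize, s -> {
            String[] data = new String[segmentSize];
            for (int i = 0; i < segmentSize; i++) {
                data[i] = "v" + (s * segmentSize + i);
            }
            return data;
        });
    }
}
```

A reader that only ever touches indexes 200 through 299 loads one segment and nothing else; a different reader with a different working set would populate its own cache the same way.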

Composition FTW

Datomic's transaction report queue makes it possible for any peer to observe and respond to transactions, with complete access to all necessary context, and without any coordination with database writes.  Transaction reports are a simple building block for scalable systems.
