Wednesday, January 24, 2018

High Availability in Datomic Cloud

Based on my rigorous polling, these are the top five questions people have about HA in Datomic Cloud:
  • How does it work?
  • What do I have to do?
  • Hey wait a minute! What about serializability?
  • Is AWS cool?
  • So what else have you got?

How Does It Work?

In the Production Topology, Datomic cluster nodes sit behind an Application Load Balancer (ALB). Because any node can handle any request, there is no single point of failure. The starting cluster size of two ensures that the system remains available in the face of a single node failure. By increasing the cluster size beyond two, you both enhance availability and increase the number of queries the system can handle. An AutoScaling Group (ASG) monitors nodes and automatically replaces any node that fails its health check.
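To make the moving parts concrete, here is a toy model of the arrangement described above. It is only a sketch: the `Node` and `Cluster` classes, the round-robin routing, and the `replace_failed` method are hypothetical names invented for illustration, not anything from Datomic or the AWS APIs.

```python
import itertools


class Node:
    """A hypothetical cluster node; any node can handle any request."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        return f"{self.name} handled {request}"


class Cluster:
    """Toy stand-in for an ALB plus an AutoScaling Group (illustrative only)."""

    def __init__(self, size=2):
        self._ids = itertools.count(1)
        self.desired_size = size
        self.nodes = [self._new_node() for _ in range(size)]
        self._next = 0

    def _new_node(self):
        return Node(f"node-{next(self._ids)}")

    def route(self, request):
        # ALB role: send the request to any healthy node (round robin here).
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy nodes")
        node = healthy[self._next % len(healthy)]
        self._next += 1
        return node.handle(request)

    def replace_failed(self):
        # ASG role: drop nodes that fail the health check, launch replacements.
        self.nodes = [n for n in self.nodes if n.healthy]
        while len(self.nodes) < self.desired_size:
            self.nodes.append(self._new_node())
```

With a cluster of two, marking one node unhealthy still leaves `route` a node to send requests to, and a later `replace_failed` call brings the cluster back to its desired size.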

What Do I Have To Do?

Nothing. HA is automatic.

Hey Wait a Minute! What About Serializability?

Datomic is always transactional, fully serialized, and consistent in both the ACID and CAP senses. Don't waste your life writing code to compensate for partial failures and subtle concurrency bugs when you could be making your application better and shipping it faster.

So how does that square with shared-nothing cluster nodes? The answer is simple: The nodes use DynamoDB to serialize all writes per database.

At any point in time a database has a preferred node for transactions. In normal operation all transactions for a database will flow to/through that node. If for any reason (e.g. a temporary network partition) the preferred node can't be reached, any node can and will handle transactions. Consistency is ensured by conditional writes to DynamoDB. If a node becomes unreachable, Datomic will choose a new preferred node.

Note in particular that:
  1. This is not a master/follower system like Datomic On-Prem and many other databases – nobody is tracking mastership and there are no failover intervals.
  2. This should not be confused with parallel multi-writer systems such as Cassandra. Write availability is governed by the availability of DynamoDB conditional writes and strongly-consistent reads.
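The general idea of serializing writers through conditional writes can be sketched in a few lines. This is not Datomic's implementation and it does not call DynamoDB: `ConditionalStore`, `put_if_version`, and `transact` are hypothetical names, and the store is an in-memory stand-in that only mimics the two properties mentioned above (conditional writes and strongly-consistent reads).

```python
import threading


class ConditionalStore:
    """In-memory stand-in for a store with conditional writes (not DynamoDB)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._log = []

    def read(self):
        # Strongly consistent read: always reflects the latest committed write.
        with self._lock:
            return self._version, list(self._log)

    def put_if_version(self, expected_version, entry):
        # Conditional write: succeeds only if nobody has written since we read.
        with self._lock:
            if self._version != expected_version:
                return False
            self._log.append(entry)
            self._version += 1
            return True


def transact(store, node_name, tx):
    # Any node may attempt a transaction; losing a race means re-read and retry.
    while True:
        version, _ = store.read()
        if store.put_if_version(version, (node_name, tx)):
            return version + 1
```

Because every write names the version it expects, two nodes that race on the same version cannot both succeed. The loser re-reads and retries, so the log stays a single serial history no matter which node handles a given transaction.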

Is AWS Cool?

Very cool. Datomic Cloud showcases the benefit of designing for AWS vs. porting to AWS, and there is a lot going on behind zeroconf HA:

So What Else Have You Got?

Not all systems need HA. You can prototype a Datomic Cloud system with the (non-HA) Solo Topology for about $1/day. The topology differences are entirely abstracted away from clients and applications, so you can easily upgrade to the Production Topology later.

For More Information:

Check out the docs for
Or just jump right in.