Friday, August 10, 2012

Keep Chocolate Love Atomic

Datomic is a database of atomic facts, or datoms, that consist of entity, attribute, value, and transaction. For example, "I love chocolate (as of tx 1000)."

Of course, I am capable of loving many things, so the :loves attribute should have a cardinality of many (:db.cardinality/many). Here is an abbreviated history of my loves:

; some point in time...
[[:db/add stu :loves :chocolate]
 [:db/add stu :loves :vanilla]]

; later...
[[:db/add stu :loves :octomore]
 [:db/retract stu :loves :vanilla]]

The set of all things I currently love is derived information, and it can be calculated from the history of atomic facts. Based on the transactions above, I currently love :chocolate and :octomore.

Datomic automatically handles this derivation, as can be seen through the entity interface:

(:loves stu)  ; stu here is the entity, read from the current database value
=> #{:chocolate :octomore}
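
To make the derivation concrete, here is a minimal sketch of the idea in plain Clojure. It is a toy model, not how Datomic is implemented: the function name current-values is made up for illustration, and the keyword :stu stands in for a real entity id.

```clojure
;; Toy model only: fold a history of adds and retracts into the
;; set of values currently asserted for one entity and attribute.
(defn current-values
  [txs e a]
  (reduce (fn [current [op e2 a2 v]]
            (cond
              ;; ignore facts about other entities or attributes
              (not (and (= e e2) (= a a2))) current
              (= op :db/add)                (conj current v)
              (= op :db/retract)            (disj current v)
              :else                         current))
          #{}
          (apply concat txs)))

;; The two transactions from above, with :stu as a stand-in entity id.
(def history
  [[[:db/add :stu :loves :chocolate]
    [:db/add :stu :loves :vanilla]]
   [[:db/add :stu :loves :octomore]
    [:db/retract :stu :loves :vanilla]]])

(current-values history :stu :loves)
;; evaluates to the set containing :chocolate and :octomore
```

The point is only that the current set falls out of the facts; you never need to store it separately.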

Now, imagine creating a web interface with checkboxes for different things a person might love. You initially populate the interface with my current loves, pulled from the database. I interact with the system, and you get back a set of checkbox states.

At this point, you should submit adds and retracts only for the new facts I created -- not a set with an add or retract for every UI element. This is a subtle point. If I liked chocolate before, and I didn't uncheck chocolate, what is the harm in saying "Stu likes chocolate" again?

The biggest problem is that you are lying to the database. I didn't repeat my love of chocolate. What if the system also had a user interface more subtle than checkboxes, that allowed me to reiterate past preferences? You wouldn't be able to tell the difference.

An obvious warning sign is when you find yourself submitting derived information (the set of my likes) when you actually have the facts (what I just said) in hand. Ignoring facts and recording derived information is always perilous -- imagine managing a system that records birthdays and ages, but not birthdates.

A more subtle mistake is to abuse transactions to extract facts from derived information. You have a new derived set in hand, and the database knows how to calculate the previous derived set. Given those two things, you could write a transaction function that takes the two sets and backtracks to figure out what changed.

This approach has a variant of the dishonesty problem mentioned before, in that it provides no way for me to reiterate my love for chocolate. But the other problem with this approach may be even worse: It imposes coordination in the implementation, where no coordination was required by the domain.

Let's say that I choose, at some point in time, to start liking :cheesecake and :nachos. These are atomic choices, requiring no coordination with any historical record. If you send Datomic a set of all checkbox states, and ask it to discover :cheesecake and :nachos inside a transaction, you are manufacturing a coordination job that has no basis in reality. Unnecessary coordination is an enemy of scalability and reuse.

The root cause of confusion here is update-in-place thinking. The checkbox model exposes derived information (the current states) but not the facts (the choices the user made). Given the set of checkbox states, you should do the diff in the web tier as soon as you pull data out of the form. This still has the problem that there is no way to restate that you love chocolate, but now the scope of the problem is localized to its cause -- the checkbox model. You can fix the problem, or not (you often don't care, which is why checkboxes work the way they do). But at least you are not propagating the problem into the permanent record.
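
A web-tier diff along these lines might look like the sketch below. It uses only clojure.set; the function name loves-tx and the :stu stand-in entity id are assumptions for illustration. "before" is the set of loves you rendered into the form, and "after" is the set of boxes that came back checked.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper: build transaction data for only what changed.
;; before = loves shown when the form was rendered
;; after  = boxes checked when the form was submitted
(defn loves-tx
  [eid before after]
  (vec (concat
        ;; newly checked boxes become adds
        (for [v (set/difference after before)] [:db/add eid :loves v])
        ;; newly unchecked boxes become retracts
        (for [v (set/difference before after)] [:db/retract eid :loves v]))))

;; I already loved :chocolate and :octomore, and just checked two more boxes.
(loves-tx :stu
          #{:chocolate :octomore}
          #{:chocolate :octomore :cheesecake :nachos})
;; yields one add each for :cheesecake and :nachos, and nothing for :chocolate
```

The resulting vector is what you would submit as the transaction. If nothing changed, it is empty, and there is nothing to say to the database at all.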

Datomic is built on an understanding that data is created by atomic addition, not by corruptive modification. When your input source has an update-in-place model (such as checkbox states), you should convert to atomic facts before creating a transaction.

Now go eat some chocolate.