Friday, May 10, 2013

Excision

Motivation

It is a key value proposition of Datomic that you can tell not only what you know, but how you came to know it.  When you add a fact:

conn.transact(list(list(":db/add", 42, ":firstName", "John")));

Datomic does more than merely record that 42's first name is "John".  Each datom is also associated with a transaction entity, which records the moment (:db/txInstant) the datom was recorded.


Given these reified transactions, it is possible to track the history of information.  Let's say John decides he prefers to go by "Jack":

conn.transact(list(list(":db/add", 42, ":firstName", "Jack")));

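When you assert a new value for a cardinality-one attribute such as :firstName, Datomic will automatically retract any past value (cardinality-one means that you cannot have two first names simultaneously).  So the database now holds three datoms about 42's first name: the original assertion of "John", a retraction of "John", and an assertion of "Jack", with the retraction and the new assertion both recorded by the second transaction.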

Given this information model, it is easy to see that Datomic can support queries that tell you:
  • what you know now
  • what you knew at some point in the past
  • how and when you came to know any particular datom
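
As a rough sketch, each of these questions can be posed with the same Datalog query shape; what changes is the database value passed in as $. The queries below assume entity 42 and the :firstName attribute from the example above, with 42 supplied as the ?e input.

```clojure
;; What you know now, or what you knew in the past: run this against
;; the current database value, or against an as-of view of it.
[:find ?v
 :in $ ?e
 :where [?e :firstName ?v]]

;; How and when you came to know it: run this against the history view.
;; ?added is true for assertions and false for retractions.
[:find ?v ?when ?added
 :in $ ?e
 :where [?e :firstName ?v ?tx ?added]
        [?tx :db/txInstant ?when]]
```

In the Java API, the as-of and history views are obtained from a database value with asOf and history respectively.
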
So far so good, but there is a fly in the ointment.  In certain situations you may be forced to excise data, pulling it out root and branch and forgetting that you ever knew it.  This may happen if you store data that must comply with privacy or IP laws, or you may have a regulatory requirement to keep records for seven years and then "shred" them.  For these scenarios, Datomic provides excision.

Excision in Datomic

You can request excision of data by transacting a new entity with the following attributes:
  • :db/excise is required, and refers to a target entity or attribute to be excised. Thus there are two scenarios: excise all or part of an entity, or excise some or all of the values of a particular attribute.
  • :db.excise/attrs is an optional, cardinality-many, reference attribute that limits an excision to a set of attributes, useful only when the target of excise is an entity. (If :db.excise/attrs is not specified, then all matching attributes will be excised.)
  • :db.excise/beforeT is an optional, long-valued attribute that limits an excision to only datoms whose t is before the specified beforeT, which may be a t or tx id. This can be used with entity or attribute targets.
  • :db.excise/before is an optional, instant-valued attribute that limits an excision to only datoms whose transaction time is before the specified before. This can be used with entity or attribute targets.

Example: Excising Specific Entities

To excise a specific entity, manufacture a new entity with a :db/excise attribute pointing to that entity's id.  For example, if user 42 requests that his personal data be removed from the system, the transaction data would be:

[{:db/id #db/id[:db.part/user],
  :db/excise 42}]

Since :db.excise/attrs is not specified in the transaction data above, all datoms about entity 42 will be excised.
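
To remove only some of what is known about entity 42, the same request can name the attributes to excise. A minimal sketch, assuming only :firstName should be forgotten:

```clojure
;; Excise only the :firstName datoms of entity 42; datoms for other
;; attributes of the entity are left alone.
[{:db/id #db/id[:db.part/user]
  :db/excise 42
  :db.excise/attrs [:firstName]}]
```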

Example: Excising a Window in Time

To excise old values of a particular attribute, you can create an excision for the attribute you want to eliminate, and then limit the excision using either before or beforeT.  Imagine tracking application events that have users, categories, and details.  Your application produces a ton of events, but you don't care about the old ones.  Here is a transaction that will excise all the pre-2012 events:

  [{:db/id #db/id[:db.part/user],
    :db/excise :event/user
    :db.excise/before #inst "2012"}
   {:db/id #db/id[:db.part/user],
    :db/excise :event/category
    :db.excise/before #inst "2012"}
   {:db/id #db/id[:db.part/user],
    :db/excise :event/description
    :db.excise/before #inst "2012"}]
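
The same kind of window could instead be bounded by a point in the transaction sequence rather than by wall-clock time. A sketch for one of the attributes, where 1000 is a hypothetical t value chosen only for illustration:

```clojure
;; Excise :event/user datoms whose t is before 1000.
;; 1000 is a made-up t; substitute a real t or tx id.
[{:db/id #db/id[:db.part/user]
  :db/excise :event/user
  :db.excise/beforeT 1000}]
```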

Remembering That You Forgot

It is a key value proposition of Datomic that you can tell not only what you know, but how you came to know it.  This seems to be at odds with excision: if you remember what you forgot, then you didn't really forget it!

You cannot remember what you forgot, but you can remember that you forgot.  Excise attributes are ordinary attributes in the database, and you can query them.  The following query would tell you if datoms about entity 42 have ever been excised:

[:find ?e :where [?e :db/excise 42]]

Once you find those entities, you can of course use the entity API to navigate to the specific attribute and before filters of the excisions.
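
A query can also do some of that navigation directly. As a sketch, the following lists each excision of entity 42 together with the attributes it was limited to. Excisions that did not specify :db.excise/attrs do not match the second clause, so they will not appear in the result:

```clojure
;; ?e is the excision entity, ?ident the name of an attribute
;; that the excision was limited to.
[:find ?e ?ident
 :where [?e :db/excise 42]
        [?e :db.excise/attrs ?a]
        [?a :db/ident ?ident]]
```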

Excise attributes are protected from excision, so you cannot erase your tracks.  (Other important datoms, such as schema, are also protected; see the documentation for full details.)

Handle With Care

Excision is different from any other operation in Datomic.  While excision requests are transactions, excision itself is not transactional.  Excision will happen on the next indexing job.

Excision is permanent and unrecoverable.  Take a backup before performing significant excisions, and use excision only when your domain requires that you deliberately forget certain data.  

2 comments:

  1. What happens to references, in other entities, to an excised entity? Are there any error cases that crop up due to excision that won't occur normally?

  2. When excising an entire entity, all component entities are also excised, as are all inbound references to the excised entity. When selecting particular attributes of an excised entity, both in- and outbound values of that attribute involving the excised entity are excised, and, if a component attribute, the component entity is excised in its entirety.

    When using the excise values of attribute form, managing all inbound references and components is on you, currently.
