Wednesday, June 12, 2013

Using Datomic from Groovy, Part 1: Everything is Data

In this post, I will demonstrate transacting and querying against Datomic from Groovy.  The examples shown here are based on the following schema, for a simple social news application:


There are a number of more in-depth samples in the datomic-groovy-examples project on GitHub.

Why Groovy

Groovy offers four key advantages for a Java programmer using Datomic:
  • Groovy provides interactive development through groovysh, the Groovy shell.  When combined with Datomic's dynamic, data-driven style, this makes it easy to interactively develop code in real time.  The source code for this post has a number of other examples designed for interactive study within the Groovy shell.
  • Groovy's collection literals make it easy to see your data.  Lists and maps are as easy as:
aList = ['John', 'Doe'];
aMap = [firstName: 'John', 
        lastName: 'Doe'];
  • Groovy's closures make it easy to write functions, without the noise of single-method interfaces and anonymous inner classes. For instance, you could grab all the lastNames from a collection of maps with:
lastNames = people.collect { it['lastName'] }

  • Of the popular expressive languages that target the JVM, Groovy's syntax is most similar to Java's.

Transactions

A Datomic transaction takes a list of data to be added to the database, and returns a future map describing the results. The simplest possible transaction is a list of one sublist that adds an atomic fact, or datom, to the database, using the following shape:

conn.transact([[op, entityId, attributeName, value]]);

The components above are:
  • op is a keyword naming the operation. :db/add adds a datom, and :db/retract retracts a datom.
  • entityId is the numeric id of an entity.  You can use tempid when creating a new entity.
  • attributeName is a keyword naming an attribute.
  • value is the value of an attribute.  The allowed types for an attribute include numerics, strings, dates, URIs, UUIDs, binaries, and references to other entities.
Keywords are names, prefixed with a colon, possibly with a leading namespace prefix separated by the slash char, e.g.

:hereIsAnUnqualifiedName
:the.namespace.comes.before/theName

Putting this all together, you might add a new user's first name with:

conn.transact([[':db/add', newUserId, ':user/firstName', 'John']]);

If you are adding multiple datoms about the same entity, you can use a map instead of a list, with the special keyword :db/id identifying the entity.  For example, the following two transactions are equivalent:

// create an entity with two attributes (map form)
conn.transact([[':db/id': newUserId,
                ':user/firstName': 'John',
                ':user/lastName': 'Doe']]);

// create an entity with two attributes (list form)
conn.transact([[':db/add', newUserId, ':user/firstName', 'John'],
               [':db/add', newUserId, ':user/lastName', 'Doe']]);
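
The correspondence between the two forms can be sketched in plain Groovy, without a database. The expandMapForm closure below is a hypothetical helper, not part of the Datomic API, and newUserId is a placeholder string standing in for a real entity id. The sketch only illustrates that a map-form entry corresponds to one :db/add list per attribute.

```groovy
// Hypothetical helper (not a Datomic API): expands one map-form entry
// into the equivalent list-form assertions.
def expandMapForm = { Map entityMap ->
  def entityId = entityMap[':db/id']
  // Every key other than :db/id names an attribute of the entity.
  def attributes = entityMap.findAll { entry -> entry.key != ':db/id' }
  attributes.collect { entry -> [':db/add', entityId, entry.key, entry.value] }
}

// Placeholder id for illustration; real code would obtain an id from Datomic.
def newUserId = 'placeholder-user-id'

def mapForm = [':db/id': newUserId,
               ':user/firstName': 'John',
               ':user/lastName': 'Doe']

def listForm = expandMapForm(mapForm)
listForm.each { println it }
```

Running this in groovysh prints one list per attribute, which makes it easy to compare against the hand-written list form.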

Let's look next at composing larger transactions out of smaller building blocks. You have already seen how to describe a new user:

newUser = [[':db/id': newUserId,
            ':user/email': 'john@example.com',
            ':user/firstName': 'John',
            ':user/lastName': 'Doe']];

Notice that this time we did not call transact. Instead, we just stored data describing the user in newUser.

Now imagine that you have a collection of story ids in hand, and you want to create a new user who upvotes those stories.   Groovy's collect method iterates over a collection, transforming values using a closure with a default single parameter named it. We can use collect to build new assertions that refer to each story in a collection of storyIds:

upvoteStories = storyIds.collect {
  [':db/add', newUserId, ':user/upVotes', it]
}

Now we are ready to build a bigger transaction out of the pieces.  Because transactions are made of data, we don't need a special API for this.  Groovy already has an API for concatenating lists, called +:

conn.transact(upvoteStories + newUser);

Building Datomic transactions from data has many advantages over imperative or object-oriented approaches:
  • Composition is automatic, and requires no special API.
  • ACID transactionality is scoped to transaction calls, and does not require careful management across separate calls to the database.
  • Because they are data, Datomic transactions are flexible across system topology changes: they can be built offline for later use, serialized, and/or enqueued.
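
To make the last point concrete, here is a minimal sketch that builds transaction data with no connection at all, serializes it, and rebuilds it. It rests on several assumptions: newUserId and storyIds are placeholder values rather than real Datomic ids, and JSON (via Groovy's bundled groovy.json classes) is used purely as an example wire format, not as something Datomic prescribes.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Placeholder values for illustration; real code would use ids from Datomic.
def newUserId = 'placeholder-user-id'
def storyIds = [101, 102, 103]

def newUser = [[':db/id': newUserId,
                ':user/email': 'john@example.com',
                ':user/firstName': 'John',
                ':user/lastName': 'Doe']]

def upvoteStories = storyIds.collect {
  [':db/add', newUserId, ':user/upVotes', it]
}

// Composition is plain list concatenation.
def txData = upvoteStories + newUser

// txData holds only lists, maps, strings, and numbers, so it can be
// written out now and rebuilt later (JSON is an assumed example format).
def serialized = JsonOutput.toJson(txData)
def restored = new JsonSlurper().parseText(serialized)

println serialized
assert restored == txData
```

The rebuilt value could be handed to transact at a later time or by a different process, once the placeholders have been replaced with real ids.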

Query

The Datomic query API is named q, and it takes a query plus one or more inputs. The simple call below passes a query plus a single input, the database db, and returns the id of every entity in the system with an email address:

q('''[:find ?e 
      :where [?e :user/email]]''', db);

Keyword constants in the :where clause constrain the results.  Here, :user/email constrains results to only those entities possessing an email.  

Symbols preceded by a question mark are variables, and will be populated by the query engine.  The variable ?e will match every entity id associated with an email.

A query always returns a set of lists, and the :find clause specifies the shape of lists to return.  In this example, the lists are of size one since a single variable ?e is specified by :find.

Note that the query argument to q is notionally a list.  As a convenience, you can pass the query argument either as a list or (as shown here) as an edn string literal.

The next query further constrains the result, to find the entity with a specific email address:

q('''[:find ?e
      :in $ ?email
      :where [?e :user/email ?email]]''',
  db, 'editor@example.com');

There are several things to see here.  There are now two inputs to the query: the database itself, and the specific email "editor@example.com" we are looking for.  Since there is more than one input, the inputs must be named by an :in clause.  The :in clause names inputs in the order they appear:
  1. $ is Datomic shorthand for a single database input. 
  2. ?email is bound to the scalar "editor@example.com".
Inputs need not be scalar. The shape [?varname ...] in an :in clause is called a collection binding form, and it binds a collection instead of a single value. The following query looks up two different users by email:

q('''[:find ?e
      :in $ [?email ...]
      :where [?e :user/email ?email]]''',
  db, ['editor@example.com', 'stuarthalloway@datomic.com']);

Another way to join is to use more than one constraint in a :where clause.  Whenever a variable appears more than once, it must match the same value in every location where it appears.  The following query joins through ?user to find all the comments for a user:

q('''[:find ?comment
      :in $ ?email
      :where [?user :user/email ?email]
             [?comment :comment/author ?user]]''',
  db, 'editor@example.com')
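
The effect of sharing ?user between the two constraints can be imitated by hand in plain Groovy. The sketch below is only an illustration of the join semantics and not how Datomic evaluates queries: the facts list and its entity ids are made up, and commentsFor is a hypothetical helper name. Like q, it returns a set of lists.

```groovy
// Made-up facts as [entity, attribute, value] triples.
def facts = [
  [1,  ':user/email',     'editor@example.com'],
  [2,  ':user/email',     'john@example.com'],
  [10, ':comment/author', 1],
  [11, ':comment/author', 2],
  [12, ':comment/author', 1]
]

// Hypothetical helper imitating the two-constraint query above.
def commentsFor = { String email ->
  // [?user :user/email ?email]
  def userFacts = facts.findAll { it[1] == ':user/email' && it[2] == email }
  def users = userFacts.collect { it[0] }
  // [?comment :comment/author ?user], where ?user must be one of the
  // values matched by the first constraint.
  def commentFacts = facts.findAll { it[1] == ':comment/author' && it[2] in users }
  // One-element lists, mirroring the shape given by :find ?comment.
  commentFacts.collect { [it[0]] } as Set
}

println commentsFor('editor@example.com')
```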

We have only scratched the surface here.  Datomic's query also supports rules, predicates, function calls, cross-database queries (with joins!), aggregates, and even queries against ordinary Java collections without a database.  In fact, the Datalog query language used in Datomic supports a superset of the capabilities of the relational algebra that underpins SQL.

Conclusion

In this installment, you have seen the powerful, compositional nature of programming with generic data.  In Part 2, we will look at the database as a value, and explore the implications of having a lazy, immutable database inside your application process.
