Datomic Pro Starter Edition

08 October 2013

We are happy to announce today the release of Datomic Pro Starter Edition, enabling the use of Datomic for small production deployments at no cost.

Datomic Pro Starter Edition features most benefits of Datomic Pro:

Support for all storages
A perpetual license with 12 months of updates included
Support for the full Datomic programming model
Datomic Console included with download

Datomic Pro Starter Edition features community support, and does not include:

High Availability transactor support
Integrated memcached
Running more than 3 processes (2 peers + transactor)

To get started, register and download Datomic Pro Starter Edition.

Datomic Pro Starter Edition lets your team build a fully operational system and deploy to production with no additional steps or costs.

Datomic Console

29 September 2013

[Update: Watch the intro video.]

The Datomic Console is a graphical UI for exploring Datomic databases.

It supports exploring schema, building and executing queries, navigating entities, examining transaction history, and walking raw indexes. The Datomic Console is included in Datomic Pro, and is available as a separate download for Datomic Free users.

Exploring Schema

The upper left corner of the console displays a tree view of the attributes defined for the current database.

Query

The Query tab provides two synchronized views of queries: a graphical builder, and the equivalent textual representation.

You can see the results of a query in the Dataset pane on the lower right.

Entities

The Entities tab provides a tree view of an entity, plus the ability to drill in to related entities.

Transactions

The Transactions tab provides a graphical view of the history of your database at scales ranging from days down to seconds.

When you zoom in, the specific datoms in a transaction are displayed in the dataset pane.

Indexes

The Indexes tab allows you to browse ranges within a Datomic index, displaying results in the dataset pane.

And More

This post only scratches the surface, see the full docs for more details. You can save arbitrary datasets, giving them a name for reuse in subsequent queries. And, of course, you can use Datomic's time features to work with as-of, since, and historical views of your data.

The Transaction Report Queue

11 September 2013

Summary: Functional databases such as Datomic eliminate the burden of deciding what information a downstream observer needs. Just give 'em everything, lazily.

The Transaction Report Queue

In Datomic, you can monitor all transactions. Any peer process in the system can request a transaction report queue of every transaction against a particular database.

TxReportWatcher is a simple example of this. It watches a particular attribute in the database, and prints the entity id and value whenever a change to that attribute appears. The essence of the code is only a few lines of Java:

final BlockingQueue queue = conn.txReportQueue();

while (true) {
 final Map tx = queue.take();
 Collection results = q(byAttribute, tx.get(TX_DATA), attrid);
 for (Iterator it = results.iterator(); it.hasNext(); ) {
 printList(it.next());
 }
}

There are several things to note here:

The Datomic query function q is used to query the transaction data, showing that the full power of the database query language is available while handling transaction events.
The TX_DATA map key points to all the data added to the database by this particular transaction.
Everything is made of generic data structures accessible from any JVM language: queues, collections, maps, and lists. (There is no ResultSet API.)

Context Matters

How much information does a transaction observer need in order to take useful action? An easy but naive answer is "just the TX_DATA".

But when you move past toy systems and blog examples, context matters. For example, when a user places an order in a system, you might want to take different actions based on

that user's order history
the current inventory status
time limited promotions
that user's relation to other users

It is impossible to anticipate in advance the context you might need. But if you don't provide enough information, you will have to go back and ask for more. There are many risks here. The biggest risk is that such asking will introduce tighter coupling via the need for coordination, e.g. going back and asking questions of a database, questions that must be coordinated with the time of the event. Unnecessary coordination is a cause of complexity (and an enemy of scalability).

Is there another way? You bet! If you have a functional database, where points in time are immutable values, then you can make the entire database available. Datomic provides exactly this. In addition to the TX_DATA key, the DB_AFTER key points to the entire database as of the completion of the transaction. And the DB_BEFORE key points to the entire database immediately before the transaction started. Because both the "before" and "after" copies of the database are immutable values, no coordination with other processes is required, ever.

A Common Misconception

Developers often raise an objection to this approach: "Oh, I see, this approach is limited to tiny databases that can fit in memory and be passed around." Not at all. Because they are immutable, Datomic databases can be lazily realized, pulling into memory only the parts that are needed.

Moreover, Datomic's indexes provide leverage over your data. Queries do not have to realize the entire database, they can use just the data needed. Datomic indexes provide leverage for "row", "column", "document", and "graph" access styles, so a wide variety of workloads are efficient. Different peers will "cache over" their working sets automatically, without you having to plan in advance which machine needs which data.

Composition FTW

Datomic's transaction report queue makes it possible for any peer to observe and respond to transactions, with complete access to all necessary context, and without any coordination with database writes. Transaction reports are a simple building block for scalable systems.

Datomic MusicBrainz sample database

01 June 2013

MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public. We are pleased to release a sample project that uses the MusicBrainz dataset to help people get familiar with using Datomic.
The MusicBrainz dataset makes a great example database for learning, evaluating, or testing Datomic for a couple of reasons:

It deals with a domain with which nearly everyone is familiar
It is of decent size: 60,438 labels; 664,226 artists; 1,035,592 album releases; and 13,233,625 recorded tracks
It comprises a good number of entities, attributes, and relationships
It is fun to play with, query, and explore

Schema

The mbrainz-sample schema is an adaptation of a subset of the full MusicBrainz schema. We didn't include some entities, and we made some simplifying assumptions and combined some entities. In particular:

We omit any notion of Work
We combine Track, Tracklist and Recording into simply "track"
We renamed Release group to "abstractRelease"

Abstract Release vs. Release vs. Medium

(Adapted from the MusicBrainz schema docs)
An "abstractRelease" is an abstract "album" entity (e.g. "The Wall" by Pink Floyd). A "release" is something you can buy in your music store (e.g. the 1984 US vinyl release of "The Wall" by Columbia, as opposed to the 2000 US CD release by Capitol Records).
Therefore, when you query for releases e.g. by name, you may see duplicate releases. To find just the "work of art" level album entity, query for abstractRelease.
The media are the physical components comprising a release (disks, CDs, tapes, cartridges, piano rolls). One medium will have several tracks, and the total tracks across all media represent the track list of the release.

Relationship Diagram

Entities

For information about the individual entities and their attributes, please see the schema page in the wiki, or the EDN schema itself.

Getting Started

First get Datomic, and start up a transactor.

Getting the Data

Next download the mbrainz backup:


 # 2.8 GB, md5 4e7d254c77600e68e9dc71b1a2785c53
 wget http://s3.amazonaws.com/mbrainz/datomic-mbrainz-backup-20130611.tar

and extract:

 # this takes a while
 tar -xvf datomic-mbrainz-backup-20130611.tar

Finally, restore the backup:

 # takes a while, but prints progress -- ~150,000 segments in restore
 bin/datomic restore-db file:datomic-mbrainz-backup-20130611 datomic:free://localhost:4334/mbrainz

Getting the Code

Clone the git repo somewhere convenient:

 git clone git@github.com:Datomic/mbrainz-sample.git
 cd mbrainz-sample

Running the examples

From Java

Fire up your favorite IDE, and configure it to use both the included pom.xml and the following Java options when running:


 -Xmx2g -server

From Clojure

Start up a Clojure REPL:

 # from the root of the mbrainz-sample repo
 lein repl

Then connect to the database and run the queries.

Thanks

We would like to thank the MusicBrainz project for defining and compiling a great dataset, and for making it freely available. ..

Component Entities

19 May 2013

This post demonstrates Datomic's component entities, and highlights a new way to create components available in today's release. You can follow along in the code via the sample project.

The code examples use Groovy, a JVM language that combines similarity to Java with concision. If you are a Java developer new to Groovy, you may want to read this first.

Why Components?

In a database, some entities are their own identities, and others exist only as part of a larger parent entity. In Datomic, the latter entities are called components, and are reached from the parent via an attribute whose schema includes :db/isComponent true.

As a familiar example, consider orders, line items, and products. Orders have references to line items, and those references are through a component attribute, since line items have no independent existence outside of an order. Line items, in turn, have references to products. References to products are not component references, because products exist regardless of whether or not they are part of any particular order.

The schema for a line item component reference looks like this:

{:db/id #db/id[:db.part/db]
 :db/ident :order/lineItems
 :db/isComponent true
 :db/valueType :db.type/ref
 :db/cardinality :db.cardinality/many
 :db.install/_attribute :db.part/db}

Notice also that line items are :db.cardinality/many, since a single order can have many of them.

Component attributes gain three special abilities in Datomic:

you can create components via nested maps in a transaction (new in 0.8.4020)
touching an entity recursively touches all its components
:db.fn/retractEntity recursively deletes all its components

Each of these abilities is demonstrated below.

Creating Components

To demonstrate line item components, let's create an order for some chocolate and whisky. First, here is a query for products ?e matching a particular description ?v:

productQuery = '''[:find ?e
 :in $ ?v
 :where [?e :product/description ?v]]''';

Now, we can query for the products we want to order:

(chocolate, whisky) = ['Expensive Chocolate', 'Cheap Whisky'].collect {
 q(productQuery, conn.db(), it)[0][0];
}
===> [17592186045454, 17592186045455]

The statement above uses Groovy's multiple assignment to assign chocolate to the first query result, and whisky to the second.

Now that we have some products, we can create an order with some line items. As of today's release, you can do this via nested maps:

order = [['order/lineItems': [['lineItem/product': chocolate,
 'lineItem/quantity': 1,
 'lineItem/price': 48.00],
 ['lineItem/product': whisky,
 'lineItem/quantity': 2,
 'lineItem/price': 38.00]],
 'db/id': tempid()]];

The nested maps above expand into two subentities. Notice that you do not need to create a tempid for the nested line items -- they will be auto-assigned tempids in the same partition as the parent order.

The order above is pure data (a list of maps). This greatly facilitates development, testing, and composition. When we are ready to put the data in the database, the transaction is as simple as:

conn.transact(order).get();

Touching Components

Now we can query to find the order we just created. To demonstrate that query can reach anywhere within your data, we will do a multiway join to find the order via product description:

ordersByProductQuery = '''
[:find ?e
 :in $ ?productDesc
 :where [?e :order/lineItems ?item]
 [?item :lineItem/product ?prod]
 [?prod :product/description ?productDesc]]''';

The query above joins

from the provided productDesc input to to the product entity ?prod
from ?prod to the order item ?item
from ?item to the order ?e

and returns ?e.

We are going to immediately pass ?e to datomic's entity API, so let's take a moment to create a Groovy closure qe that automates query + get entity:

qe = { query, db, Object[] more ->
 db.entity(q(query, db, *more)[0][0])
}

Now we can find an order the includes chocolate:

order = qe(ordersByProductQuery, db, 'Expensive Chocolate');

Because the Datomic database is an immutable value in your own address space, entities can be lazily realized. When you first look at the order, you won't see any attributes at all:

===> {:db/id 17592186045457}

The touch API will realize all the immediate attributes of the order, plus it will recursively realize any components:

order.touch();
===> {:order/lineItems #{{:lineItem/product #, 
 :lineItem/price 38.00M, 
 :lineItem/quantity 2, 
 :db/id 17592186045459} 
 {:lineItem/product #, 
 :lineItem/price 48.00M, 
 :lineItem/quantity 1, 
 :db/id 17592186045458}}, 
 :db/id 17592186045457}

Notice that the line items are immediately realized, and you can see all their attributes. However, the products are not immediately realized, since they are not components. You can, of course, touch them yourself if you want.

Retracting Components

I am not as hungry or thirsty as I thought. Let's retract that order, using Datomic's :db.fn/retractEntity:

conn.transact([[":db.fn/retractEntity", order[":db/id"]]]).get();

Retracting an entity will retract all its subcomponents, in this case the line items. To see that the line items are gone, we can count all the line items in our database:

q('''[:find (count ?e)
 :where [?e :order/lineItems]]''',
 db);
===> []

References to non-components will not be retracted. The products are all still there:

q('''[:find (count ?e)
 :where [?e :product/description]]''',
 db);
===> [[2]]

Conclusion

Components allow you to create substantial trees of data with nested maps, and then treat the entire tree as a single unit for lifecycle management (particularly retraction). All nested items remain visible as first-class targets for query, so the shape of your data at transaction time does not dictate the shape of your queries. This is a key value proposition of Datomic when compared to row, column, or document stores. ..

PREVIOUS 10 of 13 NEXT

Datomic Pro Starter Edition

Datomic Console

Exploring Schema

Query

Entities

Transactions

Indexes

And More

The Transaction Report Queue

The Transaction Report Queue

Context Matters

A Common Misconception

Composition FTW

Datomic MusicBrainz sample database

Schema

Abstract Release vs. Release vs. Medium

Relationship Diagram

Entities

Getting Started

Getting the Data

Getting the Code

Running the examples

From Java

From Clojure

Thanks

Component Entities

Why Components?

Creating Components

Touching Components

Retracting Components

Conclusion

Datomic Pro

Datomic Cloud

Resources

Company