Tuesday, December 5, 2017

Datomic Pull :as

Datomic's Pull API provides a declarative way to make hierarchical and nested selections of information about entities.  The 0.9.5656 release enhances the Pull API with a new :as clause that provides control over the returned keys.

As an example, imagine that you want information about Led Zeppelin's tracks from the mbrainz dataset. The following pull pattern navigates to the artist's tracks, using limit to return a single track:

;; pull pattern
'[[:track/_artists :limit 1]]

=> #:track{:_artists
           [#:db{:id 17592188757937}]}

The entity id 17592188757937 is not terribly interesting, so you can use a nested pull pattern to request the track name instead:

;; pull pattern
'[{[:track/_artists :limit 1] [:track/name]}]

=> #:track{:_artists [#:track{:name "Black Dog"}]}

That is better, but what if you want different key names? This can happen for reasons including:

  • you are targeting an environment that does not support symbolic names, so you need a string instead of a keyword key
  • you do not want to expose the direction of navigation (e.g. the underscore in :track/_artists)
  • your consumers are expecting a different name

The :as option lets you rename result keys to arbitrary values that you provide, and works at any level of nesting in a pull pattern. The pattern below uses :as twice to rename the two keys in the result:

;; pull pattern
'[{[:track/_artists :limit 1 :as "Tracks"]
   [[:track/name :as "Name"]]}]

=> {"Tracks" [{"Name" "Black Dog"}]}


To try it out you can grab the latest release, review the Pull grammar, and work through these examples at the REPL.



Thursday, March 23, 2017

New Datomic Training Videos and Getting Started Documentation

We are excited to announce the release of a new set of Day of Datomic training videos!
Filmed at Clojure/Conj in Austin, TX in December of 2016, this series covers everything from the architecture and data model of Datomic to operation and scaling considerations.

The new training sessions provide a great foundation for developing a Datomic-based system. For those of you who have watched the original Day of Datomic videos, the series released today uses the new Datomic Client library for the examples and workshops, so if you haven't yet explored Datomic Clients, now is the perfect opportunity to do so!

If you ever want to refer back to the original Peer-based training videos, don't worry - they're all still available as well.

In addition to an updated Day of Datomic, we've released a fully re-organized and re-written Getting Started section in the Datomic Documentation. We have gathered and incorporated feedback from new and existing users and hope that the new Getting Started is a much more comprehensive and accessible introduction to Datomic.

We look forward to your thoughts and feedback. If you have any comments on the new training videos or the new Getting Started section, or any additional thoughts, please let us know!

Wednesday, January 25, 2017

The Ten Rules of Schema Growth

Data outlives code, and a valuable database supports many applications over time. These ten rules will help grow your database schema without breaking your applications.

1.  Prod is not like dev.

Production is not development. In production, one or more codebases depend on your data, and the rules below should be followed exactingly.

A dev environment can be much more relaxed.  Alone on your development machine experimenting with a new feature, you have no users to break.  You can soften the rules, so long as you harden them when transitioning to production.

2.  Grow your schema, and never break it.

The lack of common vocabulary makes it all too easy to automate the wrong practices. I will use the terms growth and breakage as defined in Rich Hickey's Spec-ulation talk.  In schema terms:

  • growth is providing more schema
  • breakage is removing schema, or changing the meaning of existing schema.

In contrast to these terms, many people use "migrations", "refactoring", or "evolution". These usages tend to focus on repeatability, convenience, and the needs of new programs, ignoring the distinction between growth and breakage. The problem here is obvious: Breakage is bad, so we don't want it to be more convenient!

Using precise language underscores the costs of breakage. Most migrations are easily categorized as growth or breakage by considering the rules below.  Growth migrations are suitable for production, and breakage migrations are, at best, a dev-only convenience. Keep them widely separate.

3. The database is the source of truth.

Schema growth needs to be reproducible from one environment to another.  Reproducibility supports the development and testing of new schema before putting it into production and also the reuse of schema in different databases. Schema growth also needs to be evident in the database itself, so that you can determine what the database has, what it needs, and when growth occurred.

For both of these reasons, the database is the proper source of truth for schema growth. When the database is the source of truth, reproducibility and auditability happen for free via the ordinary
query and transaction capabilities of the database.  (If your database is not up to the tasks of queries and transactions, you have bigger problems beyond the scope of this article.)

Storing schema in a database is strictly more powerful than storing schema as text files in source control. The database is the actual home for schema, plus it provides validation, structure, query, transactions, and history. A source control system provides only history and is separate from the data itself.

Note that this does not mean "never put schema information in source control". Source control may be convenient for other reasons, e.g. it may be more readily accessible. You may redundantly store schema in source control, but remember that the database is definitive.

4.  Growing is adding.

As you acquire more information about your domain, grow your schema to match. You can grow a schema by adding new things, and only by adding new things, for example (a Datomic sketch follows this list):

  • adding new attributes to an existing 'type'
  • adding new types
  • adding relationships between types
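
In Datomic, each of these growth operations is an ordinary transaction. Here is a minimal sketch of the first one, assuming the datomic.api namespace is aliased as d, a peer connection conn, and an illustrative attribute name:

;; grow the schema by adding a new attribute to an existing 'type'
;; (conn and :user/preferred-language are illustrative)
@(d/transact conn
   [{:db/ident       :user/preferred-language
     :db/valueType   :db.type/string
     :db/cardinality :db.cardinality/one}])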

5.  Never remove a name.

Removing a named schema component at any level is a breaking change for programs that depend on that name. Never remove a name.

6.  Never reuse a name.

The meaning of a name is established when the name is first introduced. Reusing that name to mean something substantially different breaks programs that depend on that meaning. This can be even
worse than removing the name, as the breakage may not be as immediately obvious.

7.  Use aliases.

If you are familiar with database refactoring patterns, the advice in Rules Five and Six may seem stark. After all, one purpose of refactoring is to adopt better names as we discover them. How can we
do that if names can never be removed or changed in meaning?

The simple solution is to use more than one alias to refer to the same schema entity. Consider the following example:

  • In iteration 1, users of your system are identified by their email with an attribute named :user/id
  • In iteration 2, you discover that users sometimes have non-email identifiers, and that you want to store a user's email even when the email is not used as an identifier. In short, you wish that :user/id was named :user/primary-email.

No problem! Just create :user/primary-email as an alias for :user/id. Older programs can continue to use :user/id, and newer programs can use the now-preferred :user/primary-email.
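
In Datomic, one way to create the alias is to assert a new :db/ident on the existing attribute; Datomic retains renamed idents, so the old name continues to resolve. A sketch, assuming a peer connection conn:

;; :user/primary-email becomes the preferred name, and
;; :user/id continues to resolve to the same attribute
@(d/transact conn
   [{:db/id    :user/id
     :db/ident :user/primary-email}])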

8.  Namespace all names.

Namespaces greatly reduce the cost of getting a name wrong, as the same local name can safely have different meanings in different namespaces.  Continuing the previous example, imagine that the local
name id is used to refer to a UUID in several namespaces, e.g. :inventory/id, :order/id, and so on. The fact that :user/id is not a UUID is inconsistent, and newer programs should not have to put up with this.

Namespaces let you improve the situation without breaking existing programs. You can introduce :user-v2/id, and new programs can ignore names in the user namespace. If you don't like v2, you can also pick a more semantic name for the new namespace.
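
Sketched in Datomic, the improvement is just the addition of a new attribute in the new namespace, leaving :user/id untouched (the :db/unique setting here is an illustrative assumption):

;; a consistently-typed id in a fresh namespace
@(d/transact conn
   [{:db/ident       :user-v2/id
     :db/valueType   :db.type/uuid
     :db/cardinality :db.cardinality/one
     :db/unique      :db.unique/identity}])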

9.  Annotate your schema.

Databases are good at storing data about your schema. Adding annotations to your schema can help both human readers and programs make sense of how the schema grew over time. For example (a sketch in Datomic follows this list):

  • you could annotate names that are not recommended for new programs with a :schema/deprecated flag, or you could get fancier still with :schema/deprecated-at or :schema/deprecated-because. Note that such deprecated names are still never removed (Rule Five).
  • you could provide :schema/see-also or :schema/see-instead pointers to more current conventions. 
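
Note that neither :schema/deprecated nor :schema/see-instead is built into Datomic; they are ordinary attributes that you would install yourself. A sketch:

;; install the (hypothetical) annotation attributes
@(d/transact conn
   [{:db/ident       :schema/deprecated
     :db/valueType   :db.type/boolean
     :db/cardinality :db.cardinality/one}
    {:db/ident       :schema/see-instead
     :db/valueType   :db.type/ref
     :db/cardinality :db.cardinality/one}])

;; annotate :user/id without removing it (Rule Five)
@(d/transact conn
   [{:db/id              :user/id
     :schema/deprecated  true
     :schema/see-instead :user/primary-email}])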

In fact, all the database refactoring patterns that are typically implemented as breaking changes could be implemented non-destructively, with the refactoring details recorded as an annotation. For example, the breaking "split column" refactoring might instead be implemented as schema growth:

  • add N new columns
  • (optional) add a :schema/split-into attribute on the original column whose value is the new columns, and possibly even the recipe for the split

10. Plan for accretion.

If a system is going to grow at all, then programs must not bake in limiting presumptions.  For example: if a schema states that :user/id is a string, then programs can rely on :user/id being a string and not occasionally an integer or a boolean.  But a program cannot assume that a user entity will be limited to the set of attributes previously seen, or that it understands the semantics of attributes that it has not seen before.
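
In pull terms, for example, a program that selects only the attributes it understands keeps working as the schema accretes, while a program that uses a wildcard must be prepared to see keys it does not recognize (db and user-eid are illustrative bindings):

;; robust to accretion: ask only for what you understand
(d/pull db [:user/id] user-eid)

;; sees everything, including attributes added after the program
;; was written; callers must tolerate unfamiliar keys
(d/pull db '[*] user-eid)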

Are these rules specific to a particular database?

No. These rules apply to almost any SQL or NoSQL database.  The rules even apply to the so-called "schemaless" databases.  A better word for schemaless is "schema-implicit", i.e. the schema is implicit in your data and the database has no reified awareness of it.  With an implicit schema, all the rules still apply, except that the database is unable to help you (Rule 3 no longer applies).

In Context

Many of the resources on migrations, refactoring, and database evolution emphasize repeatability and the needs of new programs, without making the top-level distinctions of growth vs. breakage and prod vs. dev. As a result, these resources encourage breaking the rules in this article.

Happily, these resources can easily be recast in growth-only terms.  You can grow your schema without breaking your app. You can continuously deploy without continuously propagating breakage.  Here's what it looks like in Datomic.


Wednesday, December 14, 2016

Customer Feedback Portal

As part of our commitment to improving Datomic, a few weeks ago we enabled a new feature request and user feedback system, Receptive.io, where you can help us prioritize our efforts and help shape the future of Datomic.

To submit your feature request, follow the "Suggest Features" link in the top navigation of the my.datomic.com dashboard. We have already connected your account to Receptive.io, so everything is set up and ready to go.

You can read more about using Receptive here.

-The Datomic Team

Monday, November 28, 2016

Datomic Update: Client API, Unlimited Peers, Enterprise Edition, and More

We are pleased to announce that the latest (0.9.5530) release of Datomic includes a set of new features and licensing changes to address needs identified by our customers:
  • In addition to the peer model, Datomic now includes a Client API suitable for smaller, short-lived processes, e.g. microservices.
  • The various tiers of the Datomic Pro license model have been simplified to a single license with no restriction on peer count.
  • We have introduced an Enterprise license tier for users who need customized pricing, support, or licensing terms.
  • Tempids and explicit partitions are now optional, simplifying code for the many programs that do not care about them.
  • Schema install and update are now implicit, and do not require explicit :db.install/attribute or :db.alter/attribute datoms.

The features described above are additive and opt-in, so take advantage of them as and when you please.

Each of these changes is described in more detail below.

Building On a Solid Foundation


Before talking about what is new, it is important to talk about what is unchanged. We built Datomic believing that the Rationale is a sound foundation for an information system, and experience has proven this out. We have not retracted a word of the rationale since day one, and are not doing so today. Datomic’s core ideas are unchanged:
  • getting time, process, and perception right
  • sound data model
  • ACID transactions
  • Datalog query
  • minimal schema
  • separate reads and writes
  • programming with data

Datomic has delivered these ideas with a discipline that minimizes breaking API change. As a result, Datomic users have been able to focus on their business problems without having to worry about changing semantics in their database.

Client API


Datomic’s peer library puts database query in your own application process. This provides several benefits, but at the price of a heavier dependency (both in code and in memory requirements) than a traditional client.

A smaller footprint is useful in environments that have operational limitations, or where processes are small or short-lived. The new Datomic client API addresses this need. Lightweight clients connect to Peer Servers, which are peers that run in a separate address space.

Existing peers are unchanged, and you can mix and match peer and client applications as you see fit within the same Datomic install. Clients and peers are described in detail in the new clients and peers section of the docs.

With today’s release, we are making available the alpha version of the open source Client library for Clojure. The Java library will be released shortly. We also have plans to both create more language libraries for Client and enable our customers to create their own. We are interested in your feedback on the Client API itself and the priority of our language reach efforts. As of today, we have enabled a customer feedback portal, accessible via the "Suggest Features" link in the top navigation of the my.datomic.com dashboard, where you can help us prioritize our efforts in this (and many other) areas.

Unlimited Peers


Flexibility in Peer use has been the most often-requested update to Datomic. You are solving complex problems using cutting edge technologies and architectures. Your tools should allow you to design the system that best fits your needs.  Datomic’s new licensing model gives all users - Starter, Pro and Enterprise - the ability to design for and deploy as many Peer processes (and Clients!) as their systems require. Today’s release represents a massive upgrade to the potential of each (new and existing) Datomic installation.

Pro Starter License


The Pro Starter license provides a no-cost way to try Datomic. You get a perpetual license plus a year of software upgrades for free. Starting with this release, Pro Starter includes all the features of a Pro license, including
  • unlimited peers
  • clients
  • High Availability (HA)
  • integrated memcached

Enterprise Tier


Datomic has a number of enterprise customers already. They distinguish themselves by wanting
  • custom license terms
  • custom pricing for larger installations
  • custom support terms
  • custom development

If you match one or more of these criteria, contact us to discuss an Enterprise license.

Tempid and Partition Defaults


Datomic’s tempids provide a way to partition new entities, encoding a locality hint directly in transactions. This feature is powerful, but rarely used, and the API and data structure for tempids are an inconvenience for the majority of users, who do not need or want partition control.

Starting with the current release of Datomic:
  • tempids are optional
  • when you need a tempid to coordinate the relationship between two entities, you can use an ordinary string instead of a tempid structure, and that string can be meaningful to readers of your code
  • the existing tempid data structure and API continue to be supported unchanged. Use them if you want them.

Clients will support string tempids only.
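
As a sketch, assuming a peer connection conn and illustrative attribute names, the string "new-user" below coordinates two entities within a single transaction:

;; "new-user" is a string tempid: both maps resolve to entities
;; created in this transaction, and the ref value "new-user"
;; points at the first
@(d/transact conn
   [{:db/id              "new-user"
     :user/primary-email "jane@example.com"}
    {:order/number 1001
     :order/user   "new-user"}])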

Schema Install and Update


Previously, transactions that changed attribute schema had to include either :db.install/attribute (to create an attribute) or :db.alter/attribute (to change an existing attribute). The new release of Datomic infers the need for these datoms and adds them to your transaction automatically, reducing the verbosity of schema data.
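
Concretely, a schema transaction can now be as small as the attribute definition itself (a sketch, assuming a peer connection conn):

;; no :db.install/attribute assertion needed; Datomic infers it
@(d/transact conn
   [{:db/ident       :person/nickname
     :db/valueType   :db.type/string
     :db/cardinality :db.cardinality/one}])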

Conclusion


We are very excited about the additions and changes to Datomic. To celebrate, we will be offering a 20% discount on new Datomic purchases through the end of February 2017. We hope you take advantage of the new features and this discount opportunity, and please feel free to reach out to us at any time.

Wednesday, August 10, 2016

Log API for Memory Databases

The most recent Datomic release provides access to the Log API for memory databases. I would like to take this opportunity to describe some of the features and uses of the Datomic Log API.

The transactional log is a fundamental component of ACID database systems, a durable record of the transactions performed by the database. In addition to its critical function in ensuring ACID semantics, the Datomic log, as a sequential (in database time, t) record of all transactions, also functions as a time-ordered index to the data stored in a Datomic database.

Datomic provides access to the transaction log directly via the tx-range function and from within query using the tx-ids and tx-data functions.
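
As a sketch of direct log access against a memory database (the URI is illustrative):

;; memory databases require no transactor, so this runs anywhere
;; the peer library is on the classpath
(require '[datomic.api :as d])

(def uri "datomic:mem://log-example")
(d/create-database uri)
(def conn (d/connect uri))

;; tx-range takes start and end t values; nil means unbounded
(seq (d/tx-range (d/log conn) nil nil))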

An example using the log in query is now available in the Day of Datomic repo. Our example database records the streets on which our three protagonists, John, Mary, and Joe, live.

Let’s find out when Joe moved to Broadway:

(d/q '[:find ?tx
       :in $ ?name ?street
       :where
       [?e :person/name ?name]
       [?e :person/street ?street ?tx true]]
     (d/history (d/db conn)) "Joe" "Broadway")

This query returns 13194139534317, the transaction ID of the transaction that asserted Joe’s street is Broadway. As in all Datomic databases, every transaction also records a timestamp, the :db/txInstant. Let’s see what wall-clock time is associated with this transaction entity:

(d/pull (d/db conn) '[:db/txInstant] 13194139534317)
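
;; result (the same instant appears in the tx-data below):
{:db/txInstant #inst "1983-01-01T00:00:00.000-00:00"}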

So Joe moved to Broadway in 1983.

Issuing queries against the transaction log is a powerful approach for auditing the operational history of a database. Because every transaction is an entity in Datomic, we can easily retrieve the entire set of datoms for a given transaction entity. Let’s find out what else happened in the ‘Joe moves to Broadway’ transaction. This query returns all the datoms associated with the given transaction:

(d/q '[:find ?e ?a ?v ?tx ?op
       :in ?log ?tx
       :where [(tx-data ?log ?tx) [[?e ?a ?v _ ?op]]]]
     (d/log conn) 13194139534317)

;; result:
#{[17592186045420 64 "2nd" 13194139534317 true]
  [17592186045419 64 "Broadway" 13194139534317 true]
  [13194139534317 50 #inst "1983-01-01T00:00:00.000-00:00" 13194139534317 true]
  [17592186045420 64 "Elm" 13194139534317 false]
  [17592186045419 64 "1st" 13194139534317 false]}

Note that we see the same wall clock time we found previously as well as 4 other datoms. One is the assertion of Joe moving to Broadway, one is the retraction of his previous street (1st), and the remaining two datoms are about someone else entirely. Let’s find out who:

(d/pull (d/db conn) '[*] 17592186045420)

;; result:
{:db/id 17592186045420, :person/name "Mary", :person/street "2nd"}

By using the log, we’ve determined that Mary’s move to 2nd Street was recorded at the same time (during the same transaction) as Joe's move to Broadway.

The ability to query the Datomic transaction log directly is a powerful tool for managing, administering, and using a Datomic database. Extending the Log API to memory databases enables low-overhead testing and evaluation, and we hope you find it a helpful addition for lightweight development and unit testing.

Saturday, April 23, 2016

You Might Not Need an ORM

Over the last few months, my colleague Michael Nygard has been writing The New Normal series over on the Cognitect blog, arguing that our industry needs to embrace continuous partial failure and aim to build antifragile systems.

In his most recent installment, Mike shares some ideas on sharp tools and their value in eliminating code. Mike worked with several teams that considered object-relational mapping (ORM) for Datomic, concluding that with Datomic, you don't need ORM at all. Datomic lets you eliminate an entire architectural layer from your applications.