01 June 2013
MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public. We are pleased to release a sample project that uses the MusicBrainz dataset to help people get familiar with using Datomic.
The MusicBrainz dataset makes a great example database for learning, evaluating, or testing Datomic for several reasons:
- It deals with a domain with which nearly everyone is familiar
- It is of decent size: 60,438 labels; 664,226 artists; 1,035,592 album releases; and 13,233,625 recorded tracks
- It comprises a good number of entities, attributes, and relationships
- It is fun to play with, query, and explore
The mbrainz-sample schema is an adaptation of a subset of the full MusicBrainz schema. We didn't include some entities, and we made some simplifying assumptions and combined some entities. In particular:
- We omit any notion of Work
- We combine Track, Tracklist and Recording into simply "track"
- We rename Release group to "abstractRelease"
Abstract Release vs. Release vs. Medium
(Adapted from the MusicBrainz schema docs.)
An "abstractRelease" is an abstract "album" entity (e.g. "The Wall" by Pink Floyd). A "release" is something you can buy in your music store (e.g. the 1984 US vinyl release of "The Wall" by Columbia, as opposed to the 2000 US CD release by Capitol Records).
Therefore, when you query for releases (e.g. by name), you may see what look like duplicates, because each distinct release of the same album is its own entity. To find just the "work of art" level album entity, query for abstractRelease.
The media are the physical components comprising a release (disks, CDs, tapes, cartridges, piano rolls). One medium will have several tracks, and the total tracks across all media represent the track list of the release.
For information about the individual entities and their attributes, please see the schema page in the wiki, or the EDN schema.
First get Datomic, and start up a transactor.
Getting the Data
Next download the mbrainz backup
# 2.8 GB, md5 4e7d254c77600e68e9dc71b1a2785c53
# this takes a while
tar -xvf datomic-mbrainz-backup-20130611.tar
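Since the download is large, it can be worth checking it against the md5 listed above before extracting. The following is a minimal sketch, assuming the listed checksum refers to the downloaded tar file; `verify_md5` is a hypothetical helper written for this example, not something shipped with Datomic or the sample repo.

```shell
# Sketch: compare a file's MD5 digest with an expected value.
# verify_md5 is a hypothetical helper, not part of Datomic or mbrainz-sample.
verify_md5() {
  file="$1"
  expected="$2"
  if command -v md5sum >/dev/null 2>&1; then
    # GNU coreutils: reading from stdin prints "<digest>  -"
    actual=$(md5sum < "$file" | cut -d ' ' -f 1)
  else
    # macOS/BSD fallback: -q prints only the digest
    actual=$(md5 -q "$file")
  fi
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK: $file"
  else
    echo "checksum MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

With the function defined in your shell, you could guard the extraction step with something like `verify_md5 datomic-mbrainz-backup-20130611.tar 4e7d254c77600e68e9dc71b1a2785c53 && tar -xvf datomic-mbrainz-backup-20130611.tar`, so that a corrupted download is caught before the restore.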
Finally, restore the backup
# takes a while, but prints progress -- ~150,000 segments in restore
bin/datomic restore-db file:datomic-mbrainz-backup-20130611 datomic:free://localhost:4334/mbrainz
Getting the Code
Clone the git repo
git clone git@github.com:Datomic/mbrainz-sample.git
Running the examples
Fire up your favorite IDE, and configure it to use both the included pom.xml and the following Java options when running:
Start up a Clojure REPL:
# from the root of the mbrainz-sample repo
Then connect to the database and run the queries
We would like to thank the MusicBrainz project for defining and compiling a great dataset, and for making it freely available.