01 June 2013
MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public. We are pleased to release a sample project that uses the MusicBrainz dataset to help people get familiar with using Datomic.
The MusicBrainz dataset makes a great example database for learning, evaluating, or testing Datomic for several reasons:
- It deals with a domain with which nearly everyone is familiar
- It is of decent size: 60,438 labels; 664,226 artists; 1,035,592 album releases; and 13,233,625 recorded tracks
- It comprises a good number of entities, attributes, and relationships
- It is fun to play with, query, and explore
Schema
The mbrainz-sample schema is an adaptation of a subset of the full MusicBrainz schema. We omitted some entities, combined others, and made some simplifying assumptions. In particular:
- We omit any notion of Work
- We combine Track, Tracklist and Recording into simply "track"
- We rename Release group to "abstractRelease"
Abstract Release vs. Release vs. Medium
(Adapted from the MusicBrainz schema docs)
An "abstractRelease" is an abstract "album" entity (e.g. "The Wall" by Pink Floyd). A "release" is something you can buy in your music store (e.g. the 1984 US vinyl release of "The Wall" by Columbia, as opposed to the 2000 US CD release by Capitol Records).
As a result, when you query for releases (for example, by name), you may see what look like duplicates: one result for each physical release of the same album. To find just the "work of art" level album entity, query for abstractRelease instead.
The media are the physical components comprising a release (disks, CDs, tapes, cartridges, piano rolls). One medium will have several tracks, and the total tracks across all media represent the track list of the release.
Relationship Diagram
Entities
For information about the individual entities and their attributes, please see the schema page in the wiki, or the EDN schema itself.
Getting Started
First, get Datomic and start up a transactor.
Getting the Data
Next, download the mbrainz backup:
# 2.8 GB, md5 4e7d254c77600e68e9dc71b1a2785c53
wget http://s3.amazonaws.com/mbrainz/datomic-mbrainz-backup-20130611.tar
and extract:
# this takes a while
tar -xvf datomic-mbrainz-backup-20130611.tar
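Before restoring, it can be worth checking the downloaded archive against the md5 digest quoted in the download step. The following is a minimal sketch, assuming GNU coreutils `md5sum` is on the path and that the archive sits in the current directory; `verify_md5` is a hypothetical helper name introduced here for illustration, not part of the sample project.

```shell
# Digest and file name copied from the download step above.
expected="4e7d254c77600e68e9dc71b1a2785c53"
archive="datomic-mbrainz-backup-20130611.tar"

# verify_md5 FILE EXPECTED (hypothetical helper):
# succeeds only when FILE's md5 digest equals EXPECTED.
verify_md5() {
  actual=$(md5sum "$1" 2>/dev/null | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}

if [ ! -f "$archive" ]; then
  echo "archive not found: $archive" >&2
elif verify_md5 "$archive" "$expected"; then
  echo "checksum OK"
else
  echo "checksum mismatch: download the archive again" >&2
fi
```

A mismatch usually points to an interrupted or corrupted download, so fetching the archive again is the simplest remedy. On systems without `md5sum`, the platform's own md5 tool can be substituted inside the helper.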
Finally, restore the backup:
# takes a while, but prints progress -- ~150,000 segments in restore
bin/datomic restore-db file:datomic-mbrainz-backup-20130611 datomic:free://localhost:4334/mbrainz
Getting the Code
Clone the git repo somewhere convenient:
git clone git@github.com:Datomic/mbrainz-sample.git
cd mbrainz-sample
Running the examples
From Java
Fire up your favorite IDE, and configure it to use both the included pom.xml and the following Java options when running:
-Xmx2g -server
From Clojure
Start up a Clojure REPL:
# from the root of the mbrainz-sample repo
lein repl
Then connect to the database and run the queries.
Thanks
We would like to thank the MusicBrainz project for defining and compiling a great dataset, and for making it freely available.