In this week’s episode, Lju and Alexander explore some playlist data from Resonate, based on example Cypher queries whatSocks has put together. Resonate is a co-ownership music streaming platform. Catch-up on the episode now!
This week we explored data from the Resonate music streaming service. We also did something a little differently! Normally, we use LOAD CSV to get data into our Neo4j Aura Free tier instance. However, this time around, we are access data via a REST API, and take advantage of apoc.load.json
. whatSocks has kindly provided an example of how to call the Resonate API in her repo, so that’s what we’re using this week.
This does mean we’re doing things slightly differently, in that the data model has already been defined, so we’ll run with that rather than defining our own. This is a very interesting data set, so we’ll surely be back, and we’ll come up with a new data model then!
Here are some examples of the data we worked with in the session:
As we mentioned above, we’re using the data model already provided by whatSocks. Once the data is loaded, you can view it by using the following command in Neo4j Browser: CALL db.schema.visualization()
.
For those of you who want to experiment with the model in arrows, you can view it here.
The following questions for the data came up during the stream:
-
What tag is most commonly used for a track?
-
What are the most popular songs across playlists (appearing at least twice)?
-
How similar are tags to each other (based on string name!)?
-
What tags appear per track list?
Here are the following queries we used from the repo to load our data.
Firstly, loading the track groups of type playlist. whatSocks will use the propety tracks_imported
later as part of a periodic commit as a flag to check what’s left to process
WITH 'https://api.resonate.coop/v2/' AS uri
CALL apoc.load.json(uri + 'trackgroups?type=playlist') // in this example, grabbing listener-generated playlists
YIELD value
UNWIND value["data"] AS data
MERGE (u:RUser {uuid:toString(data["user"]["id"])})
MERGE (t:TrackGroup {uuid:toString(data["id"])})
MERGE (u)-[:OWNS]->(t)
SET t.title = data["title"]
SET t.type = data["type"]
SET t.slug = data["slug"]
SET t.tracks_imported = false
Now add the rest of the TrackGroups
WITH 'https://api.resonate.coop/v2/' AS uri
CALL apoc.load.json(uri + 'trackgroups') // in this example, grabbing listener-generated playlists
YIELD value
UNWIND value["data"] AS data
MERGE (u:RUser {uuid:toString(data["user"]["id"])})
MERGE (t:TrackGroup {uuid:toString(data["id"])})
MERGE (u)-[:OWNS]->(t)
SET t.title = data["title"]
SET t.type = data["type"]
SET t.slug = data["slug"]
SET t.tracks_imported = false
Now adding the Tracks, using the tracks_imported
flag
CALL apoc.periodic.commit(
"MATCH (tg:TrackGroup)
WHERE NOT tg.tracks_imported
SET tg.tracks_imported = true
WITH tg limit $limit
WITH 'https://api.resonate.coop/v2/' AS uri, tg.uuid as tg_id
CALL apoc.load.json(uri + 'trackgroups/' + tg_id )
yield value
UNWIND value['data']['items'] AS items
MERGE (u:RUser {uuid:toString(items['track']['creator_id'])})
MERGE (track:Track {uuid:toString(items['track']['id'])})
MERGE (t)-[:HAS_TRACK]->(track)
MERGE (track)<-[:CREATED]-(u)
SET track.title = items['track']['title']
SET track.tags_imported = false
RETURN count(*)
",
{limit:10});
Last but not least, adding the tags
CALL apoc.periodic.commit(
"
MATCH (u:RUser)-[:CREATED]->(track:Track)
WHERE not u.uuid in ['7212','4315','4414'] // bad data
AND NOT track.tags_imported
SET track.tags_imported = true
WITH u AS artist, u.uuid AS user_id,
count(DISTINCT track) AS tracks,'https://api.resonate.coop/v2/' as uri
ORDER BY tracks desc
LIMIT $limit
CALL apoc.load.json(uri + 'artists/' + user_id + '/releases') // grabbing all
YIELD value
UNWIND value['data'] AS data
UNWIND data['tags'] AS tags
MERGE (t:TrackGroup {uuid:toString(data['id'])})
MERGE (user:RUser {uuid:toString(user_id)})-[:OWNS]->(t)
MERGE (tag:Tag {name:toLower(tags)})
MERGE (tag)<-[:HAS_TAG]-(t)
SET tag.uuid=apoc.create.uuid()
SET t.title = data['title']
SET t.type = data['type']
RETURN count(*)
",
{limit:10});
Question 1 - What tag is most commonly used for a track?
MATCH (t:Track)<-[:CREATED]-(:RUser)-[:OWNS]->(tg:TrackGroup)-[:HAS_TAG]->(tag:Tag)
WITH tag, count(t) AS freq
RETURN tag.name, freq ORDER BY freq DESC
Question 2 - What are the most popular songs across playlists (appearing at least twice)?
MATCH (t:Track)<-[:CREATED]-(:RUser)-[:OWNS]->(tg:TrackGroup)
WITH t.title AS title, count(tg) AS freq
WHERE freq > 1
RETURN title, freq ORDER BY freq DESC
Question 3 - How similar are tags to each other (based on string name!)?
For this one, we’re using the APOC text functions to help us - in this case, Sorensen Dice Similarity:
MATCH (t1:Tag), (t2:Tag)
//use internal node IDs to make sure we don't compare the same node, and compare the same node pairs twice :)
WHERE id(t1)>id(t2)
RETURN t1.name, t2.name, apoc.text.sorensenDiceSimilarity(t1.name, t2.name) AS score
ORDER BY score DESC
Question 4 - What tags appear per track list?
MATCH (tg:TrackGroup)-[:HAS_TAG]->(t:Tag)
RETURN tg.slug, collect(t.name)
We can take this much further, and start looking at intersections of tags - e.g. do we see the same tags commonly appearing together. Watch this space!