Skip to content

Latest commit

 

History

History
181 lines (143 loc) · 11.5 KB

week-4.adoc

File metadata and controls

181 lines (143 loc) · 11.5 KB

Week 4 - The Resonate music data api

In this week’s episode, Lju and Alexander explore some playlist data from Resonate, based on example Cypher queries whatSocks has put together. Resonate is a co-ownership music streaming platform. Catch-up on the episode now!

maxresdefault
Week 4 catch-up video

The data set

This week we explored data from the Resonate music streaming service. We also did something a little differently! Normally, we use LOAD CSV to get data into our Neo4j Aura Free tier instance. However, this time around, we are access data via a REST API, and take advantage of apoc.load.json. whatSocks has kindly provided an example of how to call the Resonate API in her repo, so that’s what we’re using this week.

This does mean we’re doing things slightly differently, in that the data model has already been defined, so we’ll run with that rather than defining our own. This is a very interesting data set, so we’ll surely be back, and we’ll come up with a new data model then!

Here are some examples of the data we worked with in the session:

The data model

As we mentioned above, we’re using the data model already provided by whatSocks. Once the data is loaded, you can view it by using the following command in Neo4j Browser: CALL db.schema.visualization().

For those of you who want to experiment with the model in arrows, you can view it here.

img\Resonatemodel
Figure 1. The Resonate music platform data model

The questions

The following questions for the data came up during the stream:

  • What tag is most commonly used for a track?

  • What are the most popular songs across playlists (appearing at least twice)?

  • How similar are tags to each other (based on string name!)?

  • What tags appear per track list?

Loading the data

Here are the following queries we used from the repo to load our data.

Firstly, loading the track groups of type playlist. whatSocks will use the propety tracks_imported later as part of a periodic commit as a flag to check what’s left to process

WITH 'https://api.resonate.coop/v2/' AS uri
CALL apoc.load.json(uri + 'trackgroups?type=playlist') // in this example, grabbing listener-generated playlists
YIELD value
UNWIND value["data"] AS data
MERGE (u:RUser {uuid:toString(data["user"]["id"])})
MERGE (t:TrackGroup {uuid:toString(data["id"])})
MERGE (u)-[:OWNS]->(t)
SET t.title = data["title"]
SET t.type = data["type"]
SET t.slug = data["slug"]
SET t.tracks_imported = false

Now add the rest of the TrackGroups

WITH 'https://api.resonate.coop/v2/' AS uri
CALL apoc.load.json(uri + 'trackgroups') // in this example, grabbing listener-generated playlists
YIELD value
UNWIND value["data"] AS data
MERGE (u:RUser {uuid:toString(data["user"]["id"])})
MERGE (t:TrackGroup {uuid:toString(data["id"])})
MERGE (u)-[:OWNS]->(t)
SET t.title = data["title"]
SET t.type = data["type"]
SET t.slug = data["slug"]
SET t.tracks_imported = false

Now adding the Tracks, using the tracks_imported flag

CALL apoc.periodic.commit(
"MATCH (tg:TrackGroup)
WHERE NOT tg.tracks_imported
SET tg.tracks_imported = true
WITH tg limit $limit
WITH 'https://api.resonate.coop/v2/' AS uri, tg.uuid as tg_id
CALL apoc.load.json(uri + 'trackgroups/' + tg_id )
yield value
UNWIND value['data']['items'] AS items
MERGE (u:RUser {uuid:toString(items['track']['creator_id'])})
MERGE (track:Track {uuid:toString(items['track']['id'])})
MERGE (t)-[:HAS_TRACK]->(track)
MERGE (track)<-[:CREATED]-(u)
SET track.title = items['track']['title']
SET track.tags_imported = false
RETURN count(*)
",
{limit:10});

Last but not least, adding the tags

CALL apoc.periodic.commit(
"
MATCH (u:RUser)-[:CREATED]->(track:Track)
WHERE not u.uuid  in ['7212','4315','4414'] // bad data
AND NOT track.tags_imported
SET track.tags_imported = true
WITH u AS artist, u.uuid AS user_id,
    count(DISTINCT track) AS tracks,'https://api.resonate.coop/v2/' as uri
ORDER BY tracks desc
LIMIT $limit
CALL apoc.load.json(uri + 'artists/' + user_id + '/releases') // grabbing all
YIELD value
UNWIND value['data'] AS data
UNWIND data['tags'] AS tags
MERGE (t:TrackGroup {uuid:toString(data['id'])})
MERGE (user:RUser {uuid:toString(user_id)})-[:OWNS]->(t)
MERGE (tag:Tag {name:toLower(tags)})
MERGE (tag)<-[:HAS_TAG]-(t)
SET tag.uuid=apoc.create.uuid()
SET t.title = data['title']
SET t.type = data['type']
RETURN count(*)
",
{limit:10});

Querying the data

Question 1 - What tag is most commonly used for a track?

MATCH (t:Track)<-[:CREATED]-(:RUser)-[:OWNS]->(tg:TrackGroup)-[:HAS_TAG]->(tag:Tag)
WITH tag, count(t) AS freq
RETURN tag.name, freq ORDER BY freq DESC

Question 2 - What are the most popular songs across playlists (appearing at least twice)?

MATCH (t:Track)<-[:CREATED]-(:RUser)-[:OWNS]->(tg:TrackGroup)
WITH t.title AS title, count(tg) AS freq
    WHERE freq > 1
RETURN title, freq ORDER BY freq DESC

Question 3 - How similar are tags to each other (based on string name!)?

For this one, we’re using the APOC text functions to help us - in this case, Sorensen Dice Similarity:

MATCH (t1:Tag), (t2:Tag)
//use internal node IDs to make sure we don't compare the same node, and compare the same node pairs twice :)
WHERE id(t1)>id(t2)
RETURN t1.name, t2.name, apoc.text.sorensenDiceSimilarity(t1.name, t2.name) AS score
    ORDER BY score DESC

Question 4 - What tags appear per track list?

MATCH (tg:TrackGroup)-[:HAS_TAG]->(t:Tag)
RETURN tg.slug, collect(t.name)

We can take this much further, and start looking at intersections of tags - e.g. do we see the same tags commonly appearing together. Watch this space!