Skip to content

Latest commit

 

History

History
102 lines (71 loc) · 9 KB

week-5.adoc

File metadata and controls

102 lines (71 loc) · 9 KB

Week 5 - Rest areas across New York state data set

In this week’s episode, Lju and Alexander explore data on rest areas across New York State, from the U.S. Government open data site. Catch-up on the episode now!

maxresdefault
Week 5 catch-up video

The data set

This week we explored rest areas available in New York State data, including location, name and facilities available.

Whilst it’s a relatively small data set (some 32 rows!), it is a rather fun data set for us to explore. It also gives us a great opportunity to have a play with Neo4j spatial.

The questions

The following questions for the data set came up during the stream:

  • If I’m traveling on a certain route in a certain direction, what rest stops are available?

  • What rest areas closest to me has a public phone?

  • What route has the most handicap spaces available?

The data model

You can view and play with the data model in Arrows App here.

We have used more than one label on a node! Looking at the data set, we could see that some rest areas were open, and others were closed. A convenient way to filter out open rest stops is to apply a label - this allows the query engine to quickly filter out nodes without looking at properties.

We’ve not loaded all of the available properties. By all means, if you wish to do a bit of exploring, please do!

img\ranys
Figure 1. The rest areas in NY State data model we used in the stream

Loading the data

To load the data into the database, we use the following query:

LOAD CSV WITH HEADERS FROM "https://data.ny.gov/api/views/qebf-4fd8/rows.csv?accessType=DOWNLOAD" AS row
MERGE (c:County {name:row.County})
MERGE (r:Route {name:row.Route})
CREATE (rs:RestStop {name:row.Name, description:tointeger(row.Description),
    publicPhone:tointeger(row.`Public Phone`), handicapSpaces:tointeger(row.`Handicapped Spaces`),
    truckSpaces:tointeger(row.`Truck Spaces`), carSpaces:tointeger(row.`Car Spaces`),
    location:point({latitude:tofloat(row.Latitude), longitude:tofloat(row.Longitude)})})
CREATE (rs)-[:ON_ROUTE {direction:row.`Travel Direction`}]->(r)
CREATE (rs)-[:IN_COUNTY]->(c)
WITH rs, row.Status AS status WHERE status = "Open"
SET rs:Open

As before, we need to cast lots of the numbers into integers and floats as appropriate! We didn’t do this on the live stream, we’ve made the changes in this write up.

Another interesting thing about this data set, is that we get to use the Neo4j spatial types! You can read more about them here.

Querying the data

Question 1 - If I’m traveling on a certain route in a certain direction, what rest stops are available with truck parking?

WITH 'I-87' as route, 'Southbound' as direction
MATCH (r:Route{name:route})-[:ON_ROUTE {direction:direction}]-(rs:RestStop:Open)
WHERE rs.truckSpaces > 0
RETURN rs.name, rs.truckSpaces,  rs.location.y, rs.location.x ORDER BY rs.location.y DESC

In this query, we are using the distance function from Neo4j spatial.

Question 2 - What rest areas closest to me has a public phone?

We quickly discovered that none of the rest areas had public telephones! so we switched it over from telephones to handicap spaces.

Here’s the random location in New York state we picked as the location to work with.

WITH point({latitude:43.102700,longitude:-75.232339}) as location
MATCH (rs:RestStop:Open) WHERE rs.handicapSpaces>0
RETURN rs.name, rs.location.y+","+rs.location.x, distance(rs.location, location) AS dist
    ORDER BY dist

Question 3 - What route has the most handicap spaces available?

MATCH (rs:RestStop:Open)-[:ON_ROUTE]->(r:Route)
WHERE rs.handicapSpaces>0
WITH r, collect(rs) AS stops  ORDER BY size(stops) DESC LIMIT 1
RETURN r.name AS `Route Name`, size(stops) AS `No. Stops`