Phase II Ideas and Work Plan #39

jjnesbitt · 2024-05-10T21:33:25Z

jjnesbitt
May 10, 2024
Maintainer

This is both a summary and an extension of our meeting on 05/10. It includes things that I think we need to do in order to lay the groundwork for what we hope to achieve going forward, as well as some starting thoughts and ideas. Once we settle on this, I can break it out into a milestone and issues. Please take a look @aashish24 @johnkit @annehaley, and let me know your thoughts/suggestions.

Data Collection

Network Types

We need to make sure we can collect and transform data into formats that are usable and that we can work with. The work and analysis we can do will depend on the network type. As I see it, the categories are:

Transportation (car, bike, rail, subway, pedestrian, etc.)
Electrical Grid (power lines, transformer stations, substations, power plants, etc.)
Communication (telephone, cell towers, internet, etc.)

There may be more, but these are the ones that immediately stand out to us. Much of what follows applies most directly to transportation networks, but not exclusively.

For transportation network data like roads and streets, OpenStreetMap, and specifically OSMnx, is probably sufficient. For pedestrian networks (sidewalks, crosswalks, footpaths, etc.), I'm not sure how well it performs. We may need a hybrid approach that combines it with tile2net in order to get the data we want and need.

Edges are only half of the picture; we'll also want destination data. OSMnx seems to handle that nicely as well, providing a "features" API, which allows us to query for urban amenities within a particular region (restaurants, hospitals, etc.) and correlate them with the transportation networks.
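As a rough sketch of what that could look like, assuming a recent OSMnx release that provides the features module: the helper names (`amenity_tags`, `fetch_network_and_destinations`) are made up for illustration, and the fetch function needs network access to OpenStreetMap.

```python
def amenity_tags(kinds):
    """Build an OSM tag filter matching any of the given amenity values."""
    return {"amenity": list(kinds)}


def fetch_network_and_destinations(place, kinds, network_type="drive"):
    """Fetch the street graph and the matching amenities for one place.

    Hypothetical helper, not part of OSMnx. OSMnx is imported lazily so
    the module can be loaded without it installed.
    """
    import osmnx as ox

    # Street network as a NetworkX MultiDiGraph.
    graph = ox.graph_from_place(place, network_type=network_type)
    # Amenities as a GeoDataFrame, one row per matching OSM feature.
    destinations = ox.features_from_place(place, tags=amenity_tags(kinds))
    return graph, destinations
```

Calling `fetch_network_and_destinations("Boston, Massachusetts, USA", ["hospital"])` would return the drivable network plus the hospitals in it. Snapping each hospital to its nearest graph node would be the next step, and is left out here.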

For the use cases where we can use OSMnx or tile2net for automated data retrieval, it makes sense to set up a system where that data is fetched and transformed on demand. This can be done at the level of one MGRS 100 km grid square (100 km on a side), so that whenever data inside one of these grid squares is requested, the entire grid square is fetched and saved for future use.
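A minimal sketch of the fetch-once-per-square idea, under some loud assumptions: `tile_id` uses a simple one-degree lat/lon tile as a stand-in for a real MGRS lookup, `TileCache` is a hypothetical name, and the `fetch_tile` callable stands for whatever OSMnx or tile2net retrieval we end up writing.

```python
import json
import math
from pathlib import Path
from typing import Callable

TILE_DEG = 1.0  # assumed tile size in degrees; a stand-in for an MGRS grid square


def tile_id(lat: float, lon: float) -> str:
    """Return a key for the tile that contains the given point."""
    row = math.floor(lat / TILE_DEG)
    col = math.floor(lon / TILE_DEG)
    return f"{row}_{col}"


class TileCache:
    """Fetch a whole tile the first time any point in it is requested."""

    def __init__(self, cache_dir, fetch_tile: Callable[[str], dict]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_tile = fetch_tile  # hypothetical: does the real retrieval

    def get(self, lat: float, lon: float) -> dict:
        key = tile_id(lat, lon)
        path = self.cache_dir / f"{key}.json"
        if path.exists():
            # Already fetched: serve the saved copy.
            return json.loads(path.read_text())
        data = self.fetch_tile(key)
        path.write_text(json.dumps(data))
        return data
```

Two requests that land in the same tile trigger only one call to `fetch_tile`; the second is served from disk. In practice the saved copy would more likely live in the database than in JSON files.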

For other transportation networks like rail, and for other network types like electrical and communication, I'm not sure how we'll get that data. Data collection for those sources may be more of a manual process for now. Nevertheless, we'll want to devise a standardized format for each network sub-type to ensure consistency.

Data Organization

We need to determine the data formats and tools needed for our use cases. The main questions we need to answer are the following:

What type of analysis/queries will we be doing?
What types of database make sense for storing these networks?

The answer to the first question may inform the second. If we intend to ask large and complex questions about the data we're storing, then it makes sense to choose a database that can run those queries quickly and efficiently. Since we're asking questions about graphs, I think it makes sense to use a graph database to store this information. The most popular graph database at the moment is Neo4j, which I think is worth considering. However, we'll undoubtedly have non-graph relational data that we'd like to store, so I think we'll likely want a hybrid approach that pairs a graph database with a relational one (Postgres). If organized correctly, this would allow us to take advantage of the native capabilities of Postgres, along with the additional usefulness of both PostGIS and Neo4j. The Apache AGE project aims to bring graph database functionality to Postgres directly, which could also be considered.

UI Features

Based on our meeting, it seems we want the following features:

Ability to select a region of interest, pull data from that region, and do further analysis.
Ability to "browse" the catalog of network data we have that can't be automatically retrieved (as of yet), like rail, subway, electrical grid, communications, etc. This could be done using a map to highlight areas where we have the most data.
Ability to show types of destinations in that ROI and visualize how they become inaccessible when removing nodes from the graph.
1. For example, select “hospitals”; then, when you remove a node, you can see which and how many hospitals become unreachable.
Ability to run flood simulations on these ROIs, which would essentially automate the removal of nodes in a structured way, based on a specific situation. In this case it's flooding, but there could be other simulation types as well.
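To make the node-removal and flood ideas concrete, here is a small sketch on a toy graph. Everything in it is an assumption for illustration: the adjacency-dict representation, the function names, the made-up elevations, and the rule that a node floods when its elevation is at or below the water level.

```python
from collections import deque


def reachable(adjacency, start, removed=frozenset()):
    """Breadth-first search: nodes reachable from start, skipping removed ones."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def lost_destinations(adjacency, origin, destinations, removed):
    """Destinations reachable from origin before the removal but not after."""
    wanted = set(destinations)
    before = reachable(adjacency, origin) & wanted
    after = reachable(adjacency, origin, frozenset(removed)) & wanted
    return before - after


def flooded_nodes(elevation_m, water_level_m):
    """Assumed flood rule: a node is removed if it sits at or below the water."""
    return {n for n, z in elevation_m.items() if z <= water_level_m}


# Toy road network (undirected, written as symmetric adjacency lists).
ROADS = {
    "home": ["a"],
    "a": ["home", "b", "c"],
    "b": ["a", "hospital_1"],
    "c": ["a", "hospital_2"],
    "hospital_1": ["b"],
    "hospital_2": ["c"],
}
# Made-up elevations in meters, for the flood example only.
ELEVATION_M = {"home": 12, "a": 10, "b": 3, "c": 9,
               "hospital_1": 8, "hospital_2": 11}
HOSPITALS = {"hospital_1", "hospital_2"}
```

Removing node `b` by hand, or flooding to 5 m (which removes `b` automatically), cuts off `hospital_1` only; removing `a` cuts off both. A flood simulation is then just a structured way of producing the `removed` set.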

Destination Accessibility

This is something we did not discuss, but I thought of it afterward and wanted to bring it up here. It may be useful to be able to show the "accessibility" of particular types of destinations in an ROI. For example, you might want to select an area and ask the question “how easy to reach are the hospitals in this area?” This would be achieved by querying Neo4j to determine how susceptible specific destination types (hospitals in this case) are to road closures. If a user ran this on some area and noticed that a particular hospital has a low “accessibility score”, they could see that it’s due to a particular road that, if closed, leaves very few people able to reach it.
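One possible definition of such a score, purely as a starting point: the fraction of origin nodes that can still reach the destination after the single worst road closure. The sketch below computes it in plain Python on an undirected edge list. The function names and the definition itself are assumptions, and in practice this would likely be a graph database query rather than a loop.

```python
from collections import deque


def _reachable(edges, start, closed=None):
    """Nodes reachable from start over undirected edges, minus one closed edge."""
    adjacency = {}
    for u, v in edges:
        if closed is not None and {u, v} == set(closed):
            continue  # this road is closed
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def accessibility_score(edges, origins, destination):
    """Return (score, worst_edge) under the assumed single-closure definition.

    score is the smallest fraction of origins still connected to the
    destination after closing any one edge; worst_edge is the closure
    that causes it (None if no single closure cuts anyone off).
    """
    worst_fraction, worst_edge = 1.0, None
    for edge in edges:
        reached = _reachable(edges, destination, closed=edge)
        fraction = sum(o in reached for o in origins) / len(origins)
        if fraction < worst_fraction:
            worst_fraction, worst_edge = fraction, edge
    return worst_fraction, worst_edge


# Toy network: h1 hangs off a single road, h2 has two independent approaches.
EDGES = [("h1", "x"), ("x", "o1"), ("x", "o2"),
         ("o1", "o2"), ("h2", "o1"), ("h2", "o2")]
ORIGINS = ["o1", "o2"]
```

Here `h1` scores 0.0 with `("h1", "x")` as the critical road, while `h2` scores 1.0 because no single closure isolates it. Weighting origins by population instead of counting nodes would be closer to "how many people can reach it".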

This approach could also be applied to other, non-emergency situations. For example, you might want to know how reachable the grocery stores in a particular area are, broken down by transportation categories like car, bike, rail, and pedestrian. If you were looking to improve the urban infrastructure of a particular area, this would give you a baseline to work from (e.g. showing destinations that are hard to reach on foot but easy to reach by car).

johnkit · 2024-05-17T21:06:40Z

johnkit
May 17, 2024
Maintainer

Thanks for outlining all of this, Jake. Some reactions/comments:

In case anyone needs it, the Phase II proposal is at https://drive.google.com/file/d/1ta79zvJuXD-dGqD9ugaOZBkjznkYh8Il/view?usp=sharing
The main focus of UVDAT is to quantify the risk/resilience of "lifeline systems" impacted by changing climate. So in terms of transportation, I would defer any focus on foot or bike paths and tile2net at least until the second year of the project.
What little I know about OSMnx suggests that it has a very well-designed internal graph representation; my guess is some spiffy logic encapsulating NetworkX. It might be useful to explore this code to see if there is anything we might be able to apply in UVDAT for network operations.
On the database side, I vote to explore Apache AGE to extend PostgreSQL first, perhaps as an "excursion" effort to assess its efficacy, before undertaking any prototyping with Neo4j (which I presume would include the Neo4j Spatial plugin).
Regarding electrical networks, it won't surprise anyone to know that (i) they are extremely complex compared to anything we've touched so far, and (ii) there must be a ton of network management software already developed by utilities and "system operators", including software to evaluate risk/resilience. Maybe we should explore a scenario where UVDAT produces extreme climate predictions that utilities/operators could use for their own analysis? We should ask our friends at ecoLong if this is a viable strategy and if they can help with this.
Side note: This also reminds me that we might be adding a new data type/model to the UVDAT database for climate predictions produced by the Northeastern-trained YNet. (These could be generated ad hoc, but my guess is we should cache them in the DB.) The exact data format is TBD -- their intent is to generate "predictions" for 4 km grid cells. Each prediction is small -- 3 floating point numbers per variable (iirc). And there would be a small number of predicted variables, perhaps just precipitation and maybe wind speed.
