Architecture

Dieter Debast edited this page Jul 24, 2017 · 1 revision

The project consists of 3 main parts:

  1. C# geometrical comparison using NetTopologySuite (NTS)
  2. Django site with OpenLayers
  3. Python comparison script

Geometrical comparison

The comparator takes two GeoJSON input files and outputs the differences between them, again as GeoJSON. It does this by drawing a buffer around one of the geometries and then taking the difference between the other geometry and this buffer, so that only the parts falling outside the buffer remain. The buffer is necessary because two geometries will almost never lie exactly on top of each other, and in the context of routes, geometries that are very close to each other can be considered equal.

This approach is not perfect: it can still produce false positives, and it can miss segments that lie very close to the reference data but should be considered different. Tuning the buffer radius could alleviate these problems, but a more sophisticated algorithm might be necessary to really fix them.
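The buffer idea can be illustrated with a small standard-library Python sketch. This is not the NTS-based C# implementation: it is a simplified, vertex-based approximation that reports a segment as different when one of its endpoints lies farther than the radius from the other line. All function names are hypothetical, and the coordinates are assumed to be planar (projected), so that the radius is a plain distance.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def inside_buffer(p, line, radius):
    """True if p lies within `radius` of any segment of `line`."""
    return any(
        point_segment_distance(p, a, b) <= radius
        for a, b in zip(line, line[1:])
    )

def segments_outside_buffer(candidate, reference, radius):
    """Segments of `candidate` with an endpoint outside the buffer around `reference`."""
    return [
        (a, b)
        for a, b in zip(candidate, candidate[1:])
        if not (inside_buffer(a, reference, radius)
                and inside_buffer(b, reference, radius))
    ]

# Made-up example: the OSM line follows the reference closely, then drifts away.
reference = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
osm = [(0.0, 0.5), (10.0, 0.4), (15.0, 6.0), (20.0, 6.0)]
diff = segments_outside_buffer(osm, reference, radius=1.0)
print(diff)
```

Because only endpoints are tested, a segment that leaves the buffer between two nearby vertices goes unnoticed, which is one way the missed segments mentioned above can arise.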

Django site

One app is present in the Django project, the 'BruMob' app. It's built around the specific use case of integrating the Brussels Mobility cycling network into OpenStreetMap, but reusing this app for other purposes should be fairly easy.

Django is the web framework that serves the website and GeoJSON files. OpenLayers is the JavaScript framework that provides a visual representation of the data.

Python comparison script

The main part of our webtool is the Python script that performs the automatic comparison between the reference data and OpenStreetMap data. It consists of multiple components chained together as a pipeline.

Pipeline

Scraper

Downloads the reference data from its source, and the OpenStreetMap data for the corresponding bounding box via the Overpass API. This should result in a GeoJSON file of the cycling network and an OSM file of the OpenStreetMap dataset.
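A minimal sketch of the Overpass part, using only the standard library. The query is an assumption: it fetches only bicycle route relations and their members, whereas the actual scraper may download more. The function names and the bounding box values (a rough box around Brussels) are illustrative. Overpass expects the bounding box in the order south, west, north, east.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Overpass API endpoint.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def build_overpass_query(south, west, north, east):
    """Overpass QL query for bicycle route relations inside a bounding box."""
    bbox = f"({south},{west},{north},{east})"
    return (
        "[out:xml][timeout:180];"
        f'relation["type"="route"]["route"="bicycle"]{bbox};'
        "(._;>;);"   # also fetch the member ways and their nodes
        "out body;"
    )

def download_osm(query, path):
    """Send the query to Overpass and write the OSM XML response to `path`."""
    body = urlencode({"data": query}).encode("utf-8")
    with urlopen(OVERPASS_URL, data=body) as response, open(path, "wb") as out:
        out.write(response.read())

# Illustrative bounding box only; not taken from the project.
query = build_overpass_query(50.76, 4.24, 50.92, 4.49)
print(query)
```

Calling download_osm(query, "network.osm") would then store the result; the file name is again just an example.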

Reference Pre-processor

Processes the reference data:

  • Splits the cycling network into separate GeoJSON files for each route
  • Converts the properties to the corresponding OSM tags
  • Projects the coordinates to WGS84 (in this case from Lambert72)
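The first two steps can be sketched as follows. The property names route_number and route_name are hypothetical, since the actual field names depend on the reference dataset; the fixed tag values are the ones listed under OSM Processor. The projection step is left out because it needs a projection library (for example pyproj, going from EPSG:31370 to EPSG:4326).

```python
from collections import defaultdict

# Hypothetical mapping from reference property names to OSM keys.
PROPERTY_TO_TAG = {"route_number": "ref", "route_name": "name"}

# Tags that are identical for every route in this use case.
FIXED_TAGS = {
    "type": "route",
    "route": "bicycle",
    "network": "lcn",
    "operator": "Brussels Mobility",
}

def to_osm_tags(properties):
    """Convert the properties of one reference feature to OSM tags."""
    tags = dict(FIXED_TAGS)
    for prop, key in PROPERTY_TO_TAG.items():
        if prop in properties:
            tags[key] = str(properties[prop])
    return tags

def split_by_route(collection):
    """Group the features of a FeatureCollection into one collection per ref."""
    groups = defaultdict(list)
    for feature in collection["features"]:
        tags = to_osm_tags(feature["properties"])
        groups[tags.get("ref")].append({**feature, "properties": tags})
    return {
        ref: {"type": "FeatureCollection", "features": features}
        for ref, features in groups.items()
    }

# Made-up reference data with two routes.
line = {"type": "LineString", "coordinates": [[4.35, 50.84], [4.36, 50.85]]}
network = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": line, "properties": {"route_number": 1}},
        {"type": "Feature", "geometry": line, "properties": {"route_number": 2}},
        {"type": "Feature", "geometry": line, "properties": {"route_number": 1}},
    ],
}
routes = split_by_route(network)
print({ref: len(fc["features"]) for ref, fc in routes.items()})
```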

OSM Processor

Parses the OSM data using the PyOsmium library to extract the route relations corresponding to the reference data. The component will also extract the ways and nodes of the relations so the coordinates of the nodes can be used to generate a GeoJSON file for each route.

Some basic assumptions have to be made about the relations because we don't want to consider relations that have nothing to do with our cycling network. The tags that we assume to be present and correct are:

Key        Value (our use case)
type       route
route      bicycle
network    lcn
operator   Brussels Mobility
ref        present in reference data

Each route relation that we find is outputted as a GeoJSON file.
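The tag assumptions above amount to a simple filter. The sketch below works on plain dictionaries and does not use PyOsmium itself; in the real component the tags of each relation would first have to be read from the parsed OSM data. The function name is hypothetical.

```python
# Tag values every relevant relation is assumed to carry (see the table above).
REQUIRED_TAGS = {
    "type": "route",
    "route": "bicycle",
    "network": "lcn",
    "operator": "Brussels Mobility",
}

def is_candidate_relation(tags, reference_refs):
    """True if a relation's tags match the assumptions and its ref is known."""
    if any(tags.get(key) != value for key, value in REQUIRED_TAGS.items()):
        return False
    return tags.get("ref") in reference_refs

# Made-up example: route refs that occur in the reference data.
reference_refs = {"1", "2", "A"}
good = {**REQUIRED_TAGS, "ref": "1"}
other_network = {**REQUIRED_TAGS, "network": "rcn", "ref": "1"}
print(is_candidate_relation(good, reference_refs))
print(is_candidate_relation(other_network, reference_refs))
```

A relation that fails this check is ignored entirely, which is why a route with incorrect tags in OpenStreetMap later shows up as missing.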

Metadata

Compares the properties of the reference data and the OSM data. The correct tags are already added by the Reference Pre-processor, so this is just a basic comparison. For each issue (a missing or wrong tag), an explanatory message is generated. The messages are added to the properties of that route as a semicolon-separated string under the key tagging_errors. A key-value pair error_type = tagging is added as well, to indicate what kind of error this GeoJSON file represents.

In case the OSM route is not present and no metadata can be compared, the reference route is copied and tagged with error_type = missing.
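A minimal sketch of this comparison, working on plain tag dictionaries. The function names and the wording of the messages are made up, and it assumes that error_type = tagging is only added when at least one issue was found.

```python
def compare_tags(reference_tags, osm_tags):
    """Return one explanatory message per missing or wrong tag."""
    errors = []
    for key, expected in reference_tags.items():
        actual = osm_tags.get(key)
        if actual is None:
            errors.append(f"missing tag {key}={expected}")
        elif actual != expected:
            errors.append(
                f"wrong value for {key}: expected {expected}, found {actual}"
            )
    return errors

def metadata_properties(reference_tags, osm_tags):
    """Properties for the output feature; osm_tags is None if the route is absent."""
    if osm_tags is None:
        # No OSM route to compare with: copy the reference and mark it missing.
        return {**reference_tags, "error_type": "missing"}
    properties = dict(osm_tags)
    errors = compare_tags(reference_tags, osm_tags)
    if errors:
        properties["error_type"] = "tagging"
        properties["tagging_errors"] = ";".join(errors)
    return properties

# Made-up example: wrong network value and no operator tag in OSM.
reference_tags = {"type": "route", "route": "bicycle", "network": "lcn",
                  "operator": "Brussels Mobility", "ref": "1"}
osm_tags = {"type": "route", "route": "bicycle", "network": "rcn", "ref": "1"}
result = metadata_properties(reference_tags, osm_tags)
print(result["tagging_errors"])
```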

Difference

Calls the C# program to compare the routes geometrically. It is called twice: first to check for missing segments in the OSM data, and then to check whether the OSM data contains segments that shouldn't be there. This results in two GeoJSON files for each route.
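The two calls could look roughly like this. The executable path, the argument order (geometry to check, geometry to buffer, output file) and the output file names are all assumptions made for illustration; the real command line of the C# program may differ.

```python
import os
import subprocess

# Hypothetical path to the compiled C# comparator.
COMPARATOR = "path/to/Comparator.exe"

def build_commands(reference_path, osm_path, output_dir, ref):
    """Build the two comparator invocations for one route."""
    missing_out = os.path.join(output_dir, f"{ref}_missing.geojson")
    wrong_out = os.path.join(output_dir, f"{ref}_wrong.geojson")
    return [
        # Pass 1: parts of the reference route that OSM does not cover.
        [COMPARATOR, reference_path, osm_path, missing_out],
        # Pass 2: parts of the OSM route that are not in the reference data.
        [COMPARATOR, osm_path, reference_path, wrong_out],
    ]

def run_difference(reference_path, osm_path, output_dir, ref):
    """Run both passes; raises CalledProcessError if the comparator fails."""
    for command in build_commands(reference_path, osm_path, output_dir, ref):
        subprocess.run(command, check=True)

commands = build_commands("reference/1.geojson", "osm/1.geojson", "difference", "1")
for command in commands:
    print(" ".join(command))
```

Swapping the two input files between the passes is what turns the same buffer-and-difference operation into a check for missing segments in one direction and for superfluous segments in the other.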

Post-processor

Checks all the created data files and then outputs one GeoJSON file containing the final output of our script.

  • If the route was missing from OpenStreetMap, then the reference data is outputted with the corresponding OSM tags and an error_type = missing key-value pair.
  • If the route contains geometrical issues, the difference features are combined into one GeoJSON file with error_type = difference and difference_type = missing or difference_type = wrong. It also copies the properties from the metadata component, as the C# program doesn't output any metadata.
  • Otherwise we copy the file outputted by the metadata component, which may or may not contain tagging issues.

This will result in one GeoJSON file containing the output of the script. Note: a route might be marked as missing because the OSM data didn't have the correct tags (see OSM Processor).

The reference routes, script output and OSM routes combined into one network are also copied to the correct Django static folder to be served on the web server.
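The three cases of the Post-processor can be sketched as a single function per route. The function name and the exact inputs are hypothetical: metadata_features stands for the features written by the Metadata component, and the two segment lists stand for the features produced by the two Difference passes. It also assumes that a difference overrides a tagging error in error_type.

```python
def post_process(metadata_features, missing_segments, wrong_segments):
    """Return the features to output for one route."""
    properties = metadata_features[0]["properties"] if metadata_features else {}
    if properties.get("error_type") == "missing":
        # Route absent from OSM: the tagged copy of the reference route is the output.
        return metadata_features
    if missing_segments or wrong_segments:
        combined = []
        for kind, segments in (("missing", missing_segments),
                               ("wrong", wrong_segments)):
            for segment in segments:
                combined.append({
                    "type": "Feature",
                    "geometry": segment["geometry"],
                    # The difference files carry no metadata, so copy it here.
                    "properties": {**properties,
                                   "error_type": "difference",
                                   "difference_type": kind},
                })
        return combined
    # No geometrical issues: keep the metadata output, with or without tagging errors.
    return metadata_features

# Made-up example: one missing segment, no wrong segments.
line = {"type": "LineString", "coordinates": [[4.35, 50.84], [4.36, 50.85]]}
metadata = [{"type": "Feature", "geometry": line, "properties": {"ref": "1"}}]
missing = [{"type": "Feature", "geometry": line, "properties": {}}]
output = post_process(metadata, missing, [])
print(output[0]["properties"])
```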

Main

Run python main.py to run the complete Python script. It cleans the data folders and starts the pipeline. Because the OpenStreetMap data can be quite large, one parameter can be given to skip cleaning and re-downloading it: by default the OSM data is cleaned and scraped again, but running python main.py false skips that step.
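How such a parameter might be handled is sketched below. This is not the actual main.py: the step names are placeholders, and treating the argument case-insensitively is an assumption.

```python
def should_refresh_osm(argv):
    """False only when the first argument is 'false'; anything else refreshes."""
    return not (len(argv) > 1 and argv[1].lower() == "false")

def plan_steps(argv):
    """List the pipeline steps that would run for the given command line."""
    steps = ["clean reference data", "scrape reference data"]
    if should_refresh_osm(argv):
        steps += ["clean OSM data", "scrape OSM data"]
    steps += ["reference pre-processor", "OSM processor", "metadata",
              "difference", "post-processor"]
    return steps

# Equivalent to running: python main.py false
print(plan_steps(["main.py", "false"]))
```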

Reusing

Reusing this Python script might require more work depending on the format of the data, especially if it's a node cycling network. The parts that will require the most work are the constants.py file, which has to contain the correct URLs and tags, and the Reference Pre-processor, which has to be adapted to the desired data.