
Spatial Data Uploads

Chad Burt edited this page Jul 19, 2023 · 9 revisions

SeaSketch uses a system similar to Felt's to enable spatial data uploads, converting them to PMTiles for storage and hosting.

What this system does

  • Accepts uploaded spatial data and reports progress to the user as it is ingested into their SeaSketch project
  • Transforms it into cloud-native visualization formats: PMTiles, or plain GeoJSON when the data is small enough
  • Stores both the original upload and a "canonical" representation that can be converted to other forms to support data export and download
  • Extracts metadata from files to support metadata viewing (markdown) and styling (mapbox-geostats/tilejson)
  • Assigns a default cartographic style, ideally using metadata to pick from a set of appropriate templates
  • Monitors the size of uploaded datasets and enforces per-project limits on both individual upload size and total uploaded bytes.
  • Keeps data private and only accessible from www.seasketch.org
  • Deletes a dataset's stored representations from cloud storage when the dataset is deleted from a project
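As a rough illustration of the size-limit bullet, a check like the one below could run before a presigned upload URL is issued. This is a minimal sketch: the limit values, the `checkUploadQuota` name, and the assumption that the check happens at URL-issuance time are all hypothetical, not SeaSketch's actual values or code.

```typescript
// Hypothetical limits; the real per-project values are not documented here.
const MAX_UPLOAD_BYTES = 500 * 1024 * 1024; // 500 MiB per file (assumed)
const MAX_PROJECT_BYTES = 5 * 1024 * 1024 * 1024; // 5 GiB per project (assumed)

type QuotaResult = { ok: true } | { ok: false; reason: string };

// Decide whether a new upload of `uploadBytes` may proceed, given the total
// bytes the project has already uploaded.
function checkUploadQuota(uploadBytes: number, projectBytesUsed: number): QuotaResult {
  if (!Number.isFinite(uploadBytes) || uploadBytes <= 0) {
    return { ok: false, reason: "upload size must be a positive number of bytes" };
  }
  if (uploadBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: `file exceeds the ${MAX_UPLOAD_BYTES}-byte upload limit` };
  }
  if (projectBytesUsed + uploadBytes > MAX_PROJECT_BYTES) {
    return { ok: false, reason: "project storage quota would be exceeded" };
  }
  return { ok: true };
}
```

Checking both limits before the client ever receives a presigned URL means an oversized file is rejected without any bytes reaching cloud storage.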

Upload processing

```mermaid
sequenceDiagram
participant Client
participant G as GraphQL API
participant D as Database
participant W as Graphile Worker
participant Lambda as AWS Lambda
participant R2 as Cloudflare R2
Client->>G: createDataUpload mutation
G->>D: creates record in data_uploads
G-->>Client: DataUpload with presignedUploadUrl
Client->>R2: Uploads spatial file directly to cloud storage using presigned URL
Client->>G: Calls submitDataUpload mutation
G->>D: calls submit_data_upload fn
D->>W: submit_data_upload triggers processDataUpload task
W->>Lambda: processDataUpload updates progress and triggers the lambda
Lambda->>R2: Lambda fetches uploaded data from cloud storage
loop
    Lambda-->>D: Updates db with progress while processing
end
Lambda->>R2: Stores outputs (pmtiles, etc)
Lambda->>W: lambda calls processDataUploadOutputs worker task
W->>D: Creates data_layer, data_source, and table_of_contents_items
loop
    Client-->>G: DataUploadManager uses a GraphQL subscription to monitor task state
end
G-->>Client: receives upload status
Client->>G: Fetches new table of contents & displays layers
```
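The client's side of this sequence (create the upload record, PUT the file to the presigned URL, then submit) can be sketched as below. Only the mutation names `createDataUpload`/`submitDataUpload` and the `presignedUploadUrl` field come from the diagram; the `UploadApi` interface, its argument lists, the numeric `id`, and `uploadSpatialFile` itself are hypothetical. The storage request is injected as a `put` function so the sketch runs without a network; in a browser it could be a thin wrapper around `fetch`.

```typescript
// Shape of the createDataUpload result. Only presignedUploadUrl comes from the
// diagram; a numeric id is an assumption.
interface DataUpload {
  id: number;
  presignedUploadUrl: string;
}

// Hypothetical wrapper around the two GraphQL mutations named in the diagram.
interface UploadApi {
  createDataUpload(filename: string, contentType: string): Promise<DataUpload>;
  submitDataUpload(id: number): Promise<void>;
}

// Injected HTTP PUT so the flow can be exercised without a network.
type PutFn = (
  url: string,
  init: { method: "PUT"; body: Uint8Array; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

async function uploadSpatialFile(
  api: UploadApi,
  put: PutFn,
  file: { name: string; type: string; bytes: Uint8Array },
): Promise<number> {
  // 1. Ask the GraphQL API for an upload record and a presigned URL.
  const upload = await api.createDataUpload(file.name, file.type);
  // 2. Send the bytes straight to cloud storage, bypassing the API server.
  const res = await put(upload.presignedUploadUrl, {
    method: "PUT",
    body: file.bytes,
    headers: { "Content-Type": file.type },
  });
  if (!res.ok) {
    throw new Error(`Upload to storage failed with HTTP ${res.status}`);
  }
  // 3. Only now tell the API the file is in place so processing can start.
  await api.submitDataUpload(upload.id);
  return upload.id;
}
```

In this sketch, submitting only after the PUT succeeds means the worker is never asked to process a file that is not yet in storage; progress after that point would arrive through the subscription rather than from this function.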

Resources Created

The upload system stores spatial data in Cloudflare R2 (previously AWS CloudFront). It stores the original upload plus several derivative resources, depending on the data type. These include:

  • The original upload (currently supported types are shapefile, GeoJSON, FlatGeobuf, and GeoTIFF)
  • A canonical form that can be processed into data visualization products and other exports:
    • For vector data, this is FlatGeobuf
    • For raster data, this is GeoTIFF
  • The visualization product: PMTiles for large vector and raster data, and GeoJSON for GeoJSON uploads under 1 MB.
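The choice of canonical and visualization formats described in the list above amounts to a small decision rule, sketched here under a few assumptions: the `planResources` function and `PlannedResource` type are hypothetical, 1 MB is taken as 1,000,000 bytes, and non-GeoJSON vector uploads (shapefile, FlatGeobuf) are assumed to always get PMTiles. That last point is a reading of the list, not something it states outright.

```typescript
type SourceFormat = "shapefile" | "geojson" | "flatgeobuf" | "geotiff";

interface PlannedResource {
  role: "original" | "canonical" | "visualization";
  format: string;
}

// The text says "< 1MB"; treating that as 10^6 bytes is an assumption.
const SMALL_GEOJSON_BYTES = 1_000_000;

// Hypothetical: list the resources an upload of the given format and size yields.
function planResources(format: SourceFormat, sizeBytes: number): PlannedResource[] {
  const isRaster = format === "geotiff";
  const resources: PlannedResource[] = [
    { role: "original", format },
    // Canonical form: FlatGeobuf for vector, GeoTIFF for raster.
    { role: "canonical", format: isRaster ? "geotiff" : "flatgeobuf" },
  ];
  // Small GeoJSON is served as-is; everything else is tiled to PMTiles.
  if (format === "geojson" && sizeBytes < SMALL_GEOJSON_BYTES) {
    resources.push({ role: "visualization", format: "geojson" });
  } else {
    resources.push({ role: "visualization", format: "pmtiles" });
  }
  return resources;
}
```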

Each resource should be recorded in the data_source_resources table, which stores its location, file size, and type. These resources are the primary output of the tiling Lambda process.
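A data_source_resources row, and two things other parts of the system could derive from those rows (which storage locations to delete when a data source is removed, and how many bytes a set of sources occupies), might look like this. The field names, the `dataSourceId` link, the meaning of `type`, and both helper functions are assumptions for illustration; the text only says each row holds a location, file size, and type.

```typescript
// Hypothetical row shape for data_source_resources.
interface DataSourceResource {
  dataSourceId: number; // assumed link back to data_source
  type: string; // assumed to be a file type, e.g. "pmtiles", "fgb", "geojson"
  location: string; // e.g. an R2 object key
  sizeBytes: number;
}

// Storage locations to remove from cloud storage when a data source is deleted.
function locationsToDelete(rows: DataSourceResource[], dataSourceId: number): string[] {
  return rows.filter((r) => r.dataSourceId === dataSourceId).map((r) => r.location);
}

// Total bytes stored for a set of data sources, e.g. to feed a per-project quota check.
function totalStoredBytes(rows: DataSourceResource[], dataSourceIds: Set<number>): number {
  return rows
    .filter((r) => dataSourceIds.has(r.dataSourceId))
    .reduce((sum, r) => sum + r.sizeBytes, 0);
}
```

Keeping every derived file as its own row means deletion and size accounting are simple queries over one table rather than guesses about which objects exist in storage.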