Merge pull request #62 from openactive/more-docs
docs: details on spider errors and validate raw data
James (ODSC) authored Jul 16, 2020
2 parents e9f6cff + 4579cfe commit c055722
Showing 2 changed files with 25 additions and 2 deletions.
10 changes: 10 additions & 0 deletions docs/stage/spider-data-catalog.md
@@ -7,3 +7,13 @@ It will store the results of this in the `publisher` and `publisher_feed` tables
To run this:

`$ node ./src/bin/spider-data-catalog.js`


## Errors

Any errors encountered during this stage will be stored in the `spider_data_catalog_error` table.

* `url` - Where the error occurred
* `error` - What the error was
* `found_via` - How we got to this URL. Which data catalogs did we go through to find this URL?
* `error_at` - What date and time the error occurred
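
As an illustration of the columns listed above, here is a minimal sketch of how one error row might be assembled. The column names come from the list; the helper name `buildSpiderErrorRow`, the value formats (in particular storing `found_via` as a joined chain of catalog URLs) and the example URLs are assumptions, not the project's actual code.

```javascript
// Hypothetical helper: builds one row for the spider_data_catalog_error table.
// Column names are from the docs above; the value shapes are assumptions.
function buildSpiderErrorRow(url, err, foundVia, now = new Date()) {
  return {
    url, // where the error occurred
    error: err instanceof Error ? err.message : String(err), // what the error was
    found_via: foundVia.join(' -> '), // assumed format: chain of catalog URLs
    error_at: now.toISOString(), // date and time the error occurred
  };
}

const row = buildSpiderErrorRow(
  'https://example.org/feed',
  new Error('HTTP 404'),
  ['https://example.org/catalog', 'https://example.org/sub-catalog']
);
console.log(row);
```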
17 changes: 15 additions & 2 deletions docs/stage/validate-raw-data.md
@@ -1,7 +1,20 @@
# Validate Raw Data

This will validate the raw data using the standard https://github.com/openactive/data-model-validator library
and save the results back in the database in the`raw_data` table.
This will validate the raw data using the standard https://github.com/openactive/data-model-validator library.

It will only keep errors with `severity` == `failure`. All errors of lesser severity are discarded.
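
The severity filter described above could look roughly like this. This is a sketch only: the `severity` field and the `failure` value come from the text, while the function name `keepFailures` and the rest of each result object are hypothetical.

```javascript
// Sketch only: apart from the `severity` field named in the docs,
// the shape of each result object is assumed.
function keepFailures(results) {
  return results.filter((result) => result.severity === 'failure');
}

const sample = [
  { severity: 'failure', message: 'hypothetical: required field missing' },
  { severity: 'warning', message: 'hypothetical: recommended field missing' },
];
const kept = keepFailures(sample);
console.log(kept.length); // 1
```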

It will save the results back in the database in the `raw_data` table.

* `validation_done` - Boolean; has this data been validated?
* `validation_results` - JSON; contains details on the errors, if there were any.
* `validation_passed` - Boolean; did validation pass?
  Technically this can be calculated by checking `validation_done` and the JSON in `validation_results`, but a dedicated column makes it very easy to calculate statistics.
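
To show why `validation_passed` is derivable from the other two columns, here is a minimal sketch. It assumes `validation_results` has already been parsed into an array of the retained failures (empty or null when there were none); the real JSON layout may differ, and `derivePassed` is a hypothetical name.

```javascript
// Assumption: validation_results is an array of retained failures,
// possibly empty, or null when nothing has been stored.
function derivePassed(row) {
  if (!row.validation_done) return false; // not validated yet, so not passed
  const results = row.validation_results || [];
  return results.length === 0; // passed only if no failures were recorded
}

console.log(derivePassed({ validation_done: true, validation_results: [] })); // true
console.log(derivePassed({ validation_done: false, validation_results: null })); // false
```

Storing the result in its own column means statistics can be computed from one Boolean rather than by inspecting JSON for every row.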

Every time a piece of raw data is updated or deleted by a publisher's RPDE feed, these columns are reset. As a result, the validation results in those columns:

* Are always for the latest data.
* Will be recalculated every time a piece of data changes.
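
The reset behaviour above can be sketched as follows. The docs only say the columns are reset, so the specific reset values used here (`false`, `null`, `false`) and the helper name `resetValidation` are assumptions.

```javascript
// Assumed reset values; the docs only state that the columns are reset.
// Returns a new row object and leaves the input untouched.
function resetValidation(row) {
  return {
    ...row,
    validation_done: false, // marks the row as needing validation again
    validation_results: null, // drops results that described the old data
    validation_passed: false,
  };
}

const before = {
  data: { name: 'hypothetical item' },
  validation_done: true,
  validation_results: [],
  validation_passed: true,
};
console.log(resetValidation(before));
```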

To run this:
