Truth Data
This page contains information for developers interested in how the Forecast Hub truth data are updated and validated.
The automated GitHub Action updates the truth data weekly at 12pm on Sundays. The configuration for the workflow can be found here. This workflow calls functions from several packages, as well as standalone scripts, to generate multiple truth data files to be consumed by different endpoints:
- Deaths and Cases truths: We use the covidData package to get the most recent time series data for COVID-19 from the JHU data repository, then use the preprocess_jhu() method in the covidHubUtils package to transform these data into the CSVs truth-Cumulative Deaths.csv, truth-Incident Deaths.csv, truth-Cumulative Cases.csv and truth-Incident Cases.csv.
- Hospitalization truths: We use the covidData package to get the most recent time series data for hospitalization, then use the preprocess_hospitalization() method in the covidHubUtils package to transform these data into the CSVs truth-Cumulative Hospitalizations.csv and truth-Incident Hospitalizations.csv.
- Visualization truth: get_visualization_truth_json_from_csv.py is the script used to generate the JSON truth file from the CSVs, so they can be consumed by the visualization. Here, the Incidence forecasts are lower bounded to 0.
- Zoltar truth: save_truth_for_zoltar is the method in covidHubUtils used for generating the truth data for Zoltar. Here, the Incidence forecasts are not lower bounded to 0.
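The difference between the visualization and Zoltar outputs can be illustrated with a minimal sketch. It is not the code in get_visualization_truth_json_from_csv.py or save_truth_for_zoltar; the date/location/value column layout, the sample numbers, and the helper names read_truth_rows and lower_bound_at_zero are assumptions made for illustration. Negative incident values can appear, for example, when a cumulative count is revised downward, and the sketch shows one way to lower bound them at 0 for display while leaving the other copy untouched.

```python
import csv
import io
import json

# Hypothetical sample in a date/location/value layout
# (an assumption, not necessarily the Hub's exact schema).
SAMPLE_CSV = """date,location,value
2020-10-03,US,120
2020-10-10,US,-15
2020-10-17,US,90
"""


def read_truth_rows(text):
    """Parse CSV text into dicts with an integer `value` field."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "date": row["date"],
                "location": row["location"],
                "value": int(row["value"]),
            }
        )
    return rows


def lower_bound_at_zero(rows):
    """Return a copy of the rows with negative values replaced by 0."""
    return [{**row, "value": max(0, row["value"])} for row in rows]


rows = read_truth_rows(SAMPLE_CSV)

# Visualization copy: clipped at 0, then serialized to JSON.
visualization_json = json.dumps(lower_bound_at_zero(rows))

# Zoltar copy: values are left as they are, including negatives.
zoltar_rows = rows
```

Keeping the clipping in a separate function means the same parsed rows can feed both outputs, and only the display path changes the values.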
This active workflow runs the truth update weekly and can also be triggered manually. The configuration for this workflow is defined here.
This deprecated workflow is the previous version, which did not unit test the truth data before aggregation. It can still be triggered manually if needed. The configuration for this workflow is defined here.
The JHU truth data are unit-tested in two phases:
- Tests in covidData to ensure that the package is faithful to the raw source data at the county level, and that aggregation is done correctly.
- Tests in covidHubUtils to make sure the functions are faithful to the covidData outputs, and that we have all the correct locations, timezeros, etc., as required for the specific file.
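The two kinds of checks described above can be sketched roughly as follows. This is not the actual test suite of covidData or covidHubUtils; the sample rows and the helper names aggregate_to_state, check_aggregation and missing_locations are hypothetical. The only outside fact it relies on is that the first two digits of a county FIPS code are the state FIPS code.

```python
from collections import defaultdict

# Hypothetical county-level rows: (date, county FIPS, cumulative value).
COUNTY_ROWS = [
    ("2020-10-03", "25001", 10),
    ("2020-10-03", "25003", 5),
    ("2020-10-03", "36001", 7),
]


def aggregate_to_state(county_rows):
    """Sum county values by (date, state FIPS).

    The state code is taken from the first two digits of the county FIPS.
    """
    totals = defaultdict(int)
    for date, fips, value in county_rows:
        totals[(date, fips[:2])] += value
    return dict(totals)


def check_aggregation(county_rows, state_totals):
    """First-phase style check: state totals equal the sum of their counties."""
    return aggregate_to_state(county_rows) == state_totals


def missing_locations(state_totals, required):
    """Second-phase style check: list required locations absent from the output."""
    present = {location for _, location in state_totals}
    return sorted(set(required) - present)


state_totals = aggregate_to_state(COUNTY_ROWS)
```

For example, `missing_locations(state_totals, ["25", "36", "44"])` reports the one required state code that has no rows in the sample, which is the kind of gap the second phase is meant to catch.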