# Submission Instructions {#sec-submissions}
The following provides the requirements for the format of the forecasts that will be submitted. It is important to follow these format guidelines in order for your submitted forecasts to pass a set of internal checks that allow the forecast to be visualized on the [NEON Ecological Forecast Challenge dashboard](http://shiny.ecoforecast.org/){target="_blank"} and evaluated with the scoring process.
## Steps to submitting
We provide an overview of the steps for submitting with the details below:
1) Submit [registration and model overview](https://forms.gle/kg2Vkpho9BoMXSy57){target="_blank"} for the model using a web form.
2) Generate forecast with required columns. The options for the file format are described below.\
3) Write forecast to a file that follows the standardized naming format.\
4) Submit forecast (preferably using the submission function that we provide).\
## Step 1: Forecast file format
There are two key options for the format. First, the file can be in either a csv or a netcdf format. Second, the file represents uncertainty through the `family` and `parameter` columns, as either an ensemble or a parametric distribution.
For an ensemble (or sample) forecast, the `family` column uses the word `ensemble` to designate that it is an ensemble forecast and the `parameter` column is the ensemble member number (`1`, `2`, `3` ...).
For a parametric forecast, the `family` column uses the word `normal` to designate a normal distribution and the parameter column must have values of `mu` and `sigma` for each forecasted variable, site_id, and time.
If you are submitting a forecast that does not have a normal distribution, we recommend using the ensemble format and sampling from your non-normal distribution to generate a set of ensemble members that represents your distribution.
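
As a minimal sketch of that recommendation, the block below draws ensemble members from a log-normal distribution and arranges them in the ensemble layout. The site, date, distribution parameters, ensemble size, and the `example_model` id are all made-up values for illustration, not recommendations from the challenge organizers.

```r
# Hypothetical forecast: a log-normal predictive distribution for chlorophyll-a
# at one site on one date (all numbers below are invented for illustration).
set.seed(42)                      # make the sampled ensemble reproducible
n_members <- 200                  # assumed ensemble size
meanlog   <- log(2.5)             # hypothetical log-scale mean
sdlog     <- 0.4                  # hypothetical log-scale standard deviation

ensemble_forecast <- data.frame(
  datetime           = "2022-11-08",
  reference_datetime = "2022-11-07",
  site_id            = "BARC",
  family             = "ensemble",
  parameter          = seq_len(n_members),   # ensemble member number
  variable           = "chla",
  prediction         = rlnorm(n_members, meanlog = meanlog, sdlog = sdlog),
  model_id           = "example_model",
  stringsAsFactors   = FALSE
)

head(ensemble_forecast, 3)
```

Each sampled value becomes one row, so the spread of the `prediction` column carries the shape of the original distribution.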
We aim to support more distributions beyond the normal in the distribution format file (e.g., log-normal distribution to support zero-bounded forecast submissions). Please check back at this site for updates on the list of supported distributions.
### CSV
- `datetime`: forecast timestamp. Format is `%Y-%m-%d %H:%M:%S` for terrestrial_30min theme and `%Y-%m-%d` for all other themes.
- `reference_datetime`: The start of the forecast; this should be 0 time steps in the future. There should be only one value of `reference_datetime` in the file. Format is `%Y-%m-%d %H:%M:%S` for terrestrial_30min theme and `%Y-%m-%d` for all other themes.
- `site_id`: NEON code for site
- `family`: name of the probability distribution that is described by the parameter values in the parameter column; only `normal` or `ensemble` are currently allowed.
- `parameter`: required to be the string `mu` or `sigma` (see note below about parameter column) or the number of the ensemble member.
- `variable`: standardized variable name for the theme
- `prediction`: forecasted value for parameter in the parameter column
- `model_id`: the short name of model defined as the model_id in the file name (see below) and in your metadata. The model_id should have no spaces.
The time unit (i.e., date or date-time) should correspond to the time unit of the theme-specific target file (see Theme description). All datetimes with time zones should use UTC as the time zone.
Here is an example of a forecast that uses a normal distribution:
```{r}
readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-climatology.csv.gz", show_col_types = FALSE)
```
Here is an example of a forecast that uses ensembles:
```{r}
readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-persistenceRW.csv.gz", show_col_types = FALSE)
```
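
The column layout described above can also be assembled by hand. The following is a rough sketch of a parametric (`normal`) forecast built in base R, with one `mu` row and one `sigma` row per date; the site, dates, numbers, and `example_model` id are hypothetical.

```r
# Hypothetical forecast means and standard deviations (made-up numbers).
daily <- data.frame(
  datetime = as.Date("2022-11-07") + 1:3,
  mu       = c(18.2, 18.0, 17.7),   # forecast mean
  sigma    = c(0.6, 0.8, 1.1)       # forecast standard deviation
)

# One row per parameter: stack the mu rows on top of the sigma rows.
normal_forecast <- data.frame(
  datetime           = format(rep(daily$datetime, times = 2), "%Y-%m-%d"),
  reference_datetime = "2022-11-07",
  site_id            = "BARC",
  family             = "normal",
  parameter          = rep(c("mu", "sigma"), each = nrow(daily)),
  variable           = "temperature",
  prediction         = c(daily$mu, daily$sigma),
  model_id           = "example_model",
  stringsAsFactors   = FALSE
)

normal_forecast
```

Keeping `mu` and `sigma` in the long `parameter`/`prediction` layout means the same eight columns serve both the parametric and the ensemble case.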
### Netcdf
A netcdf should have the following variables:

- `datetime`: The start of the forecast
- `site_id`: NEON code for site
- `parameter`: integer value for forecast replicate member (i.e., ensemble member or MCMC sample)

It should also include additional variables with names that correspond to the theme-specific standardized variables. For example, a netcdf from the aquatics theme would have `temperature`, `oxygen`, and `chla` variables.
The order of dimensions on forecast variables should be: `datetime`, `site`, and `parameter`. The netcdf format should only be used for ensemble-based forecasts.
The time unit (i.e., date or date-time) should correspond to the time unit of the theme-specific target file (see Theme description).
A netcdf should have the following global attributes:

- the `reference_datetime` attribute, which is the start of the forecast (should be 0 time steps in the future).
- the `model_id`, the short name of model defined as the model_id in the file name (see below) and in your metadata. The model_id should have no spaces.
<!-- -->
```
netcdf terrestrial_30min-2022-10-07-climatology {
dimensions:
    datetime = 1681 ;
    site = 47 ;
    parameter = 200 ;
    nchar = 4 ;
variables:
    double datetime(datetime) ;
        datetime:units = "seconds since 2022-10-07" ;
        datetime:long_name = "datetime" ;
    int site(site) ;
        site:long_name = "NEON siteID" ;
    int parameter(parameter) ;
        parameter:long_name = "ensemble member" ;
    double nee(parameter, site, datetime) ;
        nee:units = "umol CO2 m-2 s-1" ;
        nee:_FillValue = 1.e+32 ;
        nee:long_name = "net ecosystem exchange of CO2" ;
    double le(parameter, site, datetime) ;
        le:units = "W m-2" ;
        le:_FillValue = 1.e+32 ;
        le:long_name = "latent heat flux" ;
    char site_id(site, nchar) ;
        site_id:long_name = "NEON site codes" ;

// global attributes:
        :reference_datetime = "2022-10-07 00:00:00" ;
        :model_id = "climatology" ;
}
```
## Step 2: File name
**The correct naming convention is critical for the automated processing of submissions.**
Teams will submit their forecasts as a single netCDF or csv file with the following naming convention:
`theme_name-year-month-day-model_id.csv` (or `.nc` if a netcdf). Compressed csv files with the `csv.gz` extension are accepted. `readr::write_csv()` will automatically compress the file if you use `csv.gz` as the extension of your file name; base R `write.csv()` does not compress based on the file name, so write through a `gzfile()` connection instead.
1) The `theme_name` options are: `terrestrial_daily`, `terrestrial_30min`, `aquatics`, `beetles`, `ticks`, or `phenology`.\
2) The `year`, `month`, and `day` are the year, month, and day of the `reference_datetime` (horizon = 0). For example, if a forecast starts today and tomorrow is the first forecasted day, horizon = 0 would be today, and today's date is used in the file name.\
3) `model_id` is the code for the model name that you specified in the model metadata Google Form (`model_id` has no spaces in it).\
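
A minimal sketch of the naming convention in base R follows. The theme, date, `example_model` id, and the one-row placeholder table are hypothetical, and the file is written to a temporary directory so the sketch leaves nothing behind in your project.

```r
# Hypothetical values; substitute your own theme, start date, and model_id.
theme_name         <- "aquatics"
reference_datetime <- as.Date("2022-11-07")
model_id           <- "example_model"

# model_id must not contain spaces.
stopifnot(!grepl("[[:space:]]", model_id))

forecast_file <- paste0(
  theme_name, "-", format(reference_datetime, "%Y-%m-%d"), "-", model_id, ".csv.gz"
)
forecast_file
#> [1] "aquatics-2022-11-07-example_model.csv.gz"

# Tiny placeholder table standing in for a real forecast.
forecast <- data.frame(
  datetime = "2022-11-08", reference_datetime = "2022-11-07",
  site_id = "BARC", family = "normal", parameter = "mu",
  variable = "temperature", prediction = 18.2, model_id = model_id
)

# Base R route to a gzip-compressed csv: write through a gzfile() connection.
out_path <- file.path(tempdir(), forecast_file)
con <- gzfile(out_path, open = "wt")
write.csv(forecast, con, row.names = FALSE)
close(con)
```

Building the name from the same `reference_datetime` and `model_id` objects that go into the table keeps the file name and the file contents from drifting apart.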
## Step 3: Metadata format
For multi-model analysis, it is important to provide a description of your forecasting approach. There is a required form and an optional EML metadata file.

1) Required: Complete the registration/model overview form that describes your model and provide your `model_id`. You need to fill out the form once per `model_id`. If you revise your modeling approach associated with a `model_id`, then you will need to resubmit the form. A revised model does not change the `project_id`.
2) Optional: You can submit an xml file that provides metadata that follow the proposed EFI standards. If you choose to submit metadata with your forecast submission, the metadata file must have the same name as the submitted forecast, but with the .xml extension (`theme_name-year-month-day-model_id.xml`). Metadata files should be uploaded with the forecast files. The [metadata standard](https://github.com/eco4cast/EFIstandards){target="_blank"} has been designed by the Ecological Forecasting Initiative and is built off the widely used Ecological Metadata Language (EML). A guide for completing the metadata is in @sec-metadata.
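
One way to keep the metadata file name in step with the forecast file name is to derive it, as in this small sketch. The forecast file name is a hypothetical example.

```r
# Hypothetical forecast file name that follows the naming convention.
forecast_file <- "aquatics-2022-11-07-example_model.csv.gz"

# Swap the forecast extension (.csv.gz, .csv, or .nc) for .xml.
metadata_file <- sub("\\.(csv\\.gz|csv|nc)$", ".xml", forecast_file)
metadata_file
#> [1] "aquatics-2022-11-07-example_model.xml"
```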
## Step 4: Submission process
Individual forecast (csv, csv.gz, netCDF) files can be uploaded any time.
**The correct file name and format are critical for the automated processing of submissions.**
Teams will submit their forecast netCDF or csv files through an R function.
We have developed a function called `submit()` in the `neon4cast` package that handles the submission process.
```{r eval = FALSE}
neon4cast::submit(forecast_file = "theme_name-year-month-day-model_id.csv")
```
Alternatively, if you are using another programming language, you can submit using AWS S3-like tools (e.g., the `aws.s3` R package) to the `neon4cast-submissions` bucket at the `data.ecoforecast.org` endpoint.
Submissions need to adhere to the forecast format that is provided above, including the file naming convention. Our cyberinfrastructure automatically evaluates forecasts and relies on the expected formatting. Contact eco4cast.initiative\@gmail.com if you experience technical issues with submitting.
**Note:** If you have used AWS in the past, you might have credential files in an .aws folder in your home directory that will cause an error when you try to upload to a non-AWS bucket. If you encounter this error, you may need to rename your credentials files so `put_object` doesn't try to read them.
## Validating submission
You can check the status of your submission using the following function in the `neon4cast` package:
```{r eval = FALSE}
neon4cast::check_submission("phenology-2022-02-07-persistence.nc")
```
A successful submission can be found at the following links within 2 hours of submission.
We run a validator script when processing the submissions. If your submission does not meet the file standards above, you can run a function that provides information describing potential issues. The forecast file needs to be in your local working directory, or you need to provide a full path to the file:
```{r eval = FALSE}
neon4cast::forecast_output_validator("phenology-2022-09-01-persistenceRW.csv.gz")
```
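
Before calling the validator, a quick local check of the column rules described on this page can catch obvious problems. The sketch below is a rough illustration only: `precheck_forecast()` is a hypothetical helper, not part of `neon4cast`, and it does not reproduce what the official validator checks.

```r
# Rough local pre-check of the CSV rules on this page.
# NOT the neon4cast validator; the function name is hypothetical.
precheck_forecast <- function(df) {
  required <- c("datetime", "reference_datetime", "site_id", "family",
                "parameter", "variable", "prediction", "model_id")
  problems <- character(0)

  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    # The remaining checks need these columns, so stop here.
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  if (!all(df$family %in% c("normal", "ensemble"))) {
    problems <- c(problems, "family must be 'normal' or 'ensemble'")
  }
  if (length(unique(df$reference_datetime)) != 1) {
    problems <- c(problems, "more than one reference_datetime")
  }
  is_normal <- df$family %in% "normal"
  if (any(is_normal) && !all(df$parameter[is_normal] %in% c("mu", "sigma"))) {
    problems <- c(problems, "normal rows need parameter 'mu' or 'sigma'")
  }
  if (any(grepl("[[:space:]]", df$model_id))) {
    problems <- c(problems, "model_id contains spaces")
  }
  problems  # character(0) means no problems were found
}

# Usage with two made-up one-row forecasts.
good <- data.frame(
  datetime = "2022-11-08", reference_datetime = "2022-11-07",
  site_id = "BARC", family = "normal", parameter = "mu",
  variable = "temperature", prediction = 18.2, model_id = "example_model"
)
bad <- good
bad$family <- "lognormal"   # not a currently supported family

precheck_forecast(good)
precheck_forecast(bad)
```

An empty result only means these few rules passed; the validator function above remains the reference check.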
## Visualizing submissions
Plots of submissions and a table of scores can be found at our [dashboard](https://projects.ecoforecast.org/neon4cast-dashboard){target="_blank"}.
## Video Describing How to Submit to the Challenge
This video was recorded for the [2021 Early Career Annual Meeting](https://ecoforecast.org/ecological-forecasting-early-career-annual-meeting/){target="_blank"}.
<iframe width="560" height="315" src="https://www.youtube.com/embed/S8x5rLtltDU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>
</iframe>