Merge pull request #23 from openclimatefix/issue/make-testset
Basic Evaluation
Showing 23 changed files with 3,529 additions and 33 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,16 +4,17 @@ | |
<!-- ALL-CONTRIBUTORS-BADGE:END --> | ||
|
||
The aim of the project is to build an open source PV forecast that is free and easy to use. | ||
The forecast provides the expected generation in `kW` for 0 to 48 hours for a single PV site. | ||
|
||
Open Climate Fix also provides a commercial PV forecast; please get in touch at [email protected] | ||
|
||
The current model uses GFS or ICON NWPs to predict the solar generation at a site. | ||
|
||
|
||
```python | ||
from quartz_solar_forecast.forecast import run_forecast | ||
from quartz_solar_forecast.pydantic_models import PVSite | ||
|
||
# make input data | ||
# make a pv site object | ||
site = PVSite(latitude=51.75, longitude=-1.25, capacity_kwp=1.25) | ||
|
||
# run model, uses ICON NWP data by default | ||
|
@@ -50,6 +51,38 @@ The 9 NWP variables, from Open-Meteo documentation, are mentioned above with the | |
- The model is trained on [UK MetOffice](https://www.metoffice.gov.uk/services/data/met-office-weather-datahub) NWPs, but when running inference we use [GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast) data from [Open-meteo](https://open-meteo.com/). The differences between GFS and UK MetOffice could lead to some odd behaviours. | ||
- It looks like the GFS data on Open-Meteo is only available for free for the last 3 months. | ||
|
||
## Evaluation | ||
|
||
To evaluate the model we use the [UK PV](https://huggingface.co/datasets/openclimatefix/uk_pv) dataset and the [ICON NWP](https://huggingface.co/datasets/openclimatefix/dwd-icon-eu) dataset. | ||
All the data is publicly available and the evaluation script can be run with the following command: | ||
|
||
```bash | ||
python scripts/run_evaluation.py | ||
``` | ||
|
||
The test dataset we used is defined in `quartz_solar_forecast/dataset/testset.csv`. | ||
This contains 50 PV sites, each with 50 unique timestamps. The data is from 2021. | ||
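
The sampling behind a test set of this shape can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's script: the helper name `sample_test_set` and the site ids `1, 2, 3` are hypothetical, and the sketch uses `numpy.random.default_rng` for reproducible draws.

```python
import numpy as np
import pandas as pd


def sample_test_set(site_ids, timestamps, samples_per_site, seed=42):
    """Pick `samples_per_site` distinct timestamps at random for each site.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    frames = []
    for site_id in site_ids:
        # replace=False guarantees the timestamps are unique within a site
        idx = rng.choice(len(timestamps), size=samples_per_site, replace=False)
        frames.append(pd.DataFrame({"pv_id": site_id, "timestamp": timestamps[idx]}))
    return pd.concat(frames, ignore_index=True)


# candidate times: every 15 minutes across 2021
timestamps = pd.date_range("2021-01-01", "2022-01-01", freq="15min")
example = sample_test_set(site_ids=[1, 2, 3], timestamps=timestamps, samples_per_site=50)
```

With 50 sites and 50 samples per site, the same approach would give 2,500 rows, one per (site, timestamp) pair.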
|
||
The results of the evaluation are as follows. | ||
The MAE is 0.1906 kW across all horizons. | ||
|
||
| Horizons | MAE [kW]      | MAE [%] | | ||
|----------|---------------|---------| | ||
| 0        | 0.202 +- 0.03 | 6.2     | | ||
| 1        | 0.211 +- 0.03 | 6.4     | | ||
| 2        | 0.216 +- 0.03 | 6.5     | | ||
| 3 - 4    | 0.211 +- 0.02 | 6.3     | | ||
| 5 - 8    | 0.191 +- 0.01 | 6       | | ||
| 9 - 16   | 0.161 +- 0.01 | 5       | | ||
| 17 - 24  | 0.173 +- 0.01 | 5.3     | | ||
| 24 - 48  | 0.201 +- 0.01 | 6.1     | | ||
|
||
|
||
Notes: | ||
- The MAE in % is the MAE divided by the capacity of the PV site. We acknowledge there are a number of different ways to do this. | ||
- It is slightly surprising that the 0-hour forecast horizon and the 24-48 hour horizon have a similar MAE. | ||
  This may be because the model is trained expecting live PV data, but currently in this project we provide no live PV data. | ||
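
The capacity normalisation described in the first note can be sketched as below. This is an illustration only, not the evaluation script itself: the function names `mae_kw` and `mae_percent_of_capacity` and the sample numbers are assumptions made for the example.

```python
import numpy as np


def mae_kw(forecast_kw, actual_kw):
    """Mean absolute error between forecast and actual generation, in kW."""
    forecast = np.asarray(forecast_kw, dtype=float)
    actual = np.asarray(actual_kw, dtype=float)
    return float(np.mean(np.abs(forecast - actual)))


def mae_percent_of_capacity(forecast_kw, actual_kw, capacity_kwp):
    """MAE divided by the site capacity, expressed as a percentage."""
    return 100.0 * mae_kw(forecast_kw, actual_kw) / capacity_kwp


# made-up values for a hypothetical 3 kWp site
error_kw = mae_kw([1.0, 2.0], [1.5, 1.0])
error_pct = mae_percent_of_capacity([1.0, 2.0], [1.5, 1.0], capacity_kwp=3.0)
```

Normalising per site before averaging, or averaging in kW and dividing by a mean capacity, can give different percentages, which is one reason the choice of method matters.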
|
||
## Abbreviations | ||
|
||
- NWP: Numerical Weather Prediction | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
""" | ||
Make a random test set | ||
This takes a random subset of times for various pv ids and makes a test set | ||
There is an option to omit timestamps that don't exist in the ICON dataset: | ||
https://huggingface.co/datasets/openclimatefix/dwd-icon-eu/tree/main/data | ||
""" | ||
import os | ||
from typing import Optional | ||
|
||
import numpy as np | ||
import pandas as pd | ||
|
||
from quartz_solar_forecast.eval.utils import make_hf_filename | ||
from huggingface_hub import HfFileSystem | ||
|
||
test_start_date = pd.Timestamp("2021-01-01") | ||
test_end_date = pd.Timestamp("2022-01-01") | ||
|
||
# These have been chosen from the entire training set. | ||
pv_ids = [ | ||
9531, | ||
7174, | ||
6872, | ||
7386, | ||
13607, | ||
6330, | ||
26841, | ||
6665, | ||
4045, | ||
26846, | ||
6494, | ||
7834, | ||
3543, | ||
7093, | ||
3864, | ||
8412, | ||
3454, | ||
9765, | ||
10585, | ||
26942, | ||
7721, | ||
26804, | ||
7551, | ||
26861, | ||
7568, | ||
7338, | ||
7410, | ||
6967, | ||
16480, | ||
7241, | ||
7593, | ||
7557, | ||
7757, | ||
3094, | ||
6800, | ||
26905, | ||
5512, | ||
26840, | ||
7595, | ||
5803, | ||
26876, | ||
7846, | ||
26786, | ||
7580, | ||
6629, | ||
16477, | ||
3489, | ||
26796, | ||
12761, | ||
26903, | ||
] | ||
|
||
np.random.seed(42) | ||
|
||
|
||
def make_test_set(output_file_name: Optional[str] = None, number_of_samples_per_system: int = 50, check_hf_files: bool = False): | ||
    """ | ||
    Make a test set of random times and pv ids | ||
    :param output_file_name: the name of the file to write the test set to | ||
    :param number_of_samples_per_system: the number of samples to take per pv id | ||
    :param check_hf_files: if True, only keep timestamps whose ICON files exist on Hugging Face | ||
    :return: a DataFrame with the columns "pv_id" and "timestamp" | ||
    """ | ||
|
||
    if output_file_name is None: | ||
        # get the folder where this file is | ||
        output_file_name = os.path.dirname(os.path.abspath(__file__)) + "/testset.csv" | ||
|
||
    # all candidate timestamps in the test period, at 15 minute resolution | ||
    ts = pd.date_range(start=test_start_date, end=test_end_date, freq="15min") | ||
|
||
    # check that the files are in HF for ICON | ||
    if check_hf_files: | ||
        ts = filter_timestamps_if_hf_files_exists(ts) | ||
|
||
    test_set = [] | ||
    for pv_id in pv_ids: | ||
        # sample without replacement so the timestamps are unique for each pv id | ||
        ts_choice = ts[np.random.choice(len(ts), size=number_of_samples_per_system, replace=False)] | ||
        test_set.append(pd.DataFrame({"pv_id": pv_id, "timestamp": ts_choice})) | ||
    test_set = pd.concat(test_set) | ||
    test_set.to_csv(output_file_name, index=False) | ||
|
||
    return test_set | ||
|
||
|
||
def filter_timestamps_if_hf_files_exists(timestamps_full: pd.DatetimeIndex): | ||
    """ | ||
    Filter the timestamps if the huggingface files exist | ||
    We are checking if the timestamps, rounded down to the nearest 6 hours, | ||
    exist in | ||
    https://huggingface.co/datasets/openclimatefix/dwd-icon-eu/tree/main/data | ||
    """ | ||
    timestamps = [] | ||
    fs = HfFileSystem() | ||
    # print(fs.ls("datasets/openclimatefix/dwd-icon-eu/data/2022/4/11/", detail=False)) | ||
    for timestamp in timestamps_full: | ||
        timestamp_floor = timestamp.floor("6H") | ||
        _, huggingface_file = make_hf_filename(timestamp_floor) | ||
        # drop the first 14 characters (presumably a protocol prefix) so the path can be passed to HfFileSystem | ||
        huggingface_file = huggingface_file[14:] | ||
|
||
        if fs.exists(huggingface_file): | ||
            timestamps.append(timestamp) | ||
        else: | ||
            print(f"Skipping {timestamp} because {huggingface_file} does not exist") | ||
|
||
    timestamps = pd.DatetimeIndex(timestamps) | ||
    return timestamps | ||
|
||
|
||
# To run the script, uncomment the following line and run this file | ||
# make_test_set(check_hf_files=True) |
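
The "rounded down to the nearest 6 hours" step in `filter_timestamps_if_hf_files_exists` can be tried on its own, without any Hugging Face access. A minimal sketch, with made-up timestamps; it uses the lowercase `"6h"` alias, since recent pandas releases deprecate the uppercase `"H"` spelling used in the script.

```python
import pandas as pd

# three made-up timestamps at 15 minute resolution
timestamps = pd.DatetimeIndex(
    ["2021-03-01 05:45", "2021-03-01 06:00", "2021-03-01 17:15"]
)

# round each timestamp down to the start of its 6-hour block
floored = timestamps.floor("6h")

for original, block_start in zip(timestamps, floored):
    print(f"{original} -> {block_start}")
```

Each timestamp maps to the start of its 6-hour block (00:00, 06:00 and 12:00 here), which is the value the script uses to look up the matching ICON file.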