Basic Evaluation #23

Merged on Dec 19, 2023 (52 commits).
- `a3740ee` add testset, script to make + test (peterdudfield, Dec 12, 2023)
- `15f3bcc` format (peterdudfield, Dec 12, 2023)
- `434d29a` Scope out forecast evaluation method (zakwatts, Dec 12, 2023)
- `0a401e4` Update evaluation.py (zakwatts, Dec 12, 2023)
- `1d5052c` push, first try on getting nwp data (peterdudfield, Dec 12, 2023)
- `8f134e8` add test nwp (peterdudfield, Dec 12, 2023)
- `2579f43` add function to run forecast (peterdudfield, Dec 12, 2023)
- `b8e3b79` add combine method (peterdudfield, Dec 12, 2023)
- `7b64c0c` add simple metrics (peterdudfield, Dec 12, 2023)
- `d5bac61` refactor into metrics file (peterdudfield, Dec 12, 2023)
- `9310070` add pv get metadata (peterdudfield, Dec 12, 2023)
- `81f16bb` update + add test for evaluation (peterdudfield, Dec 12, 2023)
- `0ff8904` Update pv.py (zakwatts, Dec 12, 2023)
- `6a9e7f7` get test_eval working (peterdudfield, Dec 12, 2023)
- `53c2e1c` update (peterdudfield, Dec 12, 2023)
- `d9f1adb` fix tests (peterdudfield, Dec 12, 2023)
- `674e69f` fix test_eval_forecast (peterdudfield, Dec 12, 2023)
- `ebbf01d` rename cache nwp file (peterdudfield, Dec 12, 2023)
- `fab5b0c` refactor (peterdudfield, Dec 12, 2023)
- `3bdfb29` add test for getting pv metadata (peterdudfield, Dec 12, 2023)
- `38daaad` fix eval test (peterdudfield, Dec 12, 2023)
- `4c8c286` update eval and hf logging in (peterdudfield, Dec 12, 2023)
- `a66e039` pass HF_TOKEN to tests (peterdudfield, Dec 12, 2023)
- `7ffd76a` add HF_TOKEN as extra command (peterdudfield, Dec 12, 2023)
- `769eb47` try again with CI (peterdudfield, Dec 12, 2023)
- `3a1d198` try in env (peterdudfield, Dec 12, 2023)
- `a4415b2` make .env in tests (peterdudfield, Dec 12, 2023)
- `4458954` use vars (peterdudfield, Dec 12, 2023)
- `3a2e562` add python-dotenv to requirements (peterdudfield, Dec 12, 2023)
- `33b9641` change unit conversion (peterdudfield, Dec 13, 2023)
- `77b2921` add multiprocessing to collecting nwp data (peterdudfield, Dec 14, 2023)
- `db587bc` add mp spawn=True (peterdudfield, Dec 14, 2023)
- `fc231d3` Tidy up print statement and add script, also add to readme.md (peterdudfield, Dec 14, 2023)
- `2414dc7` add 95% CI on mean in metrics (peterdudfield, Dec 14, 2023)
- `52ca346` fix metrics (peterdudfield, Dec 14, 2023)
- `695c292` make test dataset from 50 known good pvs (peterdudfield, Dec 15, 2023)
- `3ebc809` change to timestamp (peterdudfield, Dec 15, 2023)
- `56a0819` fix test (peterdudfield, Dec 15, 2023)
- `0a5d59c` Update testset.csv (peterdudfield, Dec 18, 2023)
- `fa8959b` Update testset.csv (peterdudfield, Dec 18, 2023)
- `5689bd3` add print statements (peterdudfield, Dec 18, 2023)
- `a5fa6ab` update testset (peterdudfield, Dec 18, 2023)
- `2fd9ab5` add some print statements (peterdudfield, Dec 19, 2023)
- `8c74d41` add results in readme.md (peterdudfield, Dec 19, 2023)
- `3edd3bc` add normalized by capacity metrics (peterdudfield, Dec 19, 2023)
- `6b52128` fix test (peterdudfield, Dec 19, 2023)
- `197f506` PR comment (peterdudfield, Dec 19, 2023)
- `09c1662` PR comment (peterdudfield, Dec 19, 2023)
- `e1ac788` add notes about evaluation (peterdudfield, Dec 19, 2023)
- `44dd8c2` Add PR comment (peterdudfield, Dec 19, 2023)
- `039fb65` typos in script comments (peterdudfield, Dec 19, 2023)
- `b1013f4` add comment about v1 in forecasts (peterdudfield, Dec 19, 2023)
3 changes: 2 additions & 1 deletion .github/workflows/pytest.yaml
@@ -11,4 +11,5 @@ jobs:
# pytest-cov looks at this folder
pytest_cov_dir: "quartz_solar_forecast"
os_list: '["ubuntu-latest"]'
python-version: "['3.10','3.11']"
extra_commands: echo "HF_TOKEN=${{ vars.HF_TOKEN }}" > .env
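The new `extra_commands` line writes the repository variable `HF_TOKEN` into a `.env` file before the tests run, and the PR adds `python-dotenv` to the requirements so that file can be read. As a rough, stdlib-only sketch of that loading step: `load_env_file` is a hypothetical helper that mimics `dotenv.load_dotenv()`, including its default of not overriding variables that are already set. It is an illustration, not the project's code.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        # nothing to load, e.g. when running locally without a .env file
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # skip blank lines, comments and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # drop surrounding quotes, e.g. HF_TOKEN="abc"
        value = value.strip().strip("\"'")
        # like load_dotenv's default (override=False), keep variables that are already set
        os.environ.setdefault(key.strip(), value)
```

In the tests themselves, the real `python-dotenv` call would do the same job; the point here is only that the CI step and the test code meet at the `.env` file.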
37 changes: 35 additions & 2 deletions README.md
@@ -4,16 +4,17 @@
<!-- ALL-CONTRIBUTORS-BADGE:END -->

The aim of the project is to build an open source PV forecast that is free and easy to use.
The forecast provides the expected generation in kW for 0 to 48 hours ahead for a single PV site.

Open Climate Fix also provides a commercial PV forecast; please get in touch at [email protected]

The current model uses GFS or ICON NWPs to predict the solar generation at a site.


```python
from quartz_solar_forecast.forecast import run_forecast
from quartz_solar_forecast.pydantic_models import PVSite

# make input data
# make a pv site object
site = PVSite(latitude=51.75, longitude=-1.25, capacity_kwp=1.25)

# run model, uses ICON NWP data by default
```
@@ -50,6 +51,38 @@ The 9 NWP variables, from Open-Meteo documentation, are mentioned above with the
- The model is trained on [UK MetOffice](https://www.metoffice.gov.uk/services/data/met-office-weather-datahub) NWPs, but when running inference we use [GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast) data from [Open-meteo](https://open-meteo.com/). The differences between GFS and UK MetOffice NWPs could lead to some odd behaviours.
- It looks like the GFS data on Open-Meteo is only available for free for the last 3 months.

## Evaluation

To evaluate the model we use the [UK PV](https://huggingface.co/datasets/openclimatefix/uk_pv) dataset and the [ICON NWP](https://huggingface.co/datasets/openclimatefix/dwd-icon-eu) dataset.
All the data is publicly available, and the evaluation script can be run with the following command:

```bash
python scripts/run_evaluation.py
```

The test dataset we used is defined in `quartz_solar_forecast/dataset/testset.csv`.
It contains 50 PV sites, each with 50 unique timestamps. The data is from 2021.
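The test set is generated by `quartz_solar_forecast/dataset/make_test_set.py` in this PR: a 15-minute grid of timestamps from 2021-01-01 to 2022-01-01, and 50 timestamps drawn without replacement for each PV id. A stdlib-only sketch of that sampling, assuming Python's `random` module in place of NumPy; it will not reproduce the exact rows in `testset.csv`, and the PV ids used below are placeholders rather than the real list.

```python
import random
from datetime import datetime, timedelta


def make_time_grid(start: datetime, end: datetime, step: timedelta) -> list:
    """All timestamps from start to end inclusive, like pd.date_range(start, end, freq=step)."""
    grid = []
    t = start
    while t <= end:
        grid.append(t)
        t += step
    return grid


def sample_test_set(pv_ids, grid, samples_per_system=50, seed=42):
    """Return (pv_id, timestamp) rows, drawing distinct timestamps for each PV id."""
    rng = random.Random(seed)
    rows = []
    for pv_id in pv_ids:
        # sampling without replacement keeps the timestamps unique within a PV id
        for ts in rng.sample(grid, samples_per_system):
            rows.append((pv_id, ts))
    return rows


# placeholder PV ids; the real list lives in make_test_set.py
grid = make_time_grid(datetime(2021, 1, 1), datetime(2022, 1, 1), timedelta(minutes=15))
rows = sample_test_set(pv_ids=list(range(50)), grid=grid)
print(len(grid), len(rows))
```

Timestamps are unique per site but may repeat across sites, since each PV id is sampled independently.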

The results of the evaluation are as follows.
The overall MAE is 0.1906 kW across all horizons.

| Horizon [hours] | MAE [kW] | MAE [%] |
|-----------------|--------------|---------|
| 0 | 0.202 ± 0.03 | 6.2 |
| 1 | 0.211 ± 0.03 | 6.4 |
| 2 | 0.216 ± 0.03 | 6.5 |
| 3 - 4 | 0.211 ± 0.02 | 6.3 |
| 5 - 8 | 0.191 ± 0.01 | 6 |
| 9 - 16 | 0.161 ± 0.01 | 5 |
| 17 - 24 | 0.173 ± 0.01 | 5.3 |
| 24 - 48 | 0.201 ± 0.01 | 6.1 |

Review comments on this table:

Contributor: It's very odd that the 0-hour horizon is the same as the 24-48 hour horizon.

Contributor Author: Yeah, it is a bit. I suspect this might be due to the model being trained expecting live PV data.

Contributor Author: I've added this to the README.md under Evaluation.

Contributor: Okay, that could explain it. Is the current model trained without any pv_dropout at all?


Notes:
- The MAE in % is the MAE divided by the capacity of the PV site. We acknowledge there are a number of different ways to do this.
- It is slightly surprising that the 0-hour forecast horizon and the 24-48 hour horizon have a similar MAE. This may be because the model is trained expecting live PV data, but currently in this project we provide no live PV data.
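The ± values and the capacity-normalised MAE [%] in the table can be sketched as follows. This assumes the ± column is a normal-approximation 95% confidence interval on the mean absolute error (1.96 × standard error; the commit history only says "add 95% CI on mean in metrics"), and that the percentage divides each error by its own site's capacity before averaging, which is one of the "different ways to do this" mentioned above. `mae_with_ci` is a hypothetical helper, not the project's metrics code.

```python
import math
import statistics


def mae_with_ci(forecast_kw, actual_kw, capacity_kwp, z=1.96):
    """Return (MAE [kW], CI half-width [kW], MAE [%] of capacity) for paired samples."""
    if not (len(forecast_kw) == len(actual_kw) == len(capacity_kwp)):
        raise ValueError("inputs must have the same length")
    abs_errors = [abs(f - a) for f, a in zip(forecast_kw, actual_kw)]
    n = len(abs_errors)
    mae = statistics.fmean(abs_errors)
    # normal-approximation CI on the mean: z * sample std / sqrt(n); needs n >= 2
    half_width = z * statistics.stdev(abs_errors) / math.sqrt(n)
    # divide each error by its site's capacity, then average (one of several conventions)
    mae_pct = 100 * statistics.fmean([e / c for e, c in zip(abs_errors, capacity_kwp)])
    return mae, half_width, mae_pct


mae, half_width, mae_pct = mae_with_ci(
    forecast_kw=[1.0, 2.0, 3.0, 4.0],
    actual_kw=[1.5, 2.0, 2.0, 4.0],
    capacity_kwp=[2.0, 2.0, 2.0, 2.0],
)
print(f"{mae:.3f} +- {half_width:.3f} kW, {mae_pct:.2f} %")
```

Applied per horizon bucket, this yields one table row; the interval narrows as more samples fall into a bucket, which is consistent with the longer-horizon rows having smaller ± values.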

## Abbreviations

- NWP: Numerical Weather Predictions
9 changes: 7 additions & 2 deletions quartz_solar_forecast/data.py
@@ -91,6 +91,12 @@ def get_nwp(site: PVSite, ts: datetime, nwp_source: str = "icon") -> xr.Dataset:
        }
    )
    df = df.set_index("time")
    data_xr = format_nwp_data(df, nwp_source, site)

    return data_xr


def format_nwp_data(df: pd.DataFrame, nwp_source: str, site: PVSite):
    data_xr = xr.DataArray(
        data=df.values,
        dims=["step", "variable"],
@@ -103,11 +109,10 @@ def get_nwp(site: PVSite, ts: datetime, nwp_source: str = "icon") -> xr.Dataset:
    data_xr = data_xr.assign_coords(
        {"x": [site.longitude], "y": [site.latitude], "time": [df.index[0]]}
    )

    return data_xr


def make_pv_data(site: PVSite, ts: pd.Timestamp) -> xr.Dataset:
    """
    Make fake PV data for the site

133 changes: 133 additions & 0 deletions quartz_solar_forecast/dataset/make_test_set.py
@@ -0,0 +1,133 @@
"""
Make a random test set

This takes a random subset of times for various PV ids and makes a test set.

There is an option to omit timestamps that don't exist in the ICON dataset:
https://huggingface.co/datasets/openclimatefix/dwd-icon-eu/tree/main/data
"""
import os
from typing import Optional

import numpy as np
import pandas as pd

from quartz_solar_forecast.eval.utils import make_hf_filename
from huggingface_hub import HfFileSystem

test_start_date = pd.Timestamp("2021-01-01")
test_end_date = pd.Timestamp("2022-01-01")

# these 50 known good PV systems have been chosen from the entire training set
pv_ids = [
    9531,
    7174,
    6872,
    7386,
    13607,
    6330,
    26841,
    6665,
    4045,
    26846,
    6494,
    7834,
    3543,
    7093,
    3864,
    8412,
    3454,
    9765,
    10585,
    26942,
    7721,
    26804,
    7551,
    26861,
    7568,
    7338,
    7410,
    6967,
    16480,
    7241,
    7593,
    7557,
    7757,
    3094,
    6800,
    26905,
    5512,
    26840,
    7595,
    5803,
    26876,
    7846,
    26786,
    7580,
    6629,
    16477,
    3489,
    26796,
    12761,
    26903,
]

np.random.seed(42)


def make_test_set(
    output_file_name: Optional[str] = None,
    number_of_samples_per_system: int = 50,
    check_hf_files: bool = False,
):
    """
    Make a test set of random times and pv ids

    :param output_file_name: the name of the file to write the test set to
    :param number_of_samples_per_system: the number of samples to take per pv id
    :param check_hf_files: if True, drop timestamps whose ICON file is not on Hugging Face
    """

    if output_file_name is None:
        # default to testset.csv in the folder where this file is
        output_file_name = os.path.dirname(os.path.abspath(__file__)) + "/testset.csv"

    ts = pd.date_range(start=test_start_date, end=test_end_date, freq="15min")

    # check that the files are in HF for ICON
    if check_hf_files:
        ts = filter_timestamps_if_hf_files_exists(ts)

    test_set = []
    for pv_id in pv_ids:
        # sample without replacement, so timestamps are unique per pv id
        ts_choice = ts[np.random.choice(len(ts), size=number_of_samples_per_system, replace=False)]
        test_set.append(pd.DataFrame({"pv_id": pv_id, "timestamp": ts_choice}))
    test_set = pd.concat(test_set)
    test_set.to_csv(output_file_name, index=False)

    return test_set


def filter_timestamps_if_hf_files_exists(timestamps_full: pd.DatetimeIndex):
    """
    Keep only the timestamps whose Hugging Face files exist

    We are checking if the files for the timestamps, rounded down to the nearest 6 hours,
    exist in
    https://huggingface.co/datasets/openclimatefix/dwd-icon-eu/tree/main/data

    """
    timestamps = []
    fs = HfFileSystem()
    # print(fs.ls("datasets/openclimatefix/dwd-icon-eu/data/2022/4/11/", detail=False))
    for timestamp in timestamps_full:
        timestamp_floor = timestamp.floor("6H")
        _, huggingface_file = make_hf_filename(timestamp_floor)
        huggingface_file = huggingface_file[14:]

        if fs.exists(huggingface_file):
            timestamps.append(timestamp)
        else:
            print(f"Skipping {timestamp} because {huggingface_file} does not exist")

    timestamps = pd.DatetimeIndex(timestamps)
    return timestamps


# To run the script, uncomment the following line and run this file
# make_test_set(check_hf_files=True)
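`filter_timestamps_if_hf_files_exists` floors each 15-minute timestamp to a 6-hour boundary and asks Hugging Face whether that file exists, so up to 24 timestamps trigger a lookup for the same file. A possible variant, sketched here with the standard library only, caches the answer per floored run so each file is checked once. `floor_to_run`, `filter_timestamps` and the `run_exists` callback are hypothetical; in the script, `run_exists` would stand in for `HfFileSystem().exists` on the path built by `make_hf_filename`.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List


def floor_to_run(ts: datetime, hours: int = 6) -> datetime:
    """Round down to the start of the run window (00:00, 06:00, 12:00 or 18:00 for hours=6)."""
    return ts.replace(hour=ts.hour - ts.hour % hours, minute=0, second=0, microsecond=0)


def filter_timestamps(
    timestamps: Iterable[datetime], run_exists: Callable[[datetime], bool]
) -> List[datetime]:
    """Keep timestamps whose floored run passes run_exists, checking each run only once."""
    cache: Dict[datetime, bool] = {}
    kept = []
    for ts in timestamps:
        run = floor_to_run(ts)
        if run not in cache:
            # one (potentially remote) lookup per run instead of one per timestamp
            cache[run] = run_exists(run)
        if cache[run]:
            kept.append(ts)
    return kept


timestamps = [
    datetime(2021, 3, 1, 5, 45),
    datetime(2021, 3, 1, 6, 0),
    datetime(2021, 3, 1, 7, 15),
    datetime(2021, 3, 1, 13, 0),
]
# pretend the 12:00 run is missing from the dataset
kept = filter_timestamps(timestamps, run_exists=lambda run: run.hour != 12)
print(kept)
```

The trade-off is a small in-memory dictionary in exchange for far fewer network calls over a year of 15-minute timestamps.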