
Update Experiment Log (#180)
* Update experiment log

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update MAE value

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add plots

* Update MAE value

* Add plot and analysis script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean up script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Run black

* Run black

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Split into separate file

* Add non-meteomatics error

* New analysis script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update india_windnet_v2.md

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
jacobbieker and pre-commit-ci[bot] authored May 8, 2024
1 parent 718989f commit 668a89a
Showing 3 changed files with 162 additions and 0 deletions.
96 changes: 96 additions & 0 deletions experiments/analysis.py
@@ -0,0 +1,96 @@
"""
Script to generate a table comparing runs' MAE values for the 48-hour, 15-minute-resolution forecast
"""

import argparse

import matplotlib.pyplot as plt
import numpy as np
import wandb


def main(runs: list[str], run_names: list[str]) -> None:
    """
    Compare runs' MAE values for the 48-hour, 15-minute-resolution forecast
    """
    api = wandb.Api()
    dfs = []
    for run_id in runs:
        run = api.run(f"openclimatefix/india/{run_id}")

        df = run.history()
        # Get the columns that are in the format 'MAE_horizon/step_<number>/val'
        mae_cols = [col for col in df.columns if "MAE_horizon/step_" in col and "val" in col]
        # Sort them
        mae_cols.sort()
        df = df[mae_cols]
        # Drop all rows that are entirely NaN, then keep the row with the
        # smallest mean MAE across all horizons
        df = df.dropna(how="all")
        min_row_mean = np.inf
        min_row_idx = 0
        for idx, (_, row) in enumerate(df.iterrows()):
            if row.mean() < min_row_mean:
                min_row_mean = row.mean()
                min_row_idx = idx
        df = df.iloc[min_row_idx]
        # Get the forecast horizon in minutes from each column name
        column_timesteps = [int(col.split("_")[-1].split("/")[0]) * 15 for col in mae_cols]
        dfs.append(df)
    # Groupings of forecast horizons (in minutes) to average over, inclusive
    groupings = [
        [0, 0],
        [15, 15],
        [30, 45],
        [45, 60],
        [60, 120],
        [120, 240],
        [240, 360],
        [360, 480],
        [480, 720],
        [720, 1440],
        [1440, 2880],
    ]
    header = "| Timestep |"
    separator = "| --- |"
    for run_name in run_names:
        header += f" {run_name} MAE % |"
        separator += " --- |"
    print(header)
    print(separator)
    for grouping in groupings:
        group_string = f"| {grouping[0]}-{grouping[1]} minutes |"
        # Select indices from column_timesteps that are within the grouping, inclusive
        group_idx = [
            idx
            for idx, timestep in enumerate(column_timesteps)
            if grouping[0] <= timestep <= grouping[1]
        ]
        for df in dfs:
            group_string += f" {df.iloc[group_idx].mean() * 100.0:0.3f} |"
        print(group_string)

    # Plot the error per timestep
    plt.figure()
    for idx, df in enumerate(dfs):
        plt.plot(column_timesteps, df, label=run_names[idx])
    plt.legend()
    plt.xlabel("Timestep (minutes)")
    plt.ylabel("MAE %")
    plt.title("MAE % for each timestep")
    plt.savefig("mae_per_timestep.png")
    plt.show()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Production model run ID: "5llq8iw6"
    parser.add_argument("--first_run", type=str, default="xdlew7ib")  # unused by main
    parser.add_argument("--second_run", type=str, default="v3mja33d")  # unused by main
    # List of run IDs to compare, and a display name for each
    parser.add_argument("--list_of_runs", nargs="+")
    parser.add_argument("--run_names", nargs="+")
    args = parser.parse_args()
    main(args.list_of_runs, args.run_names)
20 changes: 20 additions & 0 deletions experiments/india_pv_wind.md
@@ -25,6 +25,26 @@ Overall MAE is 4.9% on the validation set, and forecasts look overall good.
## WindNet


### April-29-2024 WindNet v1 Production Model

[WandB Link](https://wandb.ai/openclimatefix/india/runs/5llq8iw6)

Improvements: larger input size (64x64), and a 7-hour delay on the ECMWF NWP inputs to match production.
A new, much more efficient NWP encoder allows for more filters and layers with fewer parameters.
The 64x64 input size corresponds to 6.4 degrees x 6.4 degrees, which is around 700km x 700km. This allows the
model to see the wind over the wind generation sites, which seems to be the biggest reason for the improvement in the model.
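The pixel-to-distance arithmetic above can be sketched as follows. This is an illustrative helper (not part of the repository), assuming the stated 0.1-degree grid spacing and roughly 111 km per degree:

```python
# Rough footprint of a square NWP input patch, assuming a 0.1-degree
# grid (so 64 pixels -> 6.4 degrees) and ~111 km per degree of latitude.
DEG_PER_PIXEL = 0.1
KM_PER_DEG = 111.0


def patch_extent_km(pixels: int) -> float:
    """Approximate side length in km of a `pixels` x `pixels` input patch."""
    return pixels * DEG_PER_PIXEL * KM_PER_DEG


print(patch_extent_km(64))  # ~710 km, i.e. the "around 700km" quoted above
```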



MAE is 7.6% with real improvements on the production side of things.


There were other experiments with slightly different numbers of filters, model parameters and the like, but generally no
improvements were seen.


## WindNet v1 Results

### Data

We use Wind generation data for India from April 2019-Nov 2022 for training
46 changes: 46 additions & 0 deletions experiments/india_windnet_v2.md
@@ -0,0 +1,46 @@
### WindNet v2 Meteomatics + ECMWF Model

[WandB Link](https://wandb.ai/openclimatefix/india/runs/v3mja33d)

This newest experiment uses Meteomatics data in addition to ECMWF data. The Meteomatics data is at specific locations corresponding
to the generation sites we know about. It is smartly downscaled ECMWF data, down to 15 minutes and at a few height levels we are
interested in, primarily 10m, 100m, and 200m. The Meteomatics data is a semi-reanalysis, with each block of 6 hours coming from one forecast run.
For example, in one day, hours 00-06 are from the 00 forecast run, and hours 06-12 are from the 06 forecast run. This is important to note:
it is not a true reanalysis, but it also can't exactly match the live data, as any forecast steps beyond 6 hours are thrown away.
This means these results should be taken as a best-case (or better-than-best-case) scenario, since every 6 hours, observations from the future
are incorporated into the Meteomatics input data from the next NWP model run.
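The 6-hour block structure described above can be sketched with a small helper (a hypothetical function for illustration, not part of the data pipeline):

```python
def forecast_run_for_hour(hour_of_day: int) -> int:
    """Return the init hour (0, 6, 12, or 18) of the forecast run covering `hour_of_day`.

    Each 6-hour block of the semi-reanalysis comes from the run initialised at
    the start of that block, so forecast steps beyond 6 hours are never used.
    """
    if not 0 <= hour_of_day < 24:
        raise ValueError("hour_of_day must be in [0, 24)")
    return (hour_of_day // 6) * 6


# Hours 00-05 map to the 00 run, hours 06-11 to the 06 run, and so on.
print([forecast_run_for_hour(h) for h in (0, 5, 6, 11, 12, 23)])  # [0, 0, 6, 6, 12, 18]
```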

For the purposes of WindNet, Meteomatics data is treated as Sensor data that goes into the future.
The model encodes the sensor information the same way as for the historical PV, Wind, and GSP generation, and has
a simple, single attention head to encode the information. This is then concatenated along with the rest of the data, like in
previous experiments.
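The single-attention-head sensor encoding described above can be illustrated with a minimal NumPy sketch. This is not the actual WindNet encoder; the shapes and random projection weights (standing in for learned parameters) are assumptions for illustration:

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def single_head_attention(sensors: np.ndarray, d_k: int = 16, seed: int = 0) -> np.ndarray:
    """Encode a (timesteps, features) block of sensor data with one attention head.

    Random weights stand in for learned query/key/value projections.
    """
    rng = np.random.default_rng(seed)
    _, n_features = sensors.shape
    w_q, w_k, w_v = (rng.standard_normal((n_features, d_k)) for _ in range(3))
    q, k, v = sensors @ w_q, sensors @ w_k, sensors @ w_v
    weights = softmax(q @ k.T / np.sqrt(d_k))  # (timesteps, timesteps) attention
    return weights @ v  # (timesteps, d_k) encoded sensor sequence


# e.g. 192 future 15-minute steps of Meteomatics wind features at one site
encoded = single_head_attention(np.random.default_rng(1).standard_normal((192, 8)))
print(encoded.shape)  # (192, 16)
```

The encoded output would then be concatenated with the other modality encodings, as the text describes.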

This model also has an even larger ECMWF input size of 81x81 pixels, corresponding to 8.1 degrees x 8.1 degrees, around 900km x 900km.
![Screenshot_20240430_082855](https://github.com/openclimatefix/PVNet/assets/7170359/6981a088-8664-474b-bfea-c94c777fc119)

MAE is 7.0% on the validation set, showing a slight improvement over the previous model.

Comparison with the production model:

| Timestep | Prod MAE % | No Meteomatics MAE % | Meteomatics MAE % |
| --- | --- | --- | --- |
| 0-0 minutes | 7.586 | 5.920 | 2.475 |
| 15-15 minutes | 8.021 | 5.809 | 2.968 |
| 30-45 minutes | 7.233 | 5.742 | 3.472 |
| 45-60 minutes | 7.187 | 5.698 | 3.804 |
| 60-120 minutes | 7.231 | 5.816 | 4.650 |
| 120-240 minutes | 7.287 | 6.080 | 6.028 |
| 240-360 minutes | 7.319 | 6.375 | 6.738 |
| 360-480 minutes | 7.285 | 6.638 | 6.964 |
| 480-720 minutes | 7.143 | 6.747 | 6.906 |
| 720-1440 minutes | 7.380 | 7.207 | 6.962 |
| 1440-2880 minutes | 7.904 | 7.507 | 7.507 |

![mae_per_timestep](https://github.com/openclimatefix/PVNet/assets/7170359/e3c942e8-65c6-4b95-8c51-f25d43e7a082)




Example plot

![Screenshot_20240430_082937](https://github.com/openclimatefix/PVNet/assets/7170359/88db342e-bf82-414e-8255-5ad4af659fb8)
