Benchmark #27
Comments
Hi! Could I take this one? Is there any deadline? I would aim to do it in the coming weeks.
Thanks @felipewhitaker, there is no deadline, so we really appreciate you taking this on.
Hello, can anyone please guide me on how to perform this the correct way?
@ombhojane, it is quite common to use Mean Absolute Error (MAE) for evaluating models, including in the weather research area. Another common metric is the Continuous Ranked Probability Score (CRPS), which generalizes MAE to take scenarios into consideration (properscoring has an implementation of it). Independent of the metric, what do you expect a correct way to look like? When comparing models, it is important that both are compared on a dataset that neither has used for learning (a test dataset), and that the comparison is fair (it doesn't make much sense to compare two models that predict different things).
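For concreteness, here is a minimal sketch of how both metrics can be computed with numpy and properscoring. The arrays are made-up placeholder values, not data from this repo:

```python
# Minimal sketch: MAE for a point forecast, CRPS for an ensemble of scenarios.
# Requires: pip install numpy properscoring
import numpy as np
import properscoring as ps

y_true = np.array([0.8, 1.2, 0.0, 2.5])  # observed generation (kW), placeholder
y_pred = np.array([1.0, 1.0, 0.1, 2.0])  # point forecast (kW), placeholder

# Mean Absolute Error for a point forecast
mae = np.mean(np.abs(y_true - y_pred))

# CRPS generalizes MAE to probabilistic forecasts: each row holds an
# ensemble of scenarios for one timestamp (shape: n_times x n_members).
scenarios = np.column_stack([y_pred * f for f in (0.8, 1.0, 1.2)])
crps = ps.crps_ensemble(y_true, scenarios).mean()

print(f"MAE:  {mae:.3f}")
print(f"CRPS: {crps:.3f}")
```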
After exploring
I think ideally it would be similar to this: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?
Or perhaps something like this: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecasts/v1.py#L12. It would also be good to be able to swap it into the evaluation script more easily, here: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/eval/forecast.py#L19. A sketch of one possible interface is below.
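For illustration only, here is one possible shape for such a pluggable interface; the names (`ForecastFn`, `evaluate`, `forecast_v1`, `forecast_v2`) are hypothetical and not the repo's actual API:

```python
# Sketch of a pluggable forecast interface so the evaluation script can
# swap models easily. All names here are hypothetical.
from typing import Callable
import pandas as pd

# A forecast model is just a callable: (site config, weather) -> predictions.
ForecastFn = Callable[[dict, pd.DataFrame], pd.Series]

def evaluate(forecast_fn: ForecastFn, site: dict,
             weather: pd.DataFrame, observed: pd.Series) -> float:
    """Score one model against observed generation using MAE."""
    predictions = forecast_fn(site, weather)
    return (predictions - observed).abs().mean()

# Swapping models then becomes a one-line change in the eval script, e.g.
#   evaluate(forecast_v1, site, weather, observed)
#   evaluate(forecast_v2, site, weather, observed)
```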
It does help, thanks! I might've missed some details there. Moreover, is there a file containing how the current model was trained (which I believe is in
The running of the model is in here: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecasts/v1.py. I'm hoping we can make v2, v3, etc. A really simple benchmark could be a model whose prediction is always half the capacity, and then run the evaluation on that. Obviously it would be a very bad model, but it helps give an impression of what the MAE numbers mean. A sketch of that baseline is below.
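A minimal sketch of that half-capacity baseline, assuming a simple Series-based interface; the function name, column name, and capacity value are made up for illustration:

```python
# Trivial baseline suggested above: always predict 50% of rated capacity.
import numpy as np
import pandas as pd

def baseline_half_capacity(capacity_kw: float,
                           timestamps: pd.DatetimeIndex) -> pd.Series:
    """Predict half the rated capacity at every timestamp."""
    return pd.Series(capacity_kw / 2, index=timestamps, name="power_kw")

# Example: score the baseline with MAE against made-up observations.
idx = pd.date_range("2024-01-01", periods=4, freq="h")
observed = pd.Series([0.0, 1.1, 2.4, 0.9], index=idx)
pred = baseline_half_capacity(capacity_kw=3.0, timestamps=idx)
print("baseline MAE:", np.abs(pred - observed).mean())
```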
Detailed Description
It would be great to benchmark the model.
Context
Always good to benchmark
Possible Implementation