Allow use of PV system data from PV Sites database #226

Merged
merged 30 commits into from
Oct 24, 2023

Conversation

@dfulu dfulu commented Sep 21, 2023

Pull Request

Description

This pull request is intended to allow us to use PV data from the pvsites database.

  • When loading training data, the capacity of each PV system was previously taken to be the maximum observed output in its timeseries. When loading from the database, the capacity was instead set to the metadata value. The maximum observed output and the metadata value can be quite different, so normalising the PV system data by whichever one happens to be loaded will cause us trouble in production.

    This pull request replaces the single capacity_watt_power variable on the PV system DataArray with two: observed_capacity_watt_power and metadata_capacity_watt_power. This is more explicit and should help us avoid normalisation mistakes.

    • However, I'm wondering if it might be better to align the names more closely to the GSP capacity names - which are nominal_capacity_mwp and effective_capacity_mwp?
  • Added the OpenPVFromPVSitesDBIterDataPipe datapipe function (plus helper functions) to load from the pvsites database. The existing OpenPVFromPVDBIterDataPipe datapipe function can only load from the pv database.

    I think we intend to move away from using the pv database, so I wasn't sure whether we should replace OpenPVFromPVDBIterDataPipe and load only from pvsites. For now, I have left the two functions alongside each other.

  • Added new tests for OpenPVFromPVSitesDBIterDataPipe

  • Added ApplyPVDropoutIterDataPipe to apply dropout specific to PV systems. It drops out each system independently and sets a latency for each system.

  • Added tests for ApplyPVDropoutIterDataPipe

  • Extended the PVNet datapipe to include PV system inputs

  • Minor cleaning and refactoring
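
The capacity mismatch described in the first bullet can be illustrated with a minimal sketch. This is not the ocf_datapipes implementation: it uses plain Python lists rather than a DataArray, the readings and the metadata capacity are made-up numbers, and `normalise` is a hypothetical helper. Only the names observed_capacity_watt_power and metadata_capacity_watt_power come from this pull request.

```python
# Hypothetical readings for one PV system, in watts (made-up numbers).
generation_watts = [0.0, 850.0, 2400.0, 3100.0, 1900.0, 0.0]

# Capacity as recorded in the site metadata (assumed value for illustration).
metadata_capacity_watt_power = 4000.0

# Capacity inferred from the data: the maximum observed output.
observed_capacity_watt_power = max(generation_watts)


def normalise(values, capacity):
    """Scale generation values by a capacity (hypothetical helper)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [v / capacity for v in values]


by_observed = normalise(generation_watts, observed_capacity_watt_power)
by_metadata = normalise(generation_watts, metadata_capacity_watt_power)

# The same readings peak at different normalised values depending on
# which capacity is used as the divisor.
print(max(by_observed))  # 1.0
print(max(by_metadata))  # 0.775
```

With these numbers, a model trained on inputs normalised by the observed capacity would see a peak of 1.0, while the same system normalised by the metadata capacity peaks at 0.775. Keeping both values under explicit names makes the choice of divisor visible at the call site.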

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu dfulu changed the title PV inputs from database PV system database vs training capacity differences Sep 21, 2023

codecov bot commented Sep 21, 2023

Codecov Report

Merging #226 (2c39650) into main (c7d8da3) will increase coverage by 0.16%.
Report is 2 commits behind head on main.
The diff coverage is 87.67%.

@@            Coverage Diff             @@
##             main     #226      +/-   ##
==========================================
+ Coverage   79.94%   80.11%   +0.16%     
==========================================
  Files         126      127       +1     
  Lines        5551     5611      +60     
==========================================
+ Hits         4438     4495      +57     
- Misses       1113     1116       +3     

Files                                                   Coverage Δ
ocf_datapipes/config/model.py                           86.38% <100.00%> (+0.05%) ⬆️
ocf_datapipes/convert/numpy/batch/pv.py                 100.00% <ø> (ø)
ocf_datapipes/load/__init__.py                          82.35% <100.00%> (+1.10%) ⬆️
ocf_datapipes/load/pv/database.py                       97.05% <100.00%> (+0.84%) ⬆️
ocf_datapipes/load/pv/pv.py                             97.16% <100.00%> (-0.13%) ⬇️
...tapipes/select/drop_pv_sys_generating_overnight.py   100.00% <100.00%> (ø)
ocf_datapipes/training/common.py                        98.08% <ø> (+2.54%) ⬆️
ocf_datapipes/training/example/nwp_pv.py                89.28% <ø> (ø)
ocf_datapipes/training/example/simple_pv.py             97.14% <ø> (ø)
ocf_datapipes/training/metnet_pv_site.py                80.76% <100.00%> (ø)
... and 10 more

@dfulu dfulu changed the title PV system database vs training capacity differences PV system database vs training differences Sep 21, 2023
@dfulu dfulu changed the title PV system database vs training differences Allow use of PV system data from PV Sites database Sep 25, 2023
@dfulu dfulu requested a review from jacobbieker October 23, 2023 15:12
@dfulu dfulu marked this pull request as ready for review October 23, 2023 15:22

@jacobbieker jacobbieker left a comment

LGTM! I would probably go with copying the GSP name conventions, just to keep it the same across those.

Review thread on ocf_datapipes/training/common.py (outdated, resolved)
@dfulu dfulu merged commit b23c4d4 into main Oct 24, 2023
4 checks passed
This was referenced Oct 27, 2023
@dfulu dfulu deleted the pv_inputs_from_database branch December 19, 2023 09:22