
Added PV Fleets QA pipeline examples #202

Merged — 38 commits merged into pvlib:main on Feb 12, 2024

Conversation

Member

@kperrynrel kperrynrel commented Jan 11, 2024

  • Closes #201: Power, irradiance, and temperature examples from PV Fleets pipeline
  • Non-API functions clearly documented with docstrings or comments as necessary.
  • Adds description and name entries in the appropriate "what's new" file in
    docs/whatsnew for all changes. Includes a link to the GitHub Issue with
    :issue:`num` or this Pull Request with :pull:`num`, and the contributor
    name and/or GitHub username (link with :ghuser:`user`).
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

This PR includes QA examples of the PV fleets pipeline, adapted from the following repo: https://github.com/kperrynrel/fleets_qa_examples

@cwhanse
Member

cwhanse commented Jan 11, 2024

@kperrynrel I think you'll need a README.rst file in order for the Examples section to be created. I would call this group of examples "PVFleets QA pipeline".

@kperrynrel kperrynrel self-assigned this Jan 11, 2024
@kperrynrel kperrynrel added the documentation Improvements or additions to documentation label Jan 11, 2024
@kperrynrel kperrynrel added this to the v0.2.0 milestone Jan 11, 2024
@kandersolar
Member

The test failures can be ignored here; they'll be fixed in #203.

@kperrynrel
Member Author

@kandersolar these are ready for review. Do we have a limit on example data size? The files are bigger for these examples and I could host them on the Duramat datahub if need be. Let me know!

@cwhanse cwhanse left a comment (Member)

@kperrynrel many of the style-related comments in the irradiance example also apply to the power example.

These are lengthy scripts, thanks for preparing them.

Are there steps that you think could be packaged into new pvanalytics functions? Something to consider for future work.

docs/examples/pvfleets-qa-pipeline/README.rst
Comment on lines 130 to 136
# Filter the time series, taking out all of the issues
time_series = time_series[~stale_data_mask]
time_series = time_series[~negative_mask]
time_series = time_series[~erroneous_mask]
time_series = time_series[~out_of_bounds_mask]
time_series = time_series[~zscore_outlier_mask]
time_series = time_series.asfreq(data_freq)
Member

This sequential filtering makes me uncomfortable. Although the plot shows that time_series maintains its original length, the statements make me think that points are being dropped. I may be reading these lines as arrays, and they may be OK because pandas handles the indexing.

Member Author (@kperrynrel), Jan 29, 2024

The asfreq() call at the end of this returns the data to its original frequency, with NaNs for the data that's been filtered out. To make this less sequential, I have updated this to a single call:

# Filter the time series, taking out all of the issues
issue_mask = ((~stale_data_mask) & (~negative_mask) &
              (~erroneous_mask) & (~zscore_outlier_mask))
time_series = time_series[issue_mask]
time_series = time_series.asfreq(data_freq)
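To see why asfreq() restores the original length, here is a minimal toy series (hypothetical data, not from the example files): boolean-mask filtering drops the flagged rows, and asfreq() then rebuilds the regular index, reinserting NaN at every gap.

```python
import pandas as pd

# Hypothetical 1-minute series with one bad (negative) point
idx = pd.date_range("2024-01-01 00:00", periods=6, freq="1min")
ts = pd.Series([1.0, 2.0, -5.0, 3.0, 4.0, 5.0], index=idx)

mask = ts < 0                        # flag the bad point
filtered = ts[~mask]                 # 5 rows: the flagged row is dropped
restored = filtered.asfreq("1min")   # 6 rows again, NaN at the gap
```

So the filtered series really is shorter in the middle steps, but the final asfreq() call brings it back to a complete, regular index.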

Member

That's much easier to read, thanks

# post-filtering.

# Filter the time series, taking out all of the issues
time_series = time_series[~stale_data_mask]
Member

Same comment as in the irradiance example: do these filters change the length of the Series?

Member Author (@kperrynrel)

Did the same here as above to make the mask a single call. asfreq() returns the time series to its original data frequency with NaNs:

# Filter the time series, taking out all of the issues
issue_mask = ((~stale_data_mask) & (temperature_limit_mask) &
              (~zscore_outlier_mask))
time_series = time_series[issue_mask]
time_series = time_series.asfreq(data_freq)

docs/examples/pvfleets-qa-pipeline/pvfleets-power-qa.py
@kandersolar
Member

Do we have a limit on example data size? The files are bigger for these examples and I could host them on the Duramat datahub if need be.

Looks like adding these files will increase the size of the data directory from ~2 MB to ~80 MB. It would be nice to keep both the repository and the PyPI distribution files lightweight if we can, and that big of an increase for data files that many/most installations won't ever touch does seem a bit wasteful. Keeping the datasets on the Data Hub instead of this repository comes with the downside of having to make network requests during the docs build, which isn't desirable either.

I explored a few alternative formats for system_50_ac_power_2_full_DST.csv. Of course storing as a binary format like parquet helps, as does cutting off unnecessary digits of precision in the power data, and using an encoding for the timestamp index that exploits the consistent 1-minute interval. Results:

  • CSV (current form): 50.8 MB
  • Zipped CSV: 7.9 MB
  • parquet: 14.8 MB
  • parquet, float32: 14.2 MB
  • parquet, float32, delta-encoded index: 4.6 MB
  • parquet, float32, delta-encoded index, negative values clipped to zero: 3.2 MB

For a dataset with ~1.4 million values (plus as many timestamps), I doubt there's much more opportunity for easy reduction via encoding tweaks. For reference, here is the code to produce the 3.2 MB file:

import numpy as np
import pandas as pd

df = pd.read_csv("system_50_ac_power_2_full_DST.csv", index_col=0, parse_dates=True)
df.astype(np.float32).clip(lower=0).reset_index().to_parquet("system_50_encoded.parquet",
                                                             use_dictionary=False,
                                                             column_encoding={"measured_on": "DELTA_BINARY_PACKED"})

The next question is: do we need the full data file? It's currently ~2.5 years. Cutting it down to one year gets the size down to 1.2 MB. 6 months is only 600 kB. Would shorter data periods still achieve the goals of these examples? Or resample the file to 5-minute averages?
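The trimming and resampling options could be sketched as follows (toy data and hypothetical variable names, just to show the pandas calls involved):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 1-minute power data
idx = pd.date_range("2019-01-01", periods=10, freq="1min")
df = pd.DataFrame({"ac_power": np.arange(10.0)}, index=idx)

# Option 1: keep a shorter window, e.g. a single year (partial-string indexing)
df_year = df.loc["2019"]

# Option 2: downsample to 5-minute averages (10 one-minute rows -> 2 bins)
df_5min = df.resample("5min").mean()
```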

@kperrynrel
Member Author

@kandersolar I resampled all the data to 15-minute averages, and then converted to parquet format. We are now getting an error on parquet support:

  • Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
  • Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

You ok with me adding either fastparquet or pyarrow as a dependency? Also, do you have a preference on one?

@kandersolar
Member

You ok with me adding either fastparquet or pyarrow as a dependency? Also, do you have a preference on one?

Yep! I think we can get away with keeping it as a docs requirement rather than an installation requirement. I'd say mild preference for pyarrow for its maturity.

@kperrynrel
Member Author

@kandersolar @cwhanse let me know if there's anything else I can update here before the 2.0 release! I updated some of the matplotlib rendering yesterday and am feeling ok with how it is looking. Thanks!



# %%
# Display the final irradiance time series, post-QA filtering.
Member

This plot isn't rendering

Member (@cwhanse), Feb 9, 2024

That plot still isn't rendering, idk why not. @kperrynrel

Member Author (@kperrynrel)

Looks like it's rendering now, not sure what was going on with it:
[image: screenshot of the rendered plot]

@cwhanse cwhanse left a comment (Member)

Thanks @kperrynrel

@kandersolar kandersolar left a comment (Member)

Thanks @kperrynrel! One minor comment, otherwise good to merge

Comment on lines 61 to 62
# 3) "Abnormal" data periods, which are defined as less than 10% of the
# daily time series mean OR greater than 1300
Member

This "less than 10% of daily mean" rule is used in the power analysis code, but not here. Should this text say something about the daily minimum being above 50 W/m2?

Member Author (@kperrynrel), Feb 12, 2024

Good catch @kandersolar, updated the docstring to the following in commit 0cadb6e:

  1. "Abnormal" data periods, which are defined as less than the daily minimum of 50 OR any data greater than 1300

Looks like the docs rendering in the checks isn't updating for some reason, even though the underlying code has been updated. Not sure what's going on with that.

Member

Should it be "greater than" instead of "less than"?

I often have to manually refresh the RTD page to see updates after a rebuild. Something to do with browser caching. I'm guessing that's what you're experiencing too.

Member Author (@kperrynrel)

oops, updated in 5a01c3a

@kandersolar kandersolar merged commit 5e05983 into pvlib:main Feb 12, 2024
42 checks passed
Labels: documentation (Improvements or additions to documentation)
Linked issue: Power, irradiance, and temperature examples from PV Fleets pipeline? (#201)
3 participants