Add in pvanalytics update (#1)

* v0.1.1 Release (pvlib#132)
* change pypi classifier from pre-alpha to beta
* remove unnecessary docs/requirements.txt
* whatsnew v0.1.1
* include 0.1.1 in whatsnew index
* link zenodo in readme
* Added clipping time series example for Sphinx documentation.
* added sphinx documentation + examples for running the clipping mask.
* fixed pep8 formatting errors.
* added a new whatsnew rst file for version 0.1.2
* removed close plot to visualize in sphinx.
* removed trailing whitespace.
* added tight layout for plot sizing.
* Updated the docs based on @kanderso-nrel's recs.
* fixed pep8 warning.
* removed trailing whitespace-pep8 issue.
* Added placeholder scripts for each function for Sphinx documentation.
* added sphinx module for completeness score.
* Update docs/examples/clipping.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/clipping.py
  Co-authored-by: Cliff Hansen <[email protected]>
* added each of the python scripts for detecting stale data, interpolated data, and check for daily data completeness.
* updated the interpolated-periods documentation
* cleaning up doc strings.
* updated the naming conventions of the sphinx docs.
* fixed pep8 error on stale data periods docs.
* made updates to the Sphinx docs based on kanderso-nrel's feedback.
* fixed pep8 errors.
* Edited some of the language in the sphinx doc comments.
* Added pv-terms json.
* added the documentation files for hampel, tukey, and zscore outlier detection.
* update the documentation to include separate data files for each of the different issues, to avoid further confusion.
* updated the interpolated data docs to pull the correct csv.
* More docstring cleanup.
* Updated outlier code to use the new outlier csv file.
* updated the outliers routine to handle varying indices.
* Docstring cleaning.
* made updates to the hovertext and the _round edits per @kanderso-nrel's comments
* updated diff to round on docstring per @kanderso-nrel's comment.
* updated the whatsnew doc with the outliers documentation.
* updated the hovertext info.
* updated the routine with the bug fix for whatsnew, and removed the initial graphing.
* added prints to visualize the imported data in example docs.
* Update docs/whatsnew/0.1.2.rst
  Co-authored-by: Kevin Anderson <[email protected]>
* added new commenting based on @cwhanse's recommendations.
* fixed improper spelling in comments

* Day night masking sphinx documentation (pvlib#139)
* update the day-night masking example.
* update the day-night masking routine.
* added the SERF east data for running the day-night mask examples.
* added the day-night masking routine.
* Added section for comparing day-night mask to PVlib sunrise-sunset times.
* added separate printouts for sunrise and sunset time comparisons.
* added vertical lines for sunrise + sunset in plots
* update the routine to remove hardcoded file name.
* added update to the whatsnew file.
* removed a newline to see if we could get git actions to work.
* Made updates to documentation based on @kanderso-nrel's recommendations.
* Update docs/examples/day-night-masking.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Removed default kwargs for pvlib SPA sunrise-sunset function.
* Updating the commenting.
* fixed pep8 line length

Co-authored-by: Perry <[email protected]>
Co-authored-by: Cliff Hansen <[email protected]>

* Irradiance sphinx documentation (pvlib#140)
* added initial files for all of the irradiance documentation (need to edit).
* added RMIS example data for irradiance Sphinx documentation.
* added the new qcrad function.
* update the examples for both qcrad functions.
* added qcrad-limits documentation.
* ensured outputs for all irradiance functions in examples.
* added plotting functionality for some of the examples.
* added graphics for all of the irradiance documentation.
* added new line at end of file to stop pep8 failure.
* Clean up of doc strings for irradiance documentation.
* Fixed the docstring PEP8 error.
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/qcrad-limits-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/qcrad-limits-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Kevin Anderson <[email protected]>
* Removed 'sampled' reference from docstring when describing data
* changed py:func to py:meth in docstring
* Updated the routine to calculate extraterrestrial radition as dni_extra for check_irradiance_limits_qcrad() function.
* Renamed the routine Clearsky Limits for Daily Insolation
* removed pep8 issues
* added the documentation to the whatsnew file.
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/qcrad-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/daily-insolation-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/clearsky-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* added day-night mask to clearsky-limits-irradiance documentation
* removed hardcoded path!
* Update docs/examples/daily-insolation-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/daily-insolation-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/daily-insolation-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* switched the ordering of parameters in ) per @cwhanse's request.
* rearranged the order of inputs for irradiance_consistency_qcrad function in unit test.
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* updated clearsky-limits-irradiance example to comment on Ineichen model performance
* Update docs/examples/qcrad-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/qcrad-consistency-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>
* Update docs/examples/daily-insolation-limits-irradiance.py
  Co-authored-by: Cliff Hansen <[email protected]>

Co-authored-by: Perry <[email protected]>
Co-authored-by: Kevin Anderson <[email protected]>
Co-authored-by: Cliff Hansen <[email protected]>
Co-authored-by: Kevin Anderson <[email protected]>
Co-authored-by: Perry <[email protected]>
Co-authored-by: Cliff Hansen <[email protected]>
kperrynrel · May 25, 2022 · 10d289f
1 parent 334b5d5
Showing 29 changed files with 15,033 additions and 14 deletions.
diff --git a/.readthedocs.yml b/.readthedocs.yml
@@ -8,7 +8,6 @@ formats: all
 python:
   version: 3.7
   install:
-    - requirements: docs/requirements.txt
     - method: pip
       path: .
       extra_requirements:

diff --git a/README.md b/README.md
@@ -1,5 +1,7 @@
 ![lint and test](https://github.com/pvlib/pvanalytics/workflows/lint%20and%20test/badge.svg)
 [![Coverage Status](https://coveralls.io/repos/github/pvlib/pvanalytics/badge.svg?branch=master)](https://coveralls.io/github/pvlib/pvanalytics?branch=master)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6110569.svg)](https://doi.org/10.5281/zenodo.6110569)
+
 
 # PVAnalytics
 

diff --git a/docs/examples/clearsky-limits-irradiance.py b/docs/examples/clearsky-limits-irradiance.py
@@ -0,0 +1,84 @@
+"""
+Clearsky Limits for Irradiance Data
+===================================
+
+Checking the clearsky limits of irradiance data.
+"""
+
+# %%
+# Identifying and filtering out invalid irradiance data is a
+# useful way to reduce noise during analysis. In this example,
+# we use :py:func:`pvanalytics.quality.irradiance.clearsky_limits`
+# to identify irradiance values that do not exceed
+# a limit based on a clear-sky model. For this example we will
+# use GHI data from the RMIS weather system on the NREL campus in Colorado.
+
+import pvanalytics
+from pvanalytics.quality.irradiance import clearsky_limits
+from pvanalytics.features.daytime import power_or_irradiance
+import pvlib
+import matplotlib.pyplot as plt
+import pandas as pd
+import pathlib
+
+# %%
+# First, read in data from the RMIS NREL system. This data set contains
+# 5-minute right-aligned POA, GHI, DNI, DHI, and GNI measurements,
+# but only the GHI is relevant here.
+
+pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
+rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
+data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
+freq = '5T'
+# Make the datetime index tz-aware.
+data.index = data.index.tz_localize("Etc/GMT+7")
+
+
+# %%
+# Now model clear-sky irradiance for the location and times of the
+# measured data. You can do this with
+# :py:meth:`pvlib.location.Location.get_clearsky`, using the lat-long
+# coordinates associated with the RMIS NREL system.
+
+location = pvlib.location.Location(39.7407, -105.1686)
+clearsky = location.get_clearsky(data.index)
+
+# %%
+# Use :py:func:`pvanalytics.quality.irradiance.clearsky_limits`.
+# Here, we check GHI data in field 'irradiance_ghi__7981'.
+# :py:func:`pvanalytics.quality.irradiance.clearsky_limits`
+# returns a mask that identifies data that falls between
+# lower and upper limits. The defaults (used here)
+# are an upper bound of 110% of clear-sky GHI and
+# no lower bound.
+
+clearsky_limit_mask = clearsky_limits(data['irradiance_ghi__7981'],
+                                      clearsky['ghi'])
+
+
+# %%
+# Mask nighttime values in the GHI time series using the
+# :py:func:`pvanalytics.features.daytime.power_or_irradiance` function.
+# We will then remove nighttime values from the GHI time series.
+
+day_night_mask = power_or_irradiance(series=data['irradiance_ghi__7981'],
+                                     freq=freq)
+
+# %%
+# Plot the 'irradiance_ghi__7981' data stream and its associated clearsky GHI
+# data stream. Mask the GHI time series by its clearsky_limit_mask for daytime
+# periods.
+# Please note that a simple Ineichen model with static monthly turbidities
+# isn't always accurate, as in this case. Other models that may provide better
+# clear-sky estimates include McClear or PSM3.
+data['irradiance_ghi__7981'].plot()
+clearsky['ghi'].plot()
+data.loc[clearsky_limit_mask & day_night_mask][
+    'irradiance_ghi__7981'].plot(ls='', marker='.')
+plt.legend(labels=["RMIS GHI", "Clearsky GHI",
+                   "Under Clearsky Limit"],
+           loc="upper left")
+plt.xlabel("Date")
+plt.ylabel("GHI (W/m^2)")
+plt.tight_layout()
+plt.show()
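
Review note on `clearsky-limits-irradiance.py`: the comparison behind the mask can be illustrated without pvanalytics. The sketch below is a plain-Python illustration, not the library's implementation. The helper `under_clearsky_limit`, its `max_ratio` parameter, and the sample values are hypothetical, and the library's handling of edge cases (for example, nighttime samples where clear-sky GHI is zero) may differ.

```python
# Hypothetical sketch of an upper clear-sky limit check (not pvanalytics code).
def under_clearsky_limit(measured, clearsky, max_ratio=1.1):
    """Return True where measured <= max_ratio * clearsky, element by element."""
    return [m <= max_ratio * c for m, c in zip(measured, clearsky)]


# Made-up GHI values in W/m^2, for illustration only.
measured_ghi = [0.0, 210.0, 655.0, 980.0, 430.0]
clearsky_ghi = [0.0, 200.0, 600.0, 850.0, 400.0]

mask = under_clearsky_limit(measured_ghi, clearsky_ghi)
for m, c, ok in zip(measured_ghi, clearsky_ghi, mask):
    print(f"measured={m:6.1f}  clearsky={c:6.1f}  under limit: {ok}")
```

Only the fourth sample (980 against a modeled 850) exceeds 110% of the clear-sky value, so it is the only one the mask rejects.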
diff --git a/docs/examples/clipping.py b/docs/examples/clipping.py
@@ -0,0 +1,71 @@
+"""
+Clipping Detection
+===================
+
+Identifying clipping periods using the PVAnalytics clipping module.
+"""
+
+# %%
+# Identifying and removing clipping periods from AC power time series
+# data aids in generating more accurate degradation analysis results,
+# as using clipped data can lead to under-predicting degradation. In this
+# example, we show how to use
+# :py:func:`pvanalytics.features.clipping.geometric`
+# to mask clipping periods in an AC power time series. We use a
+# normalized time series example provided by the PV Fleets Initiative,
+# where clipping periods are labeled as True, and non-clipping periods are
+# labeled as False. This example is adapted from the DuraMAT DataHub
+# clipping data set:
+# https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data
+
+import pvanalytics
+from pvanalytics.features.clipping import geometric
+import matplotlib.pyplot as plt
+import pandas as pd
+import pathlib
+import numpy as np
+
+# %%
+# First, read in the ac_power_inv_7539 example, and visualize a subset of the
+# clipping periods via the "label" mask column.
+
+pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
+ac_power_file_1 = pvanalytics_dir / 'data' / 'ac_power_inv_7539.csv'
+data = pd.read_csv(ac_power_file_1, index_col=0, parse_dates=True)
+data['label'] = data['label'].astype(bool)
+# This is the known frequency of the time series. You may need to infer
+# the frequency, or set it explicitly, for your own AC power time series.
+freq = "15T"
+
+data['value_normalized'].plot()
+data.loc[data['label'], 'value_normalized'].plot(ls='', marker='o')
+plt.legend(labels=["AC Power", "Labeled Clipping"],
+           title="Clipped")
+plt.xticks(rotation=20)
+plt.xlabel("Date")
+plt.ylabel("Normalized AC Power")
+plt.tight_layout()
+plt.show()
+
+# %%
+# Now, use :py:func:`pvanalytics.features.clipping.geometric` to identify
+# clipping periods in the time series. Re-plot the data subset with this mask.
+predicted_clipping_mask = geometric(ac_power=data['value_normalized'],
+                                    freq=freq)
+data['value_normalized'].plot()
+data.loc[predicted_clipping_mask, 'value_normalized'].plot(ls='', marker='o')
+plt.legend(labels=["AC Power", "Detected Clipping"],
+           title="Clipped")
+plt.xticks(rotation=20)
+plt.xlabel("Date")
+plt.ylabel("Normalized AC Power")
+plt.tight_layout()
+plt.show()
+
+
+# %%
+# Compare the filter results to the ground-truth labeled data side-by-side,
+# and generate an accuracy metric.
+acc = 100 * np.sum(np.equal(data.label,
+                            predicted_clipping_mask))/len(data.label)
+print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
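
Review note on `clipping.py`: the accuracy metric at the end of the example is the percentage of timestamps where the predicted mask agrees with the label. A minimal plain-Python sketch of that calculation follows; `percent_agreement` and the boolean lists are hypothetical stand-ins for `data.label` and `predicted_clipping_mask`.

```python
# Hypothetical sketch of the percent-agreement metric (illustrative data).
def percent_agreement(labels, predictions):
    """Percentage of positions where the two boolean sequences match."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    matches = sum(l == p for l, p in zip(labels, predictions))
    return 100 * matches / len(labels)


labels = [False, True, True, False, False, True, False, False]
predicted = [False, True, False, False, False, True, True, False]

acc = percent_agreement(labels, predicted)
print("Overall model prediction accuracy: " + str(round(acc, 2)) + "%")
```

Because clipped samples are usually a minority of a time series, overall agreement can look high even when few clipping periods are detected, so it is worth reading alongside the plots.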
diff --git a/docs/examples/daily-insolation-limits-irradiance.py b/docs/examples/daily-insolation-limits-irradiance.py
@@ -0,0 +1,69 @@
+"""
+Clearsky Limits for Daily Insolation
+====================================
+
+Checking the clearsky limits for daily insolation data.
+"""
+
+# %%
+# Identifying and filtering out invalid irradiance data is a
+# useful way to reduce noise during analysis. In this example,
+# we use :py:func:`pvanalytics.quality.irradiance.daily_insolation_limits`
+# to determine when the daily insolation lies between a minimum
+# and a maximum value. Irradiance measurements and clear-sky
+# irradiance on each day are integrated with the trapezoid rule
+# to calculate daily insolation. For this example we will use data
+# from the RMIS weather system located on the NREL campus
+# in Colorado, USA.
+
+import pvanalytics
+from pvanalytics.quality.irradiance import daily_insolation_limits
+import pvlib
+import matplotlib.pyplot as plt
+import pandas as pd
+import pathlib
+
+# %%
+# First, read in data from the RMIS NREL system. This data set contains
+# 5-minute right-aligned data. It includes POA, GHI,
+# DNI, DHI, and GNI measurements.
+
+pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
+rmis_file = pvanalytics_dir / 'data' / 'irradiance_RMIS_NREL.csv'
+data = pd.read_csv(rmis_file, index_col=0, parse_dates=True)
+# Make the datetime index tz-aware.
+data.index = data.index.tz_localize("Etc/GMT+7")
+
+# %%
+# Now model clear-sky irradiance for the location and times of the
+# measured data:
+location = pvlib.location.Location(39.7407, -105.1686)
+clearsky = location.get_clearsky(data.index)
+
+# %%
+# Use :py:func:`pvanalytics.quality.irradiance.daily_insolation_limits`
+# to identify whether the daily insolation lies between a minimum
+# and a maximum value. Here, we check the GHI field
+# 'irradiance_ghi__7981'.
+# :py:func:`pvanalytics.quality.irradiance.daily_insolation_limits`
+# returns a mask that identifies data that falls between
+# lower and upper limits. The defaults (used here)
+# are an upper bound of 125% of clear-sky daily insolation
+# and a lower bound of 40% of clear-sky daily insolation.
+
+daily_insolation_mask = daily_insolation_limits(data['irradiance_ghi__7981'],
+                                                clearsky['ghi'])
+
+# %%
+# Plot the 'irradiance_ghi__7981' data stream and its associated clearsky GHI
+# data stream. Mask the GHI time series by its daily_insolation_mask.
+data['irradiance_ghi__7981'].plot()
+clearsky['ghi'].plot()
+data.loc[daily_insolation_mask, 'irradiance_ghi__7981'].plot(ls='', marker='.')
+plt.legend(labels=["RMIS GHI", "Clearsky GHI",
+                   "Within Daily Insolation Limit"],
+           loc="upper left")
+plt.xlabel("Date")
+plt.ylabel("GHI (W/m^2)")
+plt.tight_layout()
+plt.show()
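
Review note on `daily-insolation-limits-irradiance.py`: the docstring says daily insolation is obtained by integrating irradiance with the trapezoid rule and then compared against limits relative to the clear-sky insolation. The sketch below works through that arithmetic for a single day in plain Python. It is not the library's implementation; `trapezoid_integral`, the coarse 3-hour samples, and the strict treatment of the bounds are assumptions made for illustration.

```python
# Hypothetical worked example of a daily insolation check (one day, made-up data).
def trapezoid_integral(hours, values):
    """Integrate values (W/m^2) over time (hours) with the trapezoid rule."""
    total = 0.0
    for i in range(1, len(hours)):
        total += 0.5 * (values[i] + values[i - 1]) * (hours[i] - hours[i - 1])
    return total  # Wh/m^2


hours = [6, 9, 12, 15, 18]
clearsky_ghi = [0, 500, 900, 500, 0]
measured_ghi = [0, 300, 600, 350, 0]

clearsky_insolation = trapezoid_integral(hours, clearsky_ghi)
measured_insolation = trapezoid_integral(hours, measured_ghi)
ratio = measured_insolation / clearsky_insolation

# Defaults described in the example: 40% lower bound, 125% upper bound.
daily_min, daily_max = 0.4, 1.25
within_limits = daily_min < ratio < daily_max
print(f"ratio = {ratio:.3f}, within limits: {within_limits}")
```

In the example script the resulting flag is applied to every timestamp of the day, which is why whole days are either marked or unmarked in the plot.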
diff --git a/docs/examples/data-completeness.py b/docs/examples/data-completeness.py
@@ -0,0 +1,89 @@
+"""
+Missing Data Periods
+====================
+
+Identifying days with missing data using a "completeness" score metric.
+"""
+
+# %%
+# Identifying days with missing data and filtering these days out reduces noise
+# when performing data analysis. This example shows how to use a
+# daily data "completeness" score to identify and filter out days with missing
+# data. This includes using
+# :py:func:`pvanalytics.quality.gaps.completeness_score`,
+# :py:func:`pvanalytics.quality.gaps.complete`, and
+# :py:func:`pvanalytics.quality.gaps.trim_incomplete`.
+
+import pvanalytics
+from pvanalytics.quality import gaps
+import matplotlib.pyplot as plt
+import pandas as pd
+import pathlib
+
+# %%
+# First, we import the AC power data stream that we are going to check for
+# completeness. The time series we load is a normalized AC power time
+# series from the PV Fleets Initiative, and is available via the DuraMAT
+# DataHub:
+# https://datahub.duramat.org/dataset/inverter-clipping-ml-training-set-real-data.
+# This data set has a pandas DatetimeIndex, with the min-max normalized
+# AC power time series represented in the 'value_normalized' column. The data
+# is sampled at 15-minute intervals. This data set
+# does contain NaN values.
+
+pvanalytics_dir = pathlib.Path(pvanalytics.__file__).parent
+file = pvanalytics_dir / 'data' / 'ac_power_inv_2173.csv'
+data = pd.read_csv(file, index_col=0, parse_dates=True)
+data = data.asfreq("15T")
+
+# %%
+# Now, we use :py:func:`pvanalytics.quality.gaps.completeness_score` to get the
+# fraction of each day's data that isn't NaN. This score is calculated
+# from the non-NA values over the full 24-hour period, meaning that
+# nighttime values are expected.
+data_completeness_score = gaps.completeness_score(data['value_normalized'])
+
+# Visualize data completeness score as a time series.
+data_completeness_score.plot()
+plt.xlabel("Date")
+plt.ylabel("Daily Completeness Score (Fractional)")
+plt.tight_layout()
+plt.show()
+
+# %%
+# We mask complete days, based on daily completeness score, using
+# :py:func:`pvanalytics.quality.gaps.complete`.
+min_completeness = 0.333
+daily_completeness_mask = gaps.complete(data['value_normalized'],
+                                        minimum_completeness=min_completeness)
+
+# Plot the completeness score, marking days that meet or miss the threshold.
+data_completeness_score.plot()
+data_completeness_score.loc[daily_completeness_mask].plot(ls='', marker='.')
+data_completeness_score.loc[~daily_completeness_mask].plot(ls='', marker='.')
+plt.axhline(y=min_completeness, color='r', linestyle='--')
+plt.legend(labels=["Completeness Score", "Threshold met",
+                   "Threshold not met", "Completeness Threshold (.33)"],
+           loc="upper left")
+plt.xlabel("Date")
+plt.ylabel("Daily Completeness Score (Fractional)")
+plt.tight_layout()
+plt.show()
+
+# %%
+# We trim the time series based on the completeness score, where the time
+# series must have at least 10 consecutive days of data that meet the
+# completeness threshold. This is done using
+# :py:func:`pvanalytics.quality.gaps.trim_incomplete`.
+number_consecutive_days = 10
+completeness_trim_mask = gaps.trim_incomplete(data['value_normalized'],
+                                              days=number_consecutive_days)
+# Re-visualize the time series with the data masked by the trim mask
+data[completeness_trim_mask]['value_normalized'].plot()
+data[~completeness_trim_mask]['value_normalized'].plot()
+plt.legend(labels=[True, False],
+           title="Daily Data Passing")
+plt.xlabel("Date")
+plt.ylabel("Normalized AC Power")
+plt.tight_layout()
+plt.show()
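
Review note on `data-completeness.py`: the daily completeness score can be pictured as the number of non-missing samples in a day divided by the number of samples a full day should contain. The sketch below shows that idea in plain Python. It is an illustration under stated assumptions, not the library's implementation: `daily_completeness` is a hypothetical helper, missing values are represented as `None` rather than NaN, and a 6-hour sampling interval is used only to keep the data short.

```python
# Hypothetical sketch of a daily completeness score (illustrative data).
from collections import Counter
from datetime import datetime, timedelta


def daily_completeness(samples, interval):
    """Fraction of each calendar day covered by non-missing samples.

    samples: list of (timestamp, value) pairs; a missing value is None.
    interval: timedelta between consecutive samples.
    """
    expected_per_day = timedelta(days=1) / interval  # 96 for 15-minute data
    present = Counter(ts.date() for ts, value in samples if value is not None)
    days = sorted({ts.date() for ts, _ in samples})
    return {day: present[day] / expected_per_day for day in days}


start = datetime(2022, 5, 1)
interval = timedelta(hours=6)
values = [0.0, 0.4, 0.9, 0.1, None, None, 0.8, None]
samples = [(start + i * interval, v) for i, v in enumerate(values)]

scores = daily_completeness(samples, interval)
min_completeness = 0.333
complete = {day: score >= min_completeness for day, score in scores.items()}
for day, score in scores.items():
    print(day, score, complete[day])
```

The first day has all four samples and passes the threshold; the second has one of four and does not.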