added the test_set analysis for checking the trends for pv_ids (#69)
* added the analysis for each pv_id trends

* added documentation for test_analysis
roshnaeem authored Mar 4, 2024
1 parent e437d0b commit 05e6f20
Showing 3 changed files with 238 additions and 1 deletion.
Binary file added images/test_analysis_output.png
16 changes: 15 additions & 1 deletion quartz_solar_forecast/dataset/README.md
@@ -7,4 +7,18 @@ By analysing the metadata, available at [Hugging Face](https://huggingface.co/da
1. Most of the data in the test set has a tilt angle of 30-34 degrees.
2. The maximum kWp is 4.0 and the minimum kWp is 2.25 in the test set.
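
The two statistics above could be reproduced roughly as follows. This is a minimal, standard-library sketch, not the notebook's own code: the column names `kwp` and `tilt` are assumptions about the metadata layout, and the three sample rows are toy values rather than the real test-set metadata.

```python
import csv
import io


def summarise_metadata(rows):
    """Return the kWp range and the share of systems tilted 30-34 degrees.

    Assumes each row is a dict with (hypothetical) "kwp" and "tilt" keys.
    """
    kwp = [float(row["kwp"]) for row in rows]
    tilts = [float(row["tilt"]) for row in rows]
    # Count systems whose tilt falls in the inclusive 30-34 degree band.
    in_range = sum(30 <= tilt <= 34 for tilt in tilts)
    return {
        "min_kwp": min(kwp),
        "max_kwp": max(kwp),
        "share_tilt_30_34": in_range / len(tilts),
    }


# Toy metadata (not the real test set); in practice, read the actual metadata file.
sample = io.StringIO("pv_id,kwp,tilt\n1,2.25,30\n2,4.0,34\n3,3.0,25\n")
summary = summarise_metadata(list(csv.DictReader(sample)))
print(summary)
```
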

A detailed analysis of the test set can be found at `quartz_solar_forecast/dataset/dataset_analysis/test_set_analysis.ipynb`.

### `test_set_analysis_pv_id_vs_month.ipynb`
This notebook uses `testset.csv`, which contains data from 50 photovoltaic (PV) systems, each identified by a unique `pv_id`. Each `pv_id` has 50 data points, recorded at the times given in the `timestamp` column. The dataset was analyzed to observe how data points are distributed across the months of the year for each PV ID.
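
Counting data points per PV ID and month, the quantity the scatter plot visualizes, can be sketched with the Python standard library. This is not the notebook's own code; it assumes `testset.csv` has `pv_id` and `timestamp` columns with ISO 8601 timestamps such as `2021-06-01 12:00:00`, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_points_per_month(csv_file):
    """Count rows per (pv_id, month) in a CSV with pv_id and timestamp columns.

    Assumes timestamps are ISO 8601 strings that datetime.fromisoformat can parse.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        month = datetime.fromisoformat(row["timestamp"]).month
        counts[(row["pv_id"], month)] += 1
    return counts


# Toy rows standing in for testset.csv (not real data).
sample = io.StringIO(
    "pv_id,timestamp\n"
    "7,2021-06-01 12:00:00\n"
    "7,2021-06-15 09:00:00\n"
    "7,2021-12-03 10:00:00\n"
    "8,2021-07-20 11:00:00\n"
)
counts = count_points_per_month(sample)
print(counts[("7", 6)])  # 2
```

Against the real file, pass an open handle instead, e.g. `with open("testset.csv", newline="") as f: counts = count_points_per_month(f)`. Note that `pv_id` values stay strings because `csv` does not convert types, and that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward.
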
The following scatter plot shows the distribution of data points for each PV ID across the months of the year.

![PV ID vs. Month Distribution](https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/images/test_analysis_output.png?raw=true)

The following observations were made from the plot:

- **Distribution of Data Points**: The plot shows data points in all months across multiple PV IDs. Each dot represents electricity generation data recorded from a PV system.

- **Frequency of Data Points**: The color intensity on the scatter plot corresponds to the number of data points for each PV ID and month. Lighter shades represent fewer data points, and darker shades represent more. Notably, May through September are marked by darker shades, indicating more data points in those months than in the rest of the year.

- **Uniformity Across Months**: Apart from the denser summer months, data points are spread fairly evenly across the months for each PV ID, which suggests that data collection is consistent throughout the year without significant gaps.
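
The frequency and uniformity observations can also be checked numerically from per-(PV ID, month) counts: summing per month shows which months dominate, and listing months with no rows exposes gaps. A minimal sketch, assuming the counts are held in a `collections.Counter` keyed by `(pv_id, month)`; the helper names `monthly_totals` and `missing_months` are hypothetical, and the toy counts are not real data.

```python
from collections import Counter


def monthly_totals(counts):
    """Sum per-(pv_id, month) counts into one total per calendar month."""
    totals = Counter()
    for (_pv_id, month), n in counts.items():
        totals[month] += n
    return totals


def missing_months(counts):
    """For each pv_id, list the calendar months (1-12) that have no data points."""
    seen = {}
    for pv_id, month in counts:
        seen.setdefault(pv_id, set()).add(month)
    all_months = set(range(1, 13))
    return {pv_id: sorted(all_months - months) for pv_id, months in seen.items()}


# Toy counts (not real data): PV "7" has rows in June and December, PV "8" only in July.
counts = Counter({("7", 6): 2, ("7", 12): 1, ("8", 7): 1})
totals = monthly_totals(counts)
gaps = missing_months(counts)

print(totals.most_common(3))  # busiest months first
print(gaps["8"])  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12]
```

A PV ID whose list is empty has data in every month; long lists for many PV IDs would contradict the reading that collection has no significant gaps.
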
