Merge new Dask SDT runner tool #157

Merged
merged 182 commits into from
Jul 20, 2024
182 commits
a181d72
Dask initial commit
BillDin Sep 20, 2023
8ed9add
gcp setup script and steps
BillDin Sep 21, 2023
7d6f436
Updated dask.sh to setup Dask cluster with solar data tools codebase
JulianColab Oct 1, 2023
23dabb9
GCP now can run solar data toole pipelines distributed-ly with Dask, …
BillDin Oct 4, 2023
e9daa83
modified script according to doe review (git repo url to variable)
BillDin Oct 22, 2023
8bc6d15
now workers can pull data individually. Included demo notebook and in…
BillDin Oct 22, 2023
ca34c02
added image
BillDin Oct 22, 2023
2e19b96
Merge branch 'main' into Dask
BillDin Oct 22, 2023
a6bd177
demo notebook using cassandra
BillDin Oct 22, 2023
a88d48d
[Feature] Fargate demo created, with Dockerfile and a draft of readme…
JulianColab Nov 29, 2023
2a49109
[Fix] Draft readme completed
JulianColab Nov 29, 2023
0f53761
initial runner development
Joserecre Nov 30, 2023
6dbb9d3
Refactored runner logic
mxandy Dec 1, 2023
10d434e
finalizing work
BillDin Dec 7, 2023
c519b0c
finalizaing work
BillDin Dec 7, 2023
73b166f
add: remote database datalpug
cclintris Dec 8, 2023
87dbcb4
Define "aggregate_reports" as a dask task to pipe outputs into a file
mxandy Dec 9, 2023
3992cb6
add: update remote dataplug
cclintris Dec 9, 2023
182feca
fix conflicts
cclintris Dec 9, 2023
c0d8fe3
fix: runner notebook format
cclintris Dec 9, 2023
2547be1
add: remove personal file path
cclintris Dec 9, 2023
85272c9
re-added reports aggregation
mxandy Dec 10, 2023
901c254
add: remote database notebook
cclintris Dec 10, 2023
3fe7ec6
add: update runner notebook
cclintris Dec 10, 2023
cba601c
Removed Unused functions and added comments for reports aggregation
mxandy Dec 10, 2023
30bcb25
changed a few things per duncan's request
BillDin Dec 10, 2023
426011d
update remote dataplug
cclintris Dec 11, 2023
11a065b
fix conflict
cclintris Dec 11, 2023
3a07ebf
update runner notebook
cclintris Dec 11, 2023
c31d2ee
[Fix] Finished Fargate documentation; to get EMR demo script
JulianColab Dec 11, 2023
0f0aad7
[Fix] Finished Fargate documentation; to get EMR demo script
JulianColab Dec 11, 2023
d6e62c2
[Fix] Finished both EMR and Fargate documentation
JulianColab Dec 11, 2023
5b698d7
[Fix] Make this PR clean: removed other's work, keep /emr and /fargat…
JulianColab Dec 11, 2023
a667c92
add in-pipeline parallelization work
Joserecre Dec 11, 2023
4523d9e
reflected changes per the final review meeting
BillDin Dec 11, 2023
c8a9310
refactor run_job
cclintris Dec 12, 2023
4ccfae0
Add documentation for in-pipeline parallelization and DAG
mxandy Dec 13, 2023
9725d7d
Revert "add in-pipeline parallelization work"
Joserecre Dec 13, 2023
7e637c4
organize runner for submission
cclintris Dec 14, 2023
afb4718
refactor run_job
cclintris Dec 14, 2023
fe57f2a
handoff commit. mvp and profiling archive.
Joserecre Dec 14, 2023
e472a75
Revert setup.py
Dec 14, 2023
3a7ccbe
Revert setup.py
Dec 14, 2023
4c697a9
Merge pull request #6 from CMU-Fall23-Practicum/multidata
Dec 14, 2023
6881ba8
Merge pull request #7 from CMU-Fall23-Practicum/mvp
Dec 14, 2023
561e00d
Merge pull request #5 from CMU-Fall23-Practicum/Xiao
Dec 14, 2023
f1d0242
Remove spwr-data-for-CMU.ipynb from .gitignore
Dec 14, 2023
971a892
Merge pull request #4 from CMU-Fall23-Practicum/Julian
Dec 14, 2023
7043df4
Merge pull request #1 from CMU-Fall23-Practicum/Ding
Dec 14, 2023
61b5cab
Unified folder structure
Dec 14, 2023
f5c108b
Remove duplicate sdt_runner.py
Dec 14, 2023
9c424b9
Revert changes under notebooks folder
Dec 14, 2023
848eb46
add profiling review presentation
Joserecre Dec 14, 2023
71d65ab
Merge pull request #117 from CMU-Fall23-Practicum/main
Thistleman Jan 11, 2024
5d5cffa
cleaned up dask dirs
pluflou Jan 19, 2024
a26b266
corrected errors in readme and removed dataplug info
pluflou Jan 19, 2024
7ca689f
pvdaq and fargate examples added--fargate not working yet
pluflou Jan 19, 2024
af88711
final local demo for distirbuted dask
pluflou Jan 19, 2024
6a88c9e
added example data and cleand up nb imports
pluflou Jan 26, 2024
bb99fa6
updated yml for conda install of dask requs
pluflou Jan 26, 2024
659d7d8
work in progress on pip req file and fargate setup
pluflou Jan 26, 2024
fe64074
aws fargate basic demo added
pluflou Feb 6, 2024
2e0d96b
added info in READMEs
pluflou Feb 6, 2024
e55513e
Update README.md
pluflou Feb 8, 2024
c5292e4
removed pa num
pluflou Feb 12, 2024
5284398
Merge branch 'main' of https://github.com/slacgismo/solar-data-tools …
pluflou Feb 12, 2024
7c726a5
redshift DB example
pluflou Feb 12, 2024
3654461
made get_data method input a tuple and adjusted examples
pluflou Feb 15, 2024
067cb19
local pvdaq example
pluflou Feb 15, 2024
ef4eceb
azure nb demo
pluflou Feb 21, 2024
725df63
azure nb demo
pluflou Feb 21, 2024
5450dde
Merge pull request #120 from slacgismo/azure-test
pluflou Feb 21, 2024
6887d6d
Update README.md
pluflou Feb 22, 2024
d884f73
Update README.md
pluflou Feb 22, 2024
7e8ef55
info on azure setup in readme
pluflou Feb 22, 2024
e6f7674
added detailed dev plan
pluflou Feb 27, 2024
33e0908
Update README.md
pluflou Feb 27, 2024
6da01f0
Implemented redshift and S3 dataplugs, added example use case in data…
vlianCMU Mar 1, 2024
e544e67
Update: merge Client change
zhanghaoc Mar 15, 2024
0a6173b
Implemented and tested s3 and redshift dataplug with examples, modifi…
vlianCMU Mar 19, 2024
5de1520
Added dataplug requiremtns and corrected the error in example file
vlianCMU Mar 19, 2024
cb4381b
Added documentation for dataplugs
vlianCMU Mar 19, 2024
814b5ec
Update: sdt_dask execute
zhanghaoc Mar 21, 2024
0e62ae9
Update: sdt tool example
zhanghaoc Mar 21, 2024
4e2eb63
Update: file from Nimish
zhanghaoc Mar 22, 2024
8b9aa81
Error with DaskTool
nimishy Mar 22, 2024
cfce83d
Merge branch 'develop-dask' into develop-dask-updated
pluflou Mar 22, 2024
ffe2980
Merge pull request #132 from slacgismo/develop-dask-updated
pluflou Mar 22, 2024
29b6fd5
Merge remote-tracking branch 'upstream/develop-dask' into develop-dask
zhanghaoc Mar 23, 2024
24a1426
Style: remove unnecessary import code
zhanghaoc Mar 23, 2024
681848c
Style: remove all sys path append
zhanghaoc Mar 23, 2024
c217c69
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
nimishy Mar 23, 2024
d136935
Handles Scheduler and Client Mismatch, Updated Docker requrements.txt
nimishy Mar 23, 2024
e841c4e
Update sdt dask
zhanghaoc Mar 25, 2024
8fcb909
Added Sphinx style doc for data plug classes; Updated the readme insi…
vlianCMU Mar 26, 2024
45a4cfe
Update: combine error and loss analysis
zhanghaoc Mar 28, 2024
2ab2ed4
Corrected variable naming error and combined three lines into one try…
vlianCMU Mar 28, 2024
e988b44
WIP: corrected the s3bucket dataplug to use session to create s3 client
vlianCMU Mar 28, 2024
34114c7
Client Configurations and Exception handling
nimishy Mar 28, 2024
239bb30
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
nimishy Mar 28, 2024
a792685
Updated data plug README: add not-thread-safe exaplanation
vlianCMU Mar 29, 2024
1403206
windows process is true, revised code
nimishy Mar 29, 2024
8fda18d
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
nimishy Mar 29, 2024
850519a
Merge pull request #135 from slacgismo/main
pluflou Mar 29, 2024
03ce74d
Merge branch 'develop-dask' into develop-dask
pluflou Mar 29, 2024
51e9104
Captured 4 exception and store them as colums into final result
vlianCMU Apr 2, 2024
c2ab1c1
Captured get_data error
vlianCMU Apr 2, 2024
e1421c1
Fix get_error using monkey patch method
zhanghaoc Apr 2, 2024
0e27671
Let get_error exception added error infos for the rest 4 exceptions
vlianCMU Apr 2, 2024
429335b
Update: Simplify delayed structure and so more things in function
zhanghaoc Apr 4, 2024
9082c2c
Dasks baseline Script
nimishy Apr 5, 2024
4f899ec
Baseline Scripts Update
nimishy Apr 8, 2024
8864da7
Check the report from run_pipeline, if less than 1 year, stop running…
vlianCMU Apr 9, 2024
66c9179
Corrected dataplug naming error in example
vlianCMU Apr 9, 2024
52f1c0c
Corrected redshift dataplug
vlianCMU Apr 9, 2024
e04b82c
disabled worker termination on excess memory
nimishy Apr 9, 2024
1e6dc2a
Added new docker image for loss analysis
nimishy Apr 12, 2024
62fe0e7
Implemented Logging and handles multiprocessing logs, and timestamps
nimishy Apr 19, 2024
71ec62d
Update: provide custom output for runner
zhanghaoc Apr 19, 2024
babde26
Update: fix pvdb script bug, add custom columns to get_result
zhanghaoc Apr 20, 2024
563e064
REMOVE Logging from runner
nimishy Apr 22, 2024
a2222d6
updated _clean_data method to use make_time_series instead
vlianCMU Apr 25, 2024
3dc4a7b
Merge generate report stuffs into one delayed method named prepare_fi…
vlianCMU Apr 26, 2024
fcea35d
Update: simplify task graph
zhanghaoc Apr 26, 2024
ce18cba
compressed to two delayed function
vlianCMU Apr 26, 2024
c36548d
Implemented azure client and test script, updated test scripts for al…
vlianCMU Apr 30, 2024
030e6ba
Updated Clients
nimishy Apr 30, 2024
871a4ee
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
nimishy Apr 30, 2024
7352d8b
updated data_plug readme
vlianCMU Apr 30, 2024
4c5aad4
Update: runner README and fargate script
zhanghaoc Apr 30, 2024
a455275
Style: update runner and eliminate deadcode
zhanghaoc Apr 30, 2024
550851b
Added in-line documentation for data plugs
vlianCMU Apr 30, 2024
6c189a6
deleted data_plug examples
vlianCMU Apr 30, 2024
c2eb4b9
updated example data for demo
vlianCMU Apr 30, 2024
581162e
updated Azure demo code
vlianCMU Apr 30, 2024
af0f028
updated azure client and test scripts
vlianCMU Apr 30, 2024
0d204ae
Removed Logging, Env Variable checker
nimishy Apr 30, 2024
2331255
Updated, removed logging
nimishy Apr 30, 2024
a2c40d5
updated azure demo code
vlianCMU Apr 30, 2024
215727a
deleted old demo code
vlianCMU Apr 30, 2024
a6b972a
Update: fargate client notebook example
zhanghaoc Apr 30, 2024
5ca75de
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
zhanghaoc Apr 30, 2024
eebbd80
Updates and Changes
nimishy Apr 30, 2024
bbd31e7
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
nimishy Apr 30, 2024
bfc1df3
Merge branch 'develop-dask' of https://github.com/zhanghaoc/solar-dat…
zhanghaoc May 1, 2024
0affa87
Update: add local example notebook
zhanghaoc May 1, 2024
f6fabe7
Fargate Memory Update
nimishy May 2, 2024
45f97f8
Update: fix bugs for azure notebook
zhanghaoc May 2, 2024
5aa759d
Update: local and fargate notebook
zhanghaoc May 2, 2024
1612791
Corrected naming mismatch nad in-line documentation
vlianCMU May 2, 2024
cd7d819
Renamed tool_demo_S3_fargate to tool_demo_fargate
vlianCMU May 2, 2024
ab3b70b
Client Documentation
nimishy May 2, 2024
62eda5b
deleted unnecessary script
vlianCMU May 2, 2024
589abd9
Merge pull request #142 from zhanghaoc/develop-dask
pluflou May 10, 2024
1f2fe55
save ec2 addition work--unfinished
pluflou May 13, 2024
f35de34
Simplified Runner Tool
nimishy May 17, 2024
b159678
Dask API implementations with Runner Tool
nimishy May 17, 2024
86c7227
Merge pull request #154 from nimishy/develop-dask
pluflou May 21, 2024
dbfda5a
resolve requirements conflicts
pluflou May 21, 2024
d4d2356
Merge branch 'develop-dask' of https://github.com/slacgismo/solar-dat…
pluflou May 21, 2024
e539ba9
Merge branch 'fix-warnings' of https://github.com/slacgismo/solar-dat…
pluflou May 21, 2024
1d2a29c
adjust runner.py and requirements
pluflou May 22, 2024
c72d40c
update fargate example
pluflou May 22, 2024
2d40446
pass kwargs to task graph plotting
pluflou May 24, 2024
8a12eee
Merge branch 'main' of https://github.com/slacgismo/solar-data-tools …
pluflou Jun 11, 2024
4dce729
update docs
pluflou Jun 12, 2024
f79be1e
Fix README.md typos
pluflou Jun 25, 2024
d084bbb
updated docs for dask runner and docker image requs
pluflou Jun 28, 2024
4918108
update docker image and requs
pluflou Jun 28, 2024
6ffd24d
update examples with new runner and docker image
pluflou Jun 28, 2024
4abfee3
Merge branch 'develop-dask' of https://github.com/slacgismo/solar-dat…
pluflou Jun 28, 2024
9b708a9
merging changes
bmeyers Jul 16, 2024
52f6333
Update README.md
pluflou Jul 19, 2024
fa68013
Delete sdt_dask/clients/aws/ec2_cluster_client.py
pluflou Jul 19, 2024
9d23c81
update docker info and fix numpy 2.0 bug
pluflou Jul 19, 2024
e9a0d23
Merge branch 'develop-dask' of https://github.com/slacgismo/solar-dat…
pluflou Jul 19, 2024
12cfcb1
update req file for docker and fargate example
pluflou Jul 19, 2024
fb66062
update env list of docker image
pluflou Jul 19, 2024
239b2f8
add boto3 to conda recipe
pluflou Jul 19, 2024
16fe42e
Delete .idea directory
pluflou Jul 19, 2024
77d942a
clean up docker docs and requirements
pluflou Jul 20, 2024
26c2c67
Merge branch 'develop-dask' of https://github.com/slacgismo/solar-dat…
pluflou Jul 20, 2024
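Several commits above describe how the runner treats failures per site: `get_data` takes a tuple key, exceptions raised while loading data or running the pipeline are stored as columns in the final result instead of stopping the run, and sites whose report shows less than one year of data are not analyzed further. A minimal sketch of that pattern follows. It uses `concurrent.futures` from the standard library in place of Dask so it runs anywhere; `process_site`, `run_all`, and the `get_data_error`/`pipeline_error`/`skipped` fields are hypothetical names, not the `sdt_dask` API, and `run_pipeline` is assumed to return a dict with a `length` entry in years.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, List, Tuple

# Hypothetical signatures: the real runner's data plug and pipeline differ.
Key = Tuple[Hashable, ...]  # e.g. (site_id, sensor), passed as one tuple
GetData = Callable[[Key], Any]
RunPipeline = Callable[[Any], Dict[str, Any]]


def process_site(key: Key, get_data: GetData, run_pipeline: RunPipeline,
                 min_years: float = 1.0) -> Dict[str, Any]:
    """Process one site and always return a result row instead of raising."""
    row: Dict[str, Any] = {"key": key, "get_data_error": None,
                           "pipeline_error": None, "skipped": False}
    try:
        data = get_data(key)
    except Exception as exc:  # record the failure as a column, keep the batch going
        row["get_data_error"] = repr(exc)
        return row
    try:
        report = run_pipeline(data)
    except Exception as exc:
        row["pipeline_error"] = repr(exc)
        return row
    row.update(report)
    # Flag short data sets (a missing "length" counts as too short) so that
    # later analysis steps can be skipped for them.
    if report.get("length", 0.0) < min_years:
        row["skipped"] = True
    return row


def run_all(keys: List[Key], get_data: GetData, run_pipeline: RunPipeline,
            max_workers: int = 4) -> List[Dict[str, Any]]:
    """Run one task per site concurrently and return the rows in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_site, k, get_data, run_pipeline) for k in keys]
        return [f.result() for f in futures]
```

With Dask, the same shape would come from wrapping `process_site` in `dask.delayed` and passing the resulting tasks to `dask.compute`, so a failing site yields an error row rather than a failed task graph.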
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,3 @@
-
 # Created by https://www.gitignore.io/api/data,macos,python,pycharm,database,sublimetext,jupyternotebook
 # Edit at https://www.gitignore.io/?templates=data,macos,python,pycharm,database,sublimetext,jupyternotebook
 
@@ -321,3 +320,5 @@ tags
 [._]*.un~
 
 # End of https://www.toptal.com/developers/gitignore/api/vim
+/sdt_dask/results
+

7 changes: 0 additions & 7 deletions .idea/solar-data-tools.iml

This file was deleted.

6 changes: 6 additions & 0 deletions conda_recipe/meta.yaml
@@ -38,6 +38,12 @@ requirements:
 - qss
 - tqdm
 - spcqe
+- dask
+- distributed
+- dask-cloudprovider
+- graphviz
+- bokeh
+- boto3
 
 test:
 imports:
299 changes: 299 additions & 0 deletions notebooks/examples/redshift_database_example.ipynb
@@ -0,0 +1,299 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cc12497b-b135-4234-be5f-c9b7d5b34ec7",
"metadata": {
"tags": []
},
"source": [
"# Query SunPower datasets \n",
"\n",
"Note that you need to request an API key by registering at https://pvdb.slacgismo.org and emailing [email protected] with your information and use case."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "65b1d0ac-ad2d-4046-a535-c5ce9c577a1e",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from solardatatools.dataio import load_redshift_data\n",
"from solardatatools.data_handler import DataHandler\n",
"from solardatatools.time_axis_manipulation import make_time_series"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5c6161aa-34d4-4f7e-aa1b-1a023d40ddd6",
"metadata": {},
"outputs": [],
"source": [
"query = {\n",
" 'siteid': 'TABJC1027159', #'TAAI01129193',\n",
" 'api_key': os.environ.get('REDSHIFT_API_KEY'),\n",
" 'sensor': 0\n",
"}\n",
"\n",
"df = load_redshift_data(**query)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ce687cb2-1128-40a4-bd39-a3318d5b4f83",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>site</th>\n",
" <th>meas_name</th>\n",
" <th>ts</th>\n",
" <th>sensor</th>\n",
" <th>meas_val_f</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>TABJC1027159</td>\n",
" <td>ac_power</td>\n",
" <td>2016-03-28 20:40:00</td>\n",
" <td>1913101452_SMA-SB-5000TL-US-22</td>\n",
" <td>1.1394</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>TABJC1027159</td>\n",
" <td>ac_power</td>\n",
" <td>2016-03-28 20:45:00</td>\n",
" <td>1913101452_SMA-SB-5000TL-US-22</td>\n",
" <td>1.4464</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>TABJC1027159</td>\n",
" <td>ac_power</td>\n",
" <td>2016-03-28 20:50:00</td>\n",
" <td>1913101452_SMA-SB-5000TL-US-22</td>\n",
" <td>1.1930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>TABJC1027159</td>\n",
" <td>ac_power</td>\n",
" <td>2016-03-28 20:55:00</td>\n",
" <td>1913101452_SMA-SB-5000TL-US-22</td>\n",
" <td>2.1952</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>TABJC1027159</td>\n",
" <td>ac_power</td>\n",
" <td>2016-03-28 21:00:00</td>\n",
" <td>1913101452_SMA-SB-5000TL-US-22</td>\n",
" <td>1.4514</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" site meas_name ts \\\n",
"0 TABJC1027159 ac_power 2016-03-28 20:40:00 \n",
"1 TABJC1027159 ac_power 2016-03-28 20:45:00 \n",
"2 TABJC1027159 ac_power 2016-03-28 20:50:00 \n",
"3 TABJC1027159 ac_power 2016-03-28 20:55:00 \n",
"4 TABJC1027159 ac_power 2016-03-28 21:00:00 \n",
"\n",
" sensor meas_val_f \n",
"0 1913101452_SMA-SB-5000TL-US-22 1.1394 \n",
"1 1913101452_SMA-SB-5000TL-US-22 1.4464 \n",
"2 1913101452_SMA-SB-5000TL-US-22 1.1930 \n",
"3 1913101452_SMA-SB-5000TL-US-22 2.1952 \n",
"4 1913101452_SMA-SB-5000TL-US-22 1.4514 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "aaa3ac8d-0cd3-4b55-8e12-7d4d37676fd9",
"metadata": {},
"source": [
"# Create DataHandler"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b3c10b3a-881a-4694-abea-3be877845ae1",
"metadata": {},
"outputs": [],
"source": [
"dh = DataHandler(df, convert_to_ts=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "dae9926c-0896-4e06-91e9-0416161e4f9e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total time: 22.85 seconds\n",
"--------------------------------\n",
"Breakdown\n",
"--------------------------------\n",
"Preprocessing 6.68s\n",
"Cleaning 0.35s\n",
"Filtering/Summarizing 15.82s\n",
" Data quality 0.22s\n",
" Clear day detect 0.40s\n",
" Clipping detect 7.37s\n",
" Capacity change detect 7.83s\n",
"\n",
"\n",
"-----------------\n",
"DATA SET REPORT\n",
"-----------------\n",
"length 3.23 years\n",
"capacity estimate 3.79 kW\n",
"data sampling 5 minutes\n",
"quality score 0.96\n",
"clearness score 0.52\n",
"inverter clipping False\n",
"clipped fraction 0.01\n",
"capacity changes True\n",
"data quality warning True\n",
"time shift errors False\n",
"time zone errors False\n",
" \n"
]
}
],
"source": [
"dh.run_pipeline()\n",
"dh.report()"
]
},
{
"cell_type": "markdown",
"id": "bcdae3b1-b1f2-4487-ab1d-c6758caeb6ad",
"metadata": {},
"source": [
"### Or convert the data to a time series manually with make_time_series"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cdbac2e1-67d4-482b-83e3-588b0036b611",
"metadata": {},
"outputs": [],
"source": [
"df, _ = make_time_series(df)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ea2c2d15-8624-40a9-9cb4-2309ef83f8e6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total time: 23.33 seconds\n",
"--------------------------------\n",
"Breakdown\n",
"--------------------------------\n",
"Preprocessing 6.73s\n",
"Cleaning 0.40s\n",
"Filtering/Summarizing 16.20s\n",
" Data quality 0.23s\n",
" Clear day detect 0.42s\n",
" Clipping detect 6.90s\n",
" Capacity change detect 8.66s\n",
"\n",
"\n",
"-----------------\n",
"DATA SET REPORT\n",
"-----------------\n",
"length 3.23 years\n",
"capacity estimate 3.79 kW\n",
"data sampling 5 minutes\n",
"quality score 0.96\n",
"clearness score 0.52\n",
"inverter clipping False\n",
"clipped fraction 0.01\n",
"capacity changes True\n",
"data quality warning True\n",
"time shift errors False\n",
"time zone errors False\n",
" \n"
]
}
],
"source": [
"dh = DataHandler(df)  # df is already a time series, so convert_to_ts is not needed\n",
"dh.run_pipeline()\n",
"dh.report()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
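In the notebook above, `os.environ.get('REDSHIFT_API_KEY')` returns `None` when the variable is unset, so a missing key is only noticed once `load_redshift_data` makes its request. A minimal sketch of checking for the key up front; `build_redshift_query` is a hypothetical helper, and its dict keys simply mirror the notebook's `query` (`siteid`, `api_key`, `sensor`).

```python
import os
from typing import Dict, Mapping, Optional, Union


def build_redshift_query(site_id: str, sensor: int = 0,
                         env: Optional[Mapping[str, str]] = None) -> Dict[str, Union[str, int]]:
    """Build the keyword arguments used in the notebook's load_redshift_data call."""
    env = os.environ if env is None else env
    api_key = env.get("REDSHIFT_API_KEY")
    if not api_key:
        # Fail before any network call, with a message that says what to do.
        raise RuntimeError(
            "REDSHIFT_API_KEY is not set; request an API key via https://pvdb.slacgismo.org"
        )
    return {"siteid": site_id, "api_key": api_key, "sensor": sensor}
```

The notebook cell would then become `df = load_redshift_data(**build_redshift_query('TABJC1027159'))`.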
13 changes: 11 additions & 2 deletions pyproject.toml
@@ -34,7 +34,8 @@ dependencies = [
 "clarabel",
 "qss",
 "tqdm",
-"spcqe"
+"spcqe",
+"boto3"
 ]
 
 classifiers = [
@@ -55,7 +56,7 @@ dynamic = ["version"]
 [tool.setuptools_scm]
 
 [tool.setuptools.packages.find]
-include = ["solardatatools*", "pvsystemprofiler*", "statistical_clear_sky*"]
+include = ["solardatatools*", "pvsystemprofiler*", "statistical_clear_sky*", "sdt_dask*"]
 
 [project.optional-dependencies]
 docs = [
@@ -66,6 +67,14 @@ docs = [
 mosek = [
 "mosek"
 ]
+dask = [
+"numpy==2.0", # to match provided sdt docker image
+"dask==2024.5.2", # to match provided sdt docker image
+"distributed==2024.5.2", # to match provided sdt docker image
+"dask-cloudprovider[all]==2022.10.0",
+"graphviz", # for local task graph visualization
+"bokeh" # for local task graph visualization
+]
 
 [project.urls]
 Homepage = "https://github.com/slacgismo/solar-data-tools"
7 changes: 7 additions & 0 deletions requirements.txt
@@ -16,3 +16,10 @@ clarabel
 qss
 tqdm
 spcqe
+# Packages below this line are for the SDT Dask tool feature
+dask
+distributed
+dask-cloudprovider[all]
+graphviz
+bokeh
+boto3
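Some of the packages added for the Dask tool are imported under a different name than the one installed (the `dask-cloudprovider` distribution provides the `dask_cloudprovider` module). A minimal sketch of confirming they are all importable before starting a cluster; `missing_modules` and `DASK_TOOL_MODULES` are hypothetical names, and the module list is inferred from the requirements above. Finding the `graphviz` module does not confirm that the Graphviz system binaries needed for task-graph rendering are installed.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the packages listed under the "SDT Dask tool" comment.
# The dask-cloudprovider distribution is imported as dask_cloudprovider.
DASK_TOOL_MODULES = ("dask", "distributed", "dask_cloudprovider",
                     "graphviz", "bokeh", "boto3")


def missing_modules(names: Iterable[str] = DASK_TOOL_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in names if find_spec(name) is None]
```

An empty list from `missing_modules()` means every Dask-tool dependency can be found on the current path.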