Internal docs review1 (R. Ennis). Additional comments: When the document was knit, you used to get a warning indicating the YAML title and the vignette title were different. I changed them to match and there is no warning. It might be better to leave the title as just the package name, since you have a description of what the package does, how to install it, etc., and not just the Pensacola Bay-specific example.
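
The title mismatch described in the comment can be checked before knitting. The following is a minimal sketch, assuming the header uses the same `title:` and `%\VignetteIndexEntry{...}` layout as this vignette. `vignette_titles` is a hypothetical helper written for illustration and is not part of harmonize-wq, knitr, or rmarkdown. The two sample headers are abbreviated from the old and new versions in the diff.

```python
import re


def vignette_titles(rmd_text):
    """Return (yaml_title, index_entry) from an R Markdown header.

    Either element is None when the field is missing.
    Hypothetical helper for illustration only.
    """
    # Isolate the YAML front matter between the first pair of '---' lines
    header = re.search(r"\A---\s*\n(.*?)\n---\s*$", rmd_text,
                       flags=re.DOTALL | re.MULTILINE)
    block = header.group(1) if header else ""
    # title: "..." (quotes optional)
    title = re.search(r'^title:\s*["\']?(.*?)["\']?\s*$', block,
                      flags=re.MULTILINE)
    # %\VignetteIndexEntry{...}
    entry = re.search(r"%\\VignetteIndexEntry\{(.*?)\}", block)
    return (title.group(1) if title else None,
            entry.group(1) if entry else None)


# Abbreviated header before the commit
old_header = r'''---
title: "R markdown for harmonize-wq Harmonize_Pensacola"
vignette: >
  %\VignetteIndexEntry{Harmonize_Pensacola R Markdown}
---
'''

# Abbreviated header after the commit
new_header = r'''---
title: "harmonize-wq in R"
vignette: >
  %\VignetteIndexEntry{harmonize-wq in R}
---
'''

old_title, old_entry = vignette_titles(old_header)
new_title, new_entry = vignette_titles(new_header)
print("old header titles match:", old_title == old_entry)
print("new header titles match:", new_title == new_entry)
```

A check like this could run over every `.Rmd` file in `demos/` so that a mismatch is caught without knitting the document.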
USEPA · Sep 30, 2023 · f5d941a
1 parent 4e4f17a
commit f5d941a
Showing 1 changed file with 95 additions and 75 deletions.
diff --git a/demos/Harmonize_Pensacola.Rmd b/demos/Harmonize_Pensacola.Rmd
@@ -1,86 +1,112 @@
 ---
-title: "R markdown for harmonize-wq Harmonize_Pensacola"
+title: "harmonize-wq in R"
 author: "Justin Bousquin, Cristina Mullin, Marc Weber"
 date: '2022-08-31'
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Harmonize_Pensacola R Markdown}
+  %\VignetteIndexEntry{harmonize-wq in R}
   %\usepackage[utf8]{inputenc}
   %\VignetteEngine{knitr::rmarkdown}
 editor_options: 
   chunk_output_type: console
 ---
 
-## R Markdown
 ```{r setup, include = FALSE}
+# Set chunk options
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
 )
 ```
 
-Standardize, clean and wrangle Water Quality Portal data in Pensacola and Perdido Bays into more analytic-ready formats using the harmonize_wq package
-US EPA’s Water Quality Portal (WQP) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using python or R. Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
+<br>
+
+## Overview
+
+Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats using the harmonize_wq package. US EPA’s Water Quality Portal (WQP) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using Python or R. Given the variety of data and of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it into a more analytic-ready format. Recognizing that the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible, water-quality-specific framework to help:
+
+* Identify differences in data units (including speciation and basis)
+* Identify differences in sampling or analytic methods
+* Resolve data errors using transparent assumptions
+* Reduce data to the columns that are most commonly needed
+* Transform data from long to wide format
 
-Identify differences in data units (including speciation and basis)
-Identify differences in sampling or analytic methods
-Resolve data errors using transparent assumptions
-Reduce data to the columns that are most commonly needed
-Transform data from long to wide format
 Domain experts must decide what data meets their quality standards for data comparability and any thresholds for acceptance or rejection.
 
-The first part of this notebook walks through a typical harmonization process on data retrieved from Perdido and Pensacola Bays, FL. The second part of the notebook takes a deeper dive into exactly what is done to each water quality characteristic result and some ways to leverage additional functions in the package for special use cases.
+<br>
+
+<br>
+
+## Installation & Setup
 
-## Set up working environment
+#### Install the harmonize-wq package (Command Line)
 
-Steps: 
-1) If needed, re-install [miniforge](https://github.com/conda-forge/miniforge). Once miniforge is installed. Go to your start menu and open the Miniforge Prompt.
-2) At the Miniforge Prompt:
-  - conda create --name wq_harmonize
-  - activate wq_harmonize
-  - conda install geopandas pip dataretrieval pint
-  - may need to update conda
-    - conda update -n base -c conda-forge conda
-  - pip install harmonize-wq
-  - pip install git+https://github.com/USEPA/harmonize-wq.git (dev version)
-
-ALTERNATIVELY, you may be able to set up your environment and import the required Python packages using the block of R code below:
+To install and set up the harmonize-wq package using the command line:
 
-```{r, results = 'hide', message = FALSE, warning = FALSE}
+1. If needed, re-install [miniforge](https://github.com/conda-forge/miniforge). Once miniforge is installed, go to your Start menu and open the Miniforge Prompt.
+2. At the Miniforge Prompt:
+    - `conda create --name wq_harmonize`
+    - `activate wq_harmonize`
+    - `conda install geopandas pip dataretrieval pint`
+    - You may need to update conda:
+      - `conda update -n base -c conda-forge conda`
+    - `pip install harmonize-wq`
+    - `pip install git+https://github.com/USEPA/harmonize-wq.git` (dev version)
+
+<br>
+
+#### Install the harmonize-wq package (R)
+
+**Alternatively**, you may be able to set up your environment and import the required Python packages using the block of R code below:
+
+```{r, results = 'hide', eval=FALSE}
+# If needed, install the reticulate package to use Python in R
 install.packages("reticulate")
+library(reticulate)
 
-#envname may need to be the full path, e.g.: "~/AppData/Local/miniforge3/envs/wq_harmonize"
+# The reticulate package will automatically look for an installation of Conda
+# However, you may specify the location if needed using options(reticulate.conda_binary = 'dir')
+options(reticulate.conda_binary = '~/AppData/Local/miniforge3/Scripts/conda.exe')
+
+# Create a new Python environment called "wq-reticulate"
+# Note that the environment name may need to include the full path (e.g. "~/AppData/Local/miniforge3/envs/wq_harmonize")
 conda_create("wq-reticulate")
+
+# Install the following packages to the newly created environment
 conda_install("wq-reticulate", "geopandas")
 conda_install("wq-reticulate", "pint")
 conda_install("wq-reticulate", "dataretrieval")
 
-# Only works with py install (pip), which defaults to virtualenvs,
-#Again, envname may need to be the full path, e.g.: "~/AppData/Local/miniforge3/envs/wq_harmonize"
-py_install("harmonize-wq", pip = TRUE, envname = "C:/Users/cmulli01/AppData/Local/miniforge3/envs/wq_harmonize")
-# Dev version
-#py_install("git+https://github.com/USEPA/harmonize-wq.git", pip = TRUE, envname = "C:/Users/cmulli01/AppData/Local/miniforge3/envs/wq_harmonize")
+# Install the harmonize-wq package
+# This only works with py_install() (pip), which defaults to virtualenvs
+# Note that the environment name may need to include the full path (e.g. "~/AppData/Local/miniforge3/envs/wq_harmonize")
+py_install("harmonize-wq", pip = TRUE, envname = "wq-reticulate")
 
-```
-
-## Specify the environment where the dependencies in the above block were installed, and the load in all the required dependencies
-```{r, results = 'hide', message = FALSE, warning = FALSE}
-library(reticulate)
+# To install the dev version of harmonize-wq from GitHub
+# Note that the environment name may need to include the full path (e.g. "~/AppData/Local/miniforge3/envs/wq_harmonize")
+py_install("git+https://github.com/USEPA/harmonize-wq.git@new_release_0-3-8", pip = TRUE, envname = "wq-reticulate")
 
-# If Conda is installed somewhere else other than where reticulate automatically looked, you can specify it
-options(reticulate.conda_binary ='~/AppData/Local/miniforge3/Scripts/conda.exe')
+# Specify the Python environment to be used
 use_condaenv("wq_harmonize")
 
-# use these to test that your environment is set up correctly
+# Test that your Python environment is correctly set up
+# Both imports should return "Module(package_name)"
 import("harmonize_wq")
 import("dataretrieval")
 ```
 
-## Import the required libraries. Check requirements.txt for dependencies that should be installed.
-```{python}
-# Note that outside of a markdown file, you can run python code w/ reticulate using:
-# reticulate::repl_python()
+<br>
+
+#### Import required libraries
+
+The full list of dependencies that should be installed to use the harmonize-wq package can be found in [`requirements.txt`](https://github.com/USEPA/harmonize-wq/blob/new_release_0-3-8/requirements.txt). **Note that `reticulate::repl_python()` must be called to execute these commands using the reticulate package in R.**
+
+```{r}
+# Use reticulate to execute python commands
+reticulate::repl_python()
+```
 
+```{python}
 # Use these reticulate imports to test the modules are installed
 import harmonize_wq
 import dataretrieval
@@ -94,55 +120,54 @@ from harmonize_wq import wrangle
 from harmonize_wq import clean
 from harmonize_wq import location
 from harmonize_wq import visualize
-
 ```
 
-## Simple example workflow for temperatures
+<br>
+
+<br>
+
+## Usage
 
-dataretrieval Query for a geojson
+The following example illustrates a typical harmonization process using the harmonize-wq package on WQP data retrieved from Perdido and Pensacola Bays, FL.
 
-```{python include=FALSE}
+First, determine an area of interest (AOI), build a query, and retrieve water temperature and Secchi disk depth data from WQP for the AOI using the dataretrieval package:
 
-# File for area of interest
+```{python, message=FALSE, warning=FALSE, error=FALSE}
+# File for area of interest (Pensacola and Perdido Bays, FL)
 aoi_url = r'https://raw.githubusercontent.com/USEPA/harmonize-wq/main/harmonize_wq/tests/data/PPBays_NCCA.geojson'
 
-# Build query and get data with dataretrieval
+# Build query and get WQP data with dataretrieval
 query = {'characteristicName': ['Temperature, water',
                                 'Depth, Secchi disk depth',
                                 ]}
 
-#use harmonize-wq to wrangle
+# Use harmonize-wq to wrangle
 query['bBox'] = wrangle.get_bounding_box(aoi_url)
 query['dataProfile'] = 'narrowResult'
 
 # Run query
 res_narrow, md_narrow = wqp.get_results(**query)
 
-# dataframe of downloaded results
+# DataFrame of downloaded results
 res_narrow
-
 ```
 
-Harmonize and clean all results
+Next, harmonize and clean all results:
 
-```{python}
+```{python, message=FALSE, warning=FALSE, error=FALSE}
 df_harmonized = harmonize.harmonize_all(res_narrow, errors='raise')
 df_harmonized
 
-# Clean up other columns of data
-df_cleaned = clean.datetime(df_harmonized)  # datetime
-df_cleaned = clean.harmonize_depth(df_cleaned)  # Sample depth
+# Clean up the datetime and sample depth columns
+df_cleaned = clean.datetime(df_harmonized)
+df_cleaned = clean.harmonize_depth(df_cleaned)
 df_cleaned
-
 ```
 
-##Transform results from long to wide format
+There are many columns in the data frame that are characteristic-specific; that is, they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the data are transformed, these columns must either be split, generating a new column for each characteristic with values, or moved out of the table if they are not being used.
 
-There are many columns in the dataframe that are characteristic specific, that is they have different values for the same sample depending on the characteristic. To ensure one result for each sample after the transformation of the data these columns must either be split, generating a new column for each characteristic with values, or moved out from the table if not being used.
-
-```{python}
-
-# Split QA column into multiple characteristic specific QA columns
+```{python, message=FALSE, warning=FALSE, error=FALSE}
+# Split the QA_flag column into multiple characteristic-specific QA columns
 df_full = wrangle.split_col(df_cleaned)
 
 # Divide table into columns of interest (main_df) and characteristic specific metadata (chars_df)
@@ -153,25 +178,20 @@ df_wide = wrangle.collapse_results(main_df)
 
 # Reduced columns
 df_wide.columns
-
+df_wide.head()
 ```
 
-## Map results
+Finally, the cleaned and wrangled data may be visualized as a map:
 
-```{python}
-
-# Get harmonized stations clipped to the Area of Interest
+```{python, message=FALSE, warning=FALSE, error=FALSE}
+# Get harmonized stations clipped to the AOI
 stations_gdf, stations, site_md = location.get_harmonized_stations(query, aoi=aoi_url)
 
 # Map average temperature results at each station
 gdf_temperature = visualize.map_measure(df_wide, stations_gdf, 'Temperature')
 gdf_temperature.plot(column='mean', cmap='OrRd', legend=True)
-
 ```
 
-Download location data using dataretrieval
-
-```{python}
-
-```
+<br>
 
+<br>
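
The long-to-wide step described in the vignette's Usage section (splitting characteristic-specific columns so each sample ends up with one row) can be illustrated with plain pandas. This is a minimal sketch on made-up data, not the implementation behind `wrangle.split_col` or `wrangle.collapse_results`. The column names `sample_id`, `characteristic`, and `value` and all values are assumptions chosen for the example; only `QA_flag` appears in the vignette.

```python
import pandas as pd

# Hypothetical long-format results: one row per (sample, characteristic)
long_df = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "characteristic": ["Temperature", "Secchi", "Temperature", "Secchi"],
    "value": [24.5, 1.2, 26.0, 0.9],
    "QA_flag": [None, "unit assumed", None, None],
})

# Split the shared QA column into one QA column per characteristic,
# so each flag stays attached to its characteristic after reshaping
qa_wide = long_df.pivot(
    index="sample_id", columns="characteristic", values="QA_flag"
).add_prefix("QA_")

# One value column per characteristic, one row per sample
values_wide = long_df.pivot(
    index="sample_id", columns="characteristic", values="value"
)

# Wide table: measurement columns followed by their QA columns
wide_df = values_wide.join(qa_wide)
print(wide_df)
```

Keeping a separate QA column per characteristic is what allows the rows for one sample to collapse into a single row without mixing flags that belong to different measurements.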