Internal docs review1 (R. Ennis)
jbousquin committed Sep 30, 2023
1 parent 3139cff commit 4e4f17a
Showing 17 changed files with 486 additions and 355 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,5 +1,5 @@
# harmonize-wq
Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats
Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats

US EPA’s [Water Quality Portal (WQP)](https://www.waterqualitydata.us/) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using [python](https://github.com/USGS-python/dataretrieval) or [R](https://github.com/USGS-R/dataRetrieval). Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonixe_wq package is intended to be a flexible water quality specific framework to help:
- Identify differences in data units (including speciation and basis)
6 changes: 3 additions & 3 deletions docs/source/Code of Conduct.rst
@@ -1,5 +1,5 @@
# CONTRIBUTOR CODE OF CONDUCT
=============================
CONTRIBUTOR CODE OF CONDUCT
===========================

As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

@@ -11,4 +11,4 @@ Project maintainers have the right and responsibility to remove, edit, or reject

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available at https://www.contributor-covenant.org/version/1/0/0/code-of-conduct.html
This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available at https://www.contributor-covenant.org/version/1/0/0/code-of-conduct.html.
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -42,6 +42,7 @@
"sphinx.ext.coverage",
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
'sphinx.ext.autosectionlabel',
"sphinxcontrib.spelling",
]
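
Note (illustrative, not part of the commit): the hunk above enables `sphinx.ext.autosectionlabel`, which lets documents cross-reference sections by their titles. Below is a minimal sketch of how the extension is often configured in `conf.py`. The two `autosectionlabel_*` settings are real Sphinx options, but setting them here is an assumption, since the commit does not show whether the project uses them.

```python
# Sketch of a Sphinx conf.py fragment (illustrative; not taken from the commit).
extensions = [
    "sphinx.ext.coverage",
    "sphinx.ext.napoleon",
    "sphinx.ext.intersphinx",
    "sphinx.ext.autosectionlabel",
    "sphinxcontrib.spelling",
]

# Assumption: prefix each generated label with its document name so that
# identical section titles in different files do not produce duplicate labels.
autosectionlabel_prefix_document = True

# Assumption: only generate labels for the top two heading levels.
autosectionlabel_maxdepth = 2
```

With the prefix option enabled, a section would be referenced as, for example, ``:ref:`contributing:Report issues```; without it, the bare title ``:ref:`Report issues``` is used.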

13 changes: 7 additions & 6 deletions docs/source/contributing.rst
@@ -6,7 +6,8 @@ Contributing to harmonize_wq
We’re so glad you’re thinking about contributing to an EPA open source project! If you’re unsure about anything, just ask — or submit your issue or pull request anyway. The worst that can happen is we’ll politely ask you to change something. We appreciate all friendly contributions.

We encourage you to read this project’s CONTRIBUTING policy (you are here), its
LICENSE, and its `README <https://github.com/USEPA/harmonize-wq/blob/main/README.md>`_.
`LICENSE <https://github.com/USEPA/harmonize-wq/blob/81b172afc3b72bec0a9f5624bade59eb2527510f/LICENSE>`_,
and its `README <https://github.com/USEPA/harmonize-wq/blob/main/README.md>`_.

All contributions to this project will be released under the MIT dedication. By submitting a pull request or issue, you are agreeing to comply with this waiver of copyright interest.

@@ -22,15 +23,15 @@ You can contribute in different ways:
Report issues
-------------

You can report any issues with the package, the documentation to the `issue tracker`_.
Also feel free to submit feature requests, comments or questions.
You can report any issues with the package or the documentation to the `issue tracker`_.
Also feel free to submit feature requests, comments, or questions.


Contribute code
---------------

To contribute fixes, code or documentation, fork harmonize_wq in GitHub_ and submit
the changes using a pull request against the **main** branch.
To contribute fixes, code, tests, or documentation, fork harmonize_wq in GitHub_
and submit the changes using a pull request against the **main** branch.

- If you are submitting new code, add tests (see below) and documentation.
- Write "Closes #<bug number>" in the PR description or a comment, as described in the
@@ -41,7 +42,7 @@ In any case, feel free to use the `issue tracker`_ to discuss ideas for new feat

Notice that we will not merge a PR if tests are failing. In certain cases tests pass in your
machine but not in GitHub actions. There might be multiple reasons for this but these are some of
the most common
the most common:

- Your new code does not work for other operating systems or Python versions.
- The documentation is not being built properly or the examples in the docs are
2 changes: 1 addition & 1 deletion docs/source/example workflow.rst
@@ -70,7 +70,7 @@ There are many columns in the :class:`pandas.DataFrame` that are characteristic
# Combine rows with the same sample organization, activity, location, and datetime
df_wide = wrangle.collapse_results(main_df)
The number of columns in the resulting table is greatly reduced
The number of columns in the resulting table is greatly reduced:

+----------------------------+-------------+----------------------------------------+-------------------------------+
| Output Column | Type | Source | Changes |
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -5,8 +5,8 @@
harmonize_wq:
=============
Standardize, clean and wrangle Water Quality Portal data into more analytic-ready formats
-----------------------------------------------------------------------------------------
Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats
------------------------------------------------------------------------------------------
**Useful links**:
`Code Repository <https://github.com/USEPA/harmonize-wq>`__ |
`Issues <https://github.com/USEPA/harmonize-wq/issues>`__
2 changes: 1 addition & 1 deletion docs/source/overview.rst
@@ -3,7 +3,7 @@
Overview
========

US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieval data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_. Given the variety of data and variety of data originators, using the data in analysis often requires data cleaning to ensure it meets the required quality standards and data wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:
US EPA’s `Water Quality Portal (WQP) <https://www.waterqualitydata.us/>`_ aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource with tools to query and retrieve data using `python <https://github.com/USGS-python/dataretrieval>`_ or `R <https://github.com/USGS-R/dataRetrieval>`_. Given the variety of data and data originators, using the data in analysis often requires cleaning to ensure it meets required quality standards and wrangling to get it in a more analytic-ready format. Recognizing the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible water quality specific framework to help:

* Identify differences in data units (including speciation and basis)
* Identify differences in sampling or analytic methods
122 changes: 64 additions & 58 deletions harmonize_wq/basis.py
@@ -8,18 +8,21 @@
def unit_basis_dict(out_col):
"""Characteristic specific basis dictionary to define basis from units.
The out_col is often derived from :attr:`WQCharData.char_val`. The desired
basis can be used as a key to subset result.
Parameters
----------
out_col : str
Column name where results are written (char_val derived)
Column name where results are written.
Returns
-------
dict
Dictionary with logic for determining basis from units string and
standard pint units to replace those with.
standard :mod:`pint` units to replace those with.
The structure is {Basis: {standard units: [unit strings with basis]}}.
Examples
--------
Get dictionary for Phosphorus and subset for 'as P':
@@ -43,7 +46,7 @@ def unit_basis_dict(out_col):
def basis_conversion():
"""Get dictionary of conversion factors to convert basis/speciation.
For example, this is used to convert 'as PO4' to 'as P'
For example, this is used to convert 'as PO4' to 'as P'.
Returns
-------
@@ -52,11 +55,10 @@ def basis_conversion():
See Also
--------
convert.moles_to_mass()
:func:`convert.moles_to_mass`
Originally from Table 1 in 'Best Practices for Submitting Nutrient Data to
the Water Quality eXchange (WQX)
<www.epa.gov/sites/default/files/2017-06/documents/wqx_nutrient_best_practices_guide.pdf>'
`Best Practices for Submitting Nutrient Data to the Water Quality eXchange
<www.epa.gov/sites/default/files/2017-06/documents/wqx_nutrient_best_practices_guide.pdf>`_
"""
return {'NH3': 0.822,
'NH4': 0.776,
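
Note (illustrative, not part of the commit): the factors returned by `basis_conversion` appear to be the mass fraction of the target element in each species; for example, 0.822 is roughly the share of nitrogen by mass in NH3. The sketch below checks that reading with rounded standard atomic masses and applies a factor to a concentration. The helpers `mass_fraction` and `convert_basis` and the sample value are hypothetical and are not part of harmonize_wq.

```python
# Rounded standard atomic masses in g/mol (assumed values for illustration).
ATOMIC_MASS = {"N": 14.007, "H": 1.008}


def mass_fraction(element, formula):
    """Return the mass fraction of `element` in a species.

    `formula` maps element symbols to atom counts, e.g. {"N": 1, "H": 3}.
    """
    total = sum(ATOMIC_MASS[sym] * count for sym, count in formula.items())
    return ATOMIC_MASS[element] * formula[element] / total


def convert_basis(value, factor):
    """Convert a concentration reported 'as <species>' to the element basis."""
    return value * factor


# Recompute the two factors visible in the hunk above.
nh3_factor = round(mass_fraction("N", {"N": 1, "H": 3}), 3)
nh4_factor = round(mass_fraction("N", {"N": 1, "H": 4}), 3)
print(nh3_factor, nh4_factor)  # 0.822 0.776

# Hypothetical sample: 2.0 mg/l reported 'as NH4', expressed 'as N'.
as_n = convert_basis(2.0, nh4_factor)
```

This only confirms that the two visible entries are consistent with a mass-fraction interpretation; the authoritative source for the values is the WQX nutrient best-practices guide cited in the docstring.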
@@ -75,7 +77,7 @@ def stp_dict():
Returns
-------
dict
Dictionary with {'standard temp' : {'units': [values to replace]}}
Dictionary with {'standard temp' : {'units': [values to replace]}}.
Examples
--------
@@ -88,29 +90,32 @@ def stp_dict():
return {'@25C': {'mg/mL': ['mg/mL @25C']}}


def basis_from_unit(df_in, basis_dict, unit_col, basis_col='Speciation'):
"""Create standardized Basis column in :class:`pandas.DataFrame`.
def basis_from_unit(df_in, basis_dict, unit_col='Units', basis_col='Speciation'):
"""Move basis from units to basis column in :class:`pandas.DataFrame`.
Standardizes units in units column based on basis_dict. Units column is
updated in place, it should not be original 'ResultMeasure/MeasureUnitCode'
to maintain data integrity.
Move basis information from units in unit_col column to basis in basis_col
column based on basis_dict. If basis_col does not exist in df_in it will be
created. The unit_col column is updated in place. To maintain data
integrity unit_col should not be the original
'ResultMeasure/MeasureUnitCode' column.
Parameters
----------
df_in : pandas.DataFrame
DataFrame that will be updated.
basis_dict : dict
Dictionary with structure {basis:{new_unit:[old_units]}}.
unit_col : str
string for the column name in df to be used.
unit_col : str, optional
String for the units column name in df_in to use.
The default is 'Units'.
basis_col : str, optional
string for the basis column name in df to be used.
String for the basis column name in df_in to use.
The default is 'Speciation'.
Returns
-------
df : pandas.DataFrame
Updated copy of df_in
Updated copy of df_in.
Examples
--------
@@ -134,8 +139,8 @@ def basis_from_unit(df_in, basis_dict, unit_col, basis_col='Speciation'):
0 Phosphorus mg/l as P mg/l as P
1 Phosphorus mg/kg as P mg/kg as P
If an existing basis_col value is different a warning is issued when it is
updated and a QA_flag is assigned
If an existing basis_col value is different, a warning is issued when it is
updated and a QA_flag is assigned:
>>> from numpy import nan
>>> df['Speciation'] = [nan, 'as PO4']
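
Note (illustrative, not part of the commit): the revised docstring describes moving basis text out of the units column using a dictionary shaped `{basis: {new_unit: [old_units]}}`. A minimal sketch of that idea in plain pandas follows. The function `move_basis_sketch` and the sample data are hypothetical; this is not the package's implementation, and it does not reproduce the warning or QA_flag behaviour the docstring mentions.

```python
import pandas as pd


def move_basis_sketch(df_in, basis_dict, unit_col="Units", basis_col="Speciation"):
    """Move basis text from unit_col to basis_col (illustrative only)."""
    df = df_in.copy()  # leave the input untouched, as the docstring describes
    # Create the basis column if it does not exist yet.
    if basis_col not in df.columns:
        df[basis_col] = pd.Series([None] * len(df), index=df.index, dtype=object)
    for basis, unit_map in basis_dict.items():
        for new_unit, old_units in unit_map.items():
            # Rows whose units string carries this basis.
            mask = df[unit_col].isin(old_units)
            df.loc[mask, basis_col] = basis
            df.loc[mask, unit_col] = new_unit
    return df


# Hypothetical inputs following the {basis: {new_unit: [old_units]}} structure.
basis_dict = {"as P": {"mg/l": ["mg/l as P"], "mg/kg": ["mg/kg as P"]}}
df = pd.DataFrame(
    {
        "CharacteristicName": ["Phosphorus", "Phosphorus", "Phosphorus"],
        "Units": ["mg/l as P", "mg/kg as P", "mg/l"],
    }
)
out = move_basis_sketch(df, basis_dict)
```

In this sketch the first two rows end up with plain units and a 'Speciation' of 'as P', while the third row, which had no basis in its units, is left with an empty basis.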
@@ -226,15 +231,23 @@ def basis_from_methodSpec(df_in):


def update_result_basis(df_in, basis_col, unit_col):
"""Basis from result col that is not moved to a new col.
"""Move basis from unit_col column to basis_col column.
This is usually used in place of basis_from_unit when the basis_col is not
'ResultMeasure/MeasureUnitCode' (i.e., not speciation).
Notes
-----
Rather than creating many new empty columns this function currently overwrites the original
basis_col values. The original values are noted in the QA_flag.
Parameters
----------
df_in : pandas.DataFrame
DataFrame that will be updated.
basis_col : str
Column in df_in with result basis to update. Expected values are
'ResultTemperatureBasisText'
'ResultTemperatureBasisText'.
unit_col : str
Column in df_in with units that may contain basis.
@@ -246,7 +259,6 @@ def update_result_basis(df_in, basis_col, unit_col):
Examples
--------
Build pandas DataFrame for example:
Note: 'Units' is used to preserve 'ResultMeasure/MeasureUnitCode'
>>> from pandas import DataFrame
>>> from numpy import nan
@@ -258,7 +270,7 @@ def update_result_basis(df_in, basis_col, unit_col):
CharacteristicName ResultTemperatureBasisText Units
0 Salinity 25 deg C mg/mL @25C
1 Salinity NaN mg/mL @25C
>>> from harmonize_wq import basis
>>> df_temp_basis = basis.update_result_basis(df,
... 'ResultTemperatureBasisText',
@@ -294,7 +306,7 @@ def update_result_basis(df_in, basis_col, unit_col):


def set_basis(df_in, mask, basis, basis_col='Speciation'):
"""Update basis_col to basis where col is expected_val.
"""Update or create basis_col with basis as value.
Parameters
----------
@@ -311,8 +323,34 @@ def set_basis(df_in, mask, basis, basis_col='Speciation'):
Returns
-------
df_out : pandas.DataFrame
Updated copy of df_in
Updated copy of df_in.
Examples
--------
Build pandas DataFrame for example:
>>> from pandas import DataFrame
>>> df = DataFrame({'CharacteristicName': ['Phosphorus',
... 'Phosphorus',
... 'Salinity'],
... 'MethodSpecificationName': ['as P', 'as PO4', ''],
... })
>>> df
CharacteristicName MethodSpecificationName
0 Phosphorus as P
1 Phosphorus as PO4
2 Salinity
Build mask for example:
>>> mask = df['CharacteristicName']=='Phosphorus'
>>> from harmonize_wq import basis
>>> basis.set_basis(df, mask, basis='as P')
CharacteristicName MethodSpecificationName Speciation
0 Phosphorus as P as P
1 Phosphorus as PO4 as P
2 Salinity NaN
"""
df_out = df_in.copy()
# Add Basis column if it doesn't exist
@@ -321,35 +359,3 @@ def set_basis(df_in, mask, basis, basis_col='Speciation'):
# Populate Basis column where expected value with basis
df_out.loc[mask, basis_col] = basis
return df_out


def basis_qa_flag(trouble, basis, spec_col='MethodSpecificationName'):
"""Get QA_flag for different basis in MethodsSpeciation and units.
NOTE: Deprecate - not currently in use anywhere
Parameters
----------
trouble : str
Problem encountered (e.g., unit_basis != speciation).
basis : str
The basis from the unit that replaced the original speciation.
spec_col : str, optional
Column currently being checked. Default is 'MethodSpecificationName'
Returns
-------
str
Flag to use in QA_flag column.
Examples
--------
Formats QA_Flag
>>> from harmonize_wq import basis
>>> basis.basis_qa_flag('(units)',
... 'updated from 25 deg C to @25C',
... 'ResultTemperatureBasisText')
'ResultTemperatureBasisText: updated from 25 deg C to @25C (units)'
"""
return '{}: {} {}'.format(spec_col, basis, trouble)