Features/auto qc #180

ladsmund · 2023-09-08T13:50:04Z

I have done some restructuring of the repository and created a qc module that can contain different algorithms for detecting invalid data.

PennyHow · 2023-09-14T16:04:38Z

I've just left a few comments regarding structure and naming conventions. I've also been testing the functionality of the QC steps and I am running into problems. Some of these might be to do with both of these QC development branches not being up-to-date with the main branch.

So I think it is best to address the minor issues I have commented on, then merge, and then we can do a larger review over on percentile-qc.

Just so you have a heads up @RasmusBahbah, I'm having problems in the script creating a database .db file if one does not already exist. We can look at it once this branch has been merged.

PennyHow

Looks good. I just checked the unit testing and then tested it with some L0tx and L0raw files. All runs smoothly. Go ahead and merge.

PennyHow · 2023-09-14T15:21:28Z

src/pypromice/qc/static_qc_test.py

I think this file needs some extra lines adding at the end so that the unit test is triggered when the script is run, i.e.:

if __name__ == "__main__":
    unittest.main()

PennyHow · 2023-09-14T15:23:21Z

src/pypromice/qc/static_qc_test.py

Documentation needed for each function here - it can just be a short definition, for example:

def get_random_datetime() -> datetime.datetime:
    '''Select random timestamp in the period 1970-2030'''
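As a rough illustration of what a documented helper could look like: only the function name and return type come from the comment above, while the body is a hypothetical stand-in (assuming a uniformly random whole second is wanted) and may differ from the real helper in static_qc_test.py.

```python
import datetime
import random


def get_random_datetime() -> datetime.datetime:
    """Select a random timestamp in the period 1970-2030.

    Hypothetical body for illustration; the real test helper may differ.
    """
    start = datetime.datetime(1970, 1, 1)
    end = datetime.datetime(2030, 1, 1)
    # Draw a whole number of seconds in [0, span) and offset it from the start.
    span_seconds = int((end - start).total_seconds())
    return start + datetime.timedelta(seconds=random.randrange(span_seconds))
```

With the docstring in place, help(get_random_datetime) shows the one-line definition.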

PennyHow · 2023-09-14T15:32:42Z

src/pypromice/qc/__init__.py

I think the submodule file names need changing as the naming style does not fit with the rest of the toolbox. I suggest:

compute_percentiles.py and percentiles.py >> percentile.py
static_qc.py >> persistence.py
static_qc_test.py >> persistence_test.py

Also, I suggest a structuring change in the qc module initialization:

from pypromice.qc.percentile import *
from pypromice.qc.persistence import *

Then we can call, for instance, percentileQC merely with pypromice.qc.percentileQC rather than pypromice.qc.percentile.percentileQC. I think if this module becomes bigger, then submodule divisions will be needed. But for now, the functions are very streamlined (nice!)
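A small sketch of how such a re-export behaves, using a throwaway package written to a temporary directory. The package name demo_qc and the placeholder function bodies are assumptions made for the demonstration; they stand in for pypromice.qc and its real QC functions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package named demo_qc (hypothetical stand-in for pypromice.qc).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_qc"
pkg.mkdir()
# Placeholder submodules; the real functions would do actual QC work.
(pkg / "percentile.py").write_text("def percentileQC(ds):\n    return ds\n")
(pkg / "persistence.py").write_text("def persistence_qc(ds):\n    return ds\n")
# The wildcard imports re-export the submodule functions at package level.
(pkg / "__init__.py").write_text(
    "from demo_qc.percentile import *\n"
    "from demo_qc.persistence import *\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_qc = importlib.import_module("demo_qc")

# The short path and the fully qualified path resolve to the same function.
same_function = demo_qc.percentileQC is demo_qc.percentile.percentileQC
print(same_function)
```

Note that a wildcard import exposes every public name of the submodule unless the submodule defines `__all__`, so name clashes become possible as the module grows.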

PennyHow · 2023-09-14T15:34:11Z

src/pypromice/process/L1toL2.py

@@ -54,7 +56,7 @@ def toL2(L1, T_0=273.15, ews=1013.246, ei0=6.1071, eps_overcast=1.,



-    ds = differenceQC(ds)                                                      # Flag and Remove difference outliers     
+    ds = apply_static_qc(ds)                                                      # Flag and Remove difference outliers


I prefer calling this a persistence QC rather than a static QC, as 'static' is an ambiguous term. So perhaps rename these functions to, e.g. persistence_qc()?
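For readers unfamiliar with the term, a persistence check flags stretches where a signal stops changing. The sketch below is not the pypromice implementation: the function name, the default values, and the exact window semantics are assumptions. Only the parameter names `period` and `max_diff`, the use of max aggregation, and the need to process the last index are taken from the commit messages in this pull request.

```python
from typing import List, Sequence


def persistence_mask(
    values: Sequence[float], period: int = 3, max_diff: float = 0.001
) -> List[bool]:
    """Flag samples that belong to a run of `period` near-constant steps.

    Hypothetical sketch for illustration only.
    """
    # Absolute step between each pair of neighbouring samples.
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    mask = [False] * len(values)
    # Slide a window of `period` steps; the upper bound includes the final step.
    for start in range(len(steps) - period + 1):
        window = steps[start:start + period]
        # Max aggregation: the window is static only if every step is small.
        if max(window) < max_diff:
            # `period` steps span `period + 1` samples.
            for i in range(start, start + period + 1):
                mask[i] = True
    return mask


mask = persistence_mask([1.0, 2.0, 5.0, 5.0, 5.0, 5.0, 7.0], period=3)
print(mask)
```

In this sketch the four repeated 5.0 samples are flagged and the changing samples around them are not.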

ladsmund requested a review from RasmusBahbah September 8, 2023 13:51

ladsmund force-pushed the features/auto_qc branch from 9e46356 to bf18598 Compare September 14, 2023 08:46

ladsmund requested a review from PennyHow September 14, 2023 08:58

ladsmund force-pushed the features/auto_qc branch 2 times, most recently from 6b45719 to 97b3929 Compare September 20, 2023 08:05

ladsmund changed the base branch from percentile-qc to main September 20, 2023 08:06

ladsmund force-pushed the features/auto_qc branch from 97b3929 to fd0b582 Compare September 22, 2023 10:21

ladsmund marked this pull request as ready for review September 28, 2023 12:50

patrickjwright and others added 15 commits September 29, 2023 12:54

initial implementation of QC Static

466030f

Moved differenceQC to separate module

2da188a

Updated difference_qc

eed4969

* Added unit tests for testing boundary conditions
* Use max aggregation instead of sum to detect static data in window.
* Modified the loop to fix an issue where the last index wasn't processed.

Renamed from difference_qc to static_qc

046f1bc

Updated and renamed staticQC->apply_static_qc

9e4aeb3

* Added thresholds as input parameters with default values
* Cleaned up commented-out code
* Applied black formatting on file

Renamed threshold parameters:

d69efe4

* diff_period -> period
* static_limit -> max_diff

Added static_qc_test to ci: unit_test.yml

5c1e146

Finished StaticQC module

5cd2264

* static_qc default threshold
* Added debug logging to static_qc.py
* Cleaned whitespaces in L1toL2.py

Added count_consecutive_static_values for making it easier to evaluate

c7bd756

Updated toL2 type hints

c9a669e

Limited apply_static_qc to be applied on tx data.

8419ce4

Updated L0toL1 to keep the `format` attribute in L1 data

Minor spell correction

6680275

Removed unreachable if/else clause

92b24e2

The statement `all(f)` always evaluates to True because
* all `format` strings in aws-l0 config files are in {'STM', 'TX', 'raw'}.
* `f` is never empty because `self.L0` is never empty
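The reasoning in this commit message rests on Python's truthiness rules for strings. A quick check, where the list `formats` is a hypothetical stand-in for `f`:

```python
# Hypothetical stand-in for `f`: one format string per L0 config entry.
formats = ["STM", "TX", "raw"]

# Non-empty strings are truthy, so all() over them is True.
always_true = all(formats)

# Only an empty (falsy) string would make all() return False.
with_empty_string = all(["TX", ""])

print(always_true, with_empty_string)
```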

Use format string from L3 instead of L0

3751191

`format` is now also available in the current L3 dataset.

Updated L1toL2.py to use logger.info instead of print

e40205a

ladsmund force-pushed the features/auto_qc branch from a32bae1 to e40205a Compare September 29, 2023 10:55

PennyHow approved these changes Oct 3, 2023

ladsmund merged commit 21d8bdd into main Oct 3, 2023
