Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features/auto qc #180

Merged
merged 15 commits into from
Oct 3, 2023
Merged

Features/auto qc #180

merged 15 commits into from
Oct 3, 2023

Conversation

ladsmund
Copy link
Contributor

@ladsmund ladsmund commented Sep 8, 2023

I have done some restructuring of the repository and created a qc module that can contain different algorithms for detecting invalid data.

@PennyHow
Copy link
Member

I've just left a few comments regarding structure and naming conventions. I've also been testing the functionality of the QC steps and I am running into problems. Some of these might be to do with both of these QC development branches not being up-to-date with the main branch.

So I think it is best to addres the minor issues I have commented on, then merge, and then we can do a larger review over on percentile-qc.

Just so you have a heads up @RasmusBahbah, I'm having problems in the script creating a database .db file if one does not already exist. We can look at it once this branch has been merged.

@ladsmund ladsmund force-pushed the features/auto_qc branch 2 times, most recently from 6b45719 to 97b3929 Compare September 20, 2023 08:05
@ladsmund ladsmund changed the base branch from percentile-qc to main September 20, 2023 08:06
@ladsmund ladsmund marked this pull request as ready for review September 28, 2023 12:50
patrickjwright and others added 15 commits September 29, 2023 12:54
* Added unit tests for testing boundary conditions
* Use max aggregation instead of sum to detect determine static data in window.
* Modifying the loop to fix issue where the last index wasn't processed.
* Added thresholds as input parameters with default values
* Cleaned up out commented code
* Applied black formatting on file
* diff_period -> period
* static_limit -> max_diff
* static_qc default threshold
* Added debug logging to static_qc.py
* Cleaned whitespaces in L1toL2.py
Updated L0toL1 to keep the `format` attribute in L1 data
The statement `all(f)` always evaluates to True because
* all `format` strings in aws-l0 config files is in {'STM', 'TX', 'raw'}.
* `f` is never empty because `self.L0` is never empty
`format` is now also available in the current L3 dataset.
Copy link
Member

@PennyHow PennyHow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just checked the unit testing and then tested it with some L0tx and L0raw files. All runs smoothly. Go ahead and merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file needs some extra lines adding at the end so that the unit test is triggered when the script is run, i.e.:

if __name__ == "__main__":  
    unittest.main()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation needed for each function here - it can just be a short definition, for example:

def get_random_datetime() -> datetime.datetime:
   '''Select random timestamp in the period `1970-2030'''

src/pypromice/qc/static_qc_test.py Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the submodule file names need changing as the naming style does not fit with the rest of the toolbox. I suggest:

compute_percentiles.py and percentiles.py >> percentile.py
static_qc.py >> persistence.py
static_qc_test.py >> persistence_test.py

Also, I suggest a structuring change in the qc module initialization:

from pypromice.qc.percentile import *
from pypromice.qc.persistence import *

Then we can call, for instance, percentileQC merely with pypromice.qc.percentileQC rather than pypromice.qc.percentile.percentileQC. I think if this module becomes bigger, then submodule divisions will be needed. But for now, the functions are very streamlined (nice!)

@@ -54,7 +56,7 @@ def toL2(L1, T_0=273.15, ews=1013.246, ei0=6.1071, eps_overcast=1.,



ds = differenceQC(ds) # Flag and Remove difference outliers
ds = apply_static_qc(ds) # Flag and Remove difference outliers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer calling this a persistence QC rather than a static QC, as 'static' is an ambiguous term. So perhaps rename these functions to, e.g. persistence_qc()?

.github/workflows/unit_test.yml Show resolved Hide resolved
@ladsmund ladsmund merged commit 21d8bdd into main Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants