-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Features/auto qc #180
Features/auto qc #180
Conversation
9e46356
to
bf18598
Compare
I've just left a few comments regarding structure and naming conventions. I've also been testing the functionality of the QC steps and I am running into problems. Some of these might be to do with both of these QC development branches not being up-to-date with the So I think it is best to addres the minor issues I have commented on, then merge, and then we can do a larger review over on Just so you have a heads up @RasmusBahbah, I'm having problems in the script creating a database |
6b45719
to
97b3929
Compare
97b3929
to
fd0b582
Compare
* Added unit tests for testing boundary conditions * Use max aggregation instead of sum to detect determine static data in window. * Modifying the loop to fix issue where the last index wasn't processed.
* Added thresholds as input parameters with default values * Cleaned up out commented code * Applied black formatting on file
* diff_period -> period * static_limit -> max_diff
* static_qc default threshold * Added debug logging to static_qc.py * Cleaned whitespaces in L1toL2.py
Updated L0toL1 to keep the `format` attribute in L1 data
The statement `all(f)` always evaluates to True because * all `format` strings in aws-l0 config files is in {'STM', 'TX', 'raw'}. * `f` is never empty because `self.L0` is never empty
`format` is now also available in the current L3 dataset.
a32bae1
to
e40205a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I just checked the unit testing and then tested it with some L0tx
and L0raw
files. All runs smoothly. Go ahead and merge.
src/pypromice/qc/static_qc_test.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file needs some extra lines adding at the end so that the unit test is triggered when the script is run, i.e.:
if __name__ == "__main__":
unittest.main()
src/pypromice/qc/static_qc_test.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation needed for each function here - it can just be a short definition, for example:
def get_random_datetime() -> datetime.datetime:
'''Select random timestamp in the period `1970-2030'''
src/pypromice/qc/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the submodule file names need changing as the naming style does not fit with the rest of the toolbox. I suggest:
compute_percentiles.py and percentiles.py >> percentile.py
static_qc.py >> persistence.py
static_qc_test.py >> persistence_test.py
Also, I suggest a structuring change in the qc
module initialization:
from pypromice.qc.percentile import *
from pypromice.qc.persistence import *
Then we can call, for instance, percentileQC
merely with pypromice.qc.percentileQC
rather than pypromice.qc.percentile.percentileQC
. I think if this module becomes bigger, then submodule divisions will be needed. But for now, the functions are very streamlined (nice!)
src/pypromice/process/L1toL2.py
Outdated
@@ -54,7 +56,7 @@ def toL2(L1, T_0=273.15, ews=1013.246, ei0=6.1071, eps_overcast=1., | |||
|
|||
|
|||
|
|||
ds = differenceQC(ds) # Flag and Remove difference outliers | |||
ds = apply_static_qc(ds) # Flag and Remove difference outliers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer calling this a persistence QC rather than a static QC, as 'static' is an ambiguous term. So perhaps rename these functions to, e.g. persistence_qc()
?
I have done some restructuring of the repository and created a
qc
module that can contain different algorithms for detecting invalid data.