Update anomaly_detection model for new photometry #329

matwey · 2023-09-04T12:29:52Z

IMPORTANT: Please create an issue first before opening a Pull Request.
Linked to issue(s): Closes put link

What changes were proposed in this pull request?

Retrain anomaly_detection model on lc_features_20210617_photometry_corrected.parquet

How was this patch tested?

N/A

JulienPeloton · 2023-09-06T19:25:07Z

Hi @matwey -- thanks for the retraining. Any ideas why the model grew from 13MB to 30MB with a similar dataset?

JulienPeloton · 2023-09-06T20:11:12Z

@matwey -- we use a very specific version of sklearn (see here for the version used). Hence the CI is failing:

/home/libs/miniconda/lib/python3.9/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator ExtraTreeRegressor from version 1.3.0 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/__w/fink-science/fink-science/fink_science/anomaly_detection/processor.py", line 66, in <module>
    forest_r = pickle.load(forest_file)
  File "sklearn/tree/_tree.pyx", line 653, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1340, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
- got     : {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}

Could you retrain using sklearn 1.0.2 ?

matwey · 2023-09-07T11:08:54Z

Hi @JulienPeloton

I've retrained using sklearn 1.0.2 and Python 3.9 and force-pushed the commit
Currently, I use notebooks provided by @VoidDruid and there is a hyper-parameter tuning stage with GridSearchCV. I suppose that hyper-parameters (trees number) appeared to be different for different data.

However, I think that it is desirable to migrate to ONNX to address both questions. First, it is size-efficient, second it is version compatible. I think this should be next step for this module.

JulienPeloton · 2023-09-07T11:16:57Z

Thanks for the retraining -- there is one unit test failing (expected as you changed the model):

File "/__w/fink-science/fink-science/fink_science/__init__.py", line ?, in __main__.anomaly_score
Failed example:
    df.filter(df["anomaly_score"] < -0.5).count()
Expected:
    15
Got:
    7

However, I think that it is desirable to migrate to ONNX to address both questions

I completely agree!

matwey · 2023-09-07T11:53:03Z

Thanks for the retraining -- there is one unit test failing (expected as you changed the model):
File "/__w/fink-science/fink-science/fink_science/__init__.py", line ?, in __main__.anomaly_score
Failed example:
    df.filter(df["anomaly_score"] < -0.5).count()
Expected:
    15
Got:
    7

This was from a docsting, and fixed now.

codecov-commenter · 2023-09-07T12:57:05Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (68fa05e) 53.44% compared to head (05d090a) 53.44%.
Report is 1 commits behind head on master.

❗ Current head 05d090a differs from pull request most recent head 4149708. Consider uploading reports for the commit 4149708 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #329   +/-   ##
=======================================
  Coverage   53.44%   53.44%           
=======================================
  Files          33       33           
  Lines        1510     1510           
=======================================
  Hits          807      807           
  Misses        703      703

Files Changed	Coverage Δ
fink_science/anomaly_detection/processor.py	`77.35% <ø> (ø)`

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

JulienPeloton · 2023-09-07T13:12:09Z

Thanks! I'm merging.

matwey force-pushed the anomaly_detection_model_g2 branch from 8158731 to 7b93508 Compare September 7, 2023 10:54

Update anomaly_detetction model for new photometry

4149708

matwey force-pushed the anomaly_detection_model_g2 branch from 7b93508 to 4149708 Compare September 7, 2023 11:19

JulienPeloton merged commit 9e380a8 into astrolabsoftware:master Sep 7, 2023
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update anomaly_detection model for new photometry #329

Update anomaly_detection model for new photometry #329

matwey commented Sep 4, 2023

JulienPeloton commented Sep 6, 2023

JulienPeloton commented Sep 6, 2023

matwey commented Sep 7, 2023

JulienPeloton commented Sep 7, 2023

matwey commented Sep 7, 2023

codecov-commenter commented Sep 7, 2023 •

edited

Loading

JulienPeloton commented Sep 7, 2023

Update anomaly_detection model for new photometry #329

Update anomaly_detection model for new photometry #329

Conversation

matwey commented Sep 4, 2023

What changes were proposed in this pull request?

How was this patch tested?

JulienPeloton commented Sep 6, 2023

JulienPeloton commented Sep 6, 2023

matwey commented Sep 7, 2023

JulienPeloton commented Sep 7, 2023

matwey commented Sep 7, 2023

codecov-commenter commented Sep 7, 2023 • edited Loading

Codecov Report

JulienPeloton commented Sep 7, 2023

codecov-commenter commented Sep 7, 2023 •

edited

Loading