Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update anomaly_detection model for new photometry #329

Merged

Conversation

matwey
Copy link
Contributor

@matwey matwey commented Sep 4, 2023

IMPORTANT: Please create an issue first before opening a Pull Request.
Linked to issue(s): Closes put link

What changes were proposed in this pull request?

Retrain anomaly_detection model on lc_features_20210617_photometry_corrected.parquet

How was this patch tested?

N/A

@JulienPeloton
Copy link
Member

Hi @matwey -- thanks for the retraining. Any ideas why the model grew from 13MB to 30MB with a similar dataset?

@JulienPeloton
Copy link
Member

@matwey -- we use a very specific version of sklearn (see here for the version used). Hence the CI is failing:

/home/libs/miniconda/lib/python3.9/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator ExtraTreeRegressor from version 1.3.0 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/__w/fink-science/fink-science/fink_science/anomaly_detection/processor.py", line 66, in <module>
    forest_r = pickle.load(forest_file)
  File "sklearn/tree/_tree.pyx", line 653, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1340, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
- got     : {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}

Could you retrain using sklearn 1.0.2 ?

@matwey matwey force-pushed the anomaly_detection_model_g2 branch from 8158731 to 7b93508 Compare September 7, 2023 10:54
@matwey
Copy link
Contributor Author

matwey commented Sep 7, 2023

Hi @JulienPeloton

  1. I've retrained using sklearn 1.0.2 and Python 3.9 and force-pushed the commit
  2. Currently, I use notebooks provided by @VoidDruid and there is a hyper-parameter tuning stage with GridSearchCV. I suppose that hyper-parameters (trees number) appeared to be different for different data.

However, I think that it is desirable to migrate to ONNX to address both questions. First, it is size-efficient, second it is version compatible. I think this should be next step for this module.

@JulienPeloton
Copy link
Member

Thanks for the retraining -- there is one unit test failing (expected as you changed the model):

File "/__w/fink-science/fink-science/fink_science/__init__.py", line ?, in __main__.anomaly_score
Failed example:
    df.filter(df["anomaly_score"] < -0.5).count()
Expected:
    15
Got:
    7

However, I think that it is desirable to migrate to ONNX to address both questions

I completely agree!

@matwey matwey force-pushed the anomaly_detection_model_g2 branch from 7b93508 to 4149708 Compare September 7, 2023 11:19
@matwey
Copy link
Contributor Author

matwey commented Sep 7, 2023

Thanks for the retraining -- there is one unit test failing (expected as you changed the model):

File "/__w/fink-science/fink-science/fink_science/__init__.py", line ?, in __main__.anomaly_score
Failed example:
    df.filter(df["anomaly_score"] < -0.5).count()
Expected:
    15
Got:
    7

This was from a docsting, and fixed now.

@codecov-commenter
Copy link

codecov-commenter commented Sep 7, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (68fa05e) 53.44% compared to head (05d090a) 53.44%.
Report is 1 commits behind head on master.

❗ Current head 05d090a differs from pull request most recent head 4149708. Consider uploading reports for the commit 4149708 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #329   +/-   ##
=======================================
  Coverage   53.44%   53.44%           
=======================================
  Files          33       33           
  Lines        1510     1510           
=======================================
  Hits          807      807           
  Misses        703      703           
Files Changed Coverage Δ
fink_science/anomaly_detection/processor.py 77.35% <ø> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JulienPeloton
Copy link
Member

Thanks! I'm merging.

@JulienPeloton JulienPeloton merged commit 9e380a8 into astrolabsoftware:master Sep 7, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants