Using multiple anomaly detection models #384

Merged: 19 commits, merged on Jul 29, 2024

Knispel2 (Contributor) commented Apr 12, 2024

We want to use several models at the same time to detect anomalies (#377). I assume that each model will have its own archive, named anomaly_detection_forest_AAD{name_model}.zip, in the directory /data/models/anomaly_detection, and that the call will look like this:

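from fink_science.anomaly_detection.processor import anomaly_score

MODELS = ['first', 'second', ...]  # one entry per anomaly_detection_forest_AAD{name_model}.zip
for model in MODELS:
    # add one score column per model, e.g. anomaly_score_first
    df = df.withColumn(f'anomaly_score_{model}', anomaly_score('lc_features', model=model))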

Are there any problems with this approach?

Where is the best place to store a list of models?

JulienPeloton (Member) commented

Hi @Knispel2 -- sorry for the delay.

> Are there any problems with this approach?

No problem, as long as the models are not too big. How many models do you foresee?

> Where is the best place to store a list of models?

For the moment, under fink_science/data/models/anomaly_detection.
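If the list should follow the archives in that directory rather than be maintained by hand, one option is to derive the model names from the file names. This is a minimal sketch assuming the naming scheme from the opening comment (anomaly_detection_forest_AAD{name_model}.zip, with all archives directly inside one directory); `discover_models` and `score_column` are hypothetical helpers, not part of fink-science.

```python
import re
from pathlib import Path

# Naming scheme proposed in this PR: anomaly_detection_forest_AAD{name_model}.zip
ARCHIVE_PATTERN = re.compile(r"^anomaly_detection_forest_AAD(?P<name>.*)\.zip$")


def discover_models(model_dir):
    """Return the sorted model names found in model_dir (hypothetical helper).

    A file named 'anomaly_detection_forest_AAD.zip' yields the empty name ''.
    Files that do not follow the naming scheme are ignored.
    """
    names = []
    for path in Path(model_dir).iterdir():
        match = ARCHIVE_PATTERN.match(path.name)
        if match and path.is_file():
            names.append(match.group("name"))
    return sorted(names)


def score_column(name):
    """Output column name for a model, following the f-string in the proposal."""
    return f"anomaly_score_{name}"
```

For example, MODELS = discover_models('fink_science/data/models/anomaly_detection') (path relative to the repository root) would replace a hand-maintained list. Note that an archive with no suffix, such as a base model, shows up as the name '', which produces the column anomaly_score_.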

Knispel2 (Contributor, Author) commented Jun 2, 2024

JulienPeloton (Member) left a comment

Thanks @Knispel2 -- and deeply sorry for the delay. I do not understand why you kept the model name '' in the code, since you removed the base model.

Also, could you fix the merge conflict with origin?

4 review threads on fink_science/anomaly_detection/processor.py, all resolved (2 outdated).
JulienPeloton (Member) left a comment

Thanks @Knispel2 -- I will then run a profiling check and get back to you before merging.

1 review thread on fink_science/anomaly_detection/processor.py, resolved.
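Because every extra model adds one scoring pass per batch, a rough profiling check can time the scoring step model by model. This is a minimal stdlib sketch, not the profiling procedure used for this PR; `profile_models` is hypothetical, and `score(model_name, batch)` is an assumed stand-in for the real per-model scoring call.

```python
import time
from statistics import median


def profile_models(score, models, batch, repeats=5):
    """Return {model_name: median seconds per call} for each model.

    `score` is a stand-in for the real per-model scoring function
    (assumed signature: score(model_name, batch)).
    """
    timings = {}
    for name in models:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            score(name, batch)
            samples.append(time.perf_counter() - start)
        # the median is less sensitive than the mean to one slow outlier run
        timings[name] = median(samples)
    return timings
```

Summing the medians gives a rough per-batch cost of adding models; it does not capture Spark-level overheads such as serialization between the JVM and Python workers.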
JulienPeloton (Member) commented

Note though that the CI keeps failing with a weird error:

Exception raised:
    Traceback (most recent call last):
      File "/home/libs/miniconda/lib/python3.9/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest __main__.anomaly_score[12]>", line 1, in <module>
        df.filter(df["anomaly_score"] < -0.013).count()
      File "/home/libs/spark-3.4.1-bin-hadoop3/python/pyspark/sql/dataframe.py", line 1193, in count
        return int(self._jdf.count())
      File "/home/libs/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
        return_value = get_return_value(
      File "/home/libs/spark-3.4.1-bin-hadoop3/python/pyspark/errors/exceptions/captured.py", line 175, in deco
        raise converted from None
    pyspark.errors.exceptions.captured.PythonException: 
      An exception was thrown from the Python worker. Please see the stack trace below.
    Traceback (most recent call last):
      File "/__w/fink-science/fink-science/fink_science/anomaly_detection/processor.py", line 165, in anomaly_score
        forest_r_AAD = rt.InferenceSession(r_model_path_AAD)
      File "/home/libs/miniconda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 420, in __init__
        except (ValueError, RuntimeError) as e:
      File "/home/libs/miniconda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
        sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/forest_r_AAD.onnx failed:Protobuf parsing failed.

It happens randomly -- sometimes it fails, sometimes it passes. I tried to investigate, but without luck. Do you have any idea what is happening?
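One possible explanation, not confirmed in this thread, is that several Python workers on the same machine extract the model to the same fixed path (/tmp/forest_r_AAD.onnx in the traceback) at the same time, so one worker can read a file that another is still writing. That would fit an error that appears only intermittently and is hard to reproduce locally. A common mitigation is to extract into a private temporary directory per call and load the model from there. The stdlib sketch below shows only the extraction step; `extract_model` is a hypothetical helper, and loading would then be e.g. onnxruntime's InferenceSession(str(path)).

```python
import tempfile
import zipfile
from pathlib import Path


def extract_model(archive_path, member):
    """Extract one member of a zip archive into a fresh private directory.

    Each call gets its own directory from tempfile.mkdtemp, so concurrent
    workers never write to the same file. Returns the path of the extracted
    file; the caller is responsible for removing the directory afterwards.
    """
    target_dir = tempfile.mkdtemp(prefix="anomaly_model_")
    with zipfile.ZipFile(archive_path) as archive:
        extracted = archive.extract(member, path=target_dir)
    return Path(extracted)
```

Whether this matches what processor.py actually does would need to be checked against the code; the sketch only illustrates the technique.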

JulienPeloton (Member) commented

@Knispel2 -- all good for me. Can you fix the conflict, and then I will merge this PR and review the one in fink-filters.

Knispel2 (Contributor, Author) commented

> it happens randomly -- sometimes it fails, sometimes it passes. I tried to investigate, but without luck. Do you have any idea what is happening?

Sorry for the delayed reply. I have tried many times to reproduce this error on my computer, without success, so I assumed it was something specific to the GitHub CI environment.

Knispel2 (Contributor, Author) commented

> @Knispel2 -- all good for me. Can you fix the conflict, and then I will merge this PR and review the one in fink-filters.

Done!

JulienPeloton merged commit 96dc90e into astrolabsoftware:master on Jul 29, 2024
2 of 3 checks passed