Do not delete quality and template metrics on recompute #3292
Conversation
On a slightly more global discussion point: this, or a similar strategy, would be needed for template metrics as well, right? So that if a new metric is implemented, someone can just add it to their old template metrics rather than having to re-calculate.
@samuelgarcia Does this look ok? I'll do a similar thing to
much like adding
I like to live dangerously
@@ -3,6 +3,36 @@
import pytest


def test_compute_new_template_metrics(small_sorting_analyzer):
nice
template_metric_extension = small_sorting_analyzer.get_extension("template_metrics")

assert "half_width" not in list(template_metric_extension.get_data().keys())
Same here: might as well check against the full default list, if it was indeed all computed in the first call.
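For what it's worth, a minimal sketch of that suggested check; the default metric list below is an assumption on my part, and the actual test in the PR may be written differently:

```python
# Sketch only: after computing template_metrics with defaults, every default
# (single-channel) metric should appear in the extension data. The list below
# is an assumed set of default metric names and may not match the project exactly.
default_template_metrics = [
    "peak_to_valley",
    "peak_trough_ratio",
    "half_width",
    "repolarization_slope",
    "recovery_slope",
]


def test_all_default_template_metrics_present(small_sorting_analyzer):
    small_sorting_analyzer.compute("template_metrics")
    template_metric_extension = small_sorting_analyzer.get_extension("template_metrics")
    computed_metrics = list(template_metric_extension.get_data().keys())
    for metric_name in default_template_metrics:
        assert metric_name in computed_metrics
```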
Hey @chrishalcrow this is looking nice! I am not very familiar with the
Thanks - I had totally messed up the template_metrics. The quality_metrics are a bit simpler, I hope!
Hello, hopefully … One big bug I missed: you can't pass any parameters directly to
So you'd like set_params to just deal with parameters and run to just deal with computing, but you can't tell run which metrics to calculate. So you've got to sneak them in using a property of the extension class.
I also hadn't meant to recompute everything, but it was tricky to see because the end result was the same and these metrics are pretty fast to compute. Not sure how you could test this. Don't look at
Hello, response to @samuelgarcia done.
One bit of a pain: some metrics are named one thing, like …, but produce output columns named something else.
@JoeZiminski this is ready for an inspection! I think it's a lot clearer now!
@chrishalcrow good point! I think it would be great to keep an internal dict mapping quality metric names to metric table column names!
@@ -53,3 +53,29 @@
    "drift": compute_drift_metrics,
    "sd_ratio": compute_sd_ratio,
}

# a dict converting the name of the metric for computation to the output of that computation
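For context, a few illustrative entries of such a mapping, where a single computation name can fan out to several output column names. The dict name and entries here are examples from memory and may not match the PR exactly:

```python
# Illustrative only: maps the name used to request a quality metric to the
# column name(s) that its computation writes into the metrics table.
qm_compute_name_to_column_names = {
    "snr": ["snr"],
    "isi_violation": ["isi_violations_ratio", "isi_violations_count"],
    "drift": ["drift_ptp", "drift_std", "drift_mad"],
    "nearest_neighbor": ["nn_hit_rate", "nn_miss_rate"],
    "silhouette": ["silhouette"],
}
```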
@alejoe91 You are in luck my friend (will figure out how to test this soon...)
I wish I could review this a bit more, but I think Joe is being a strong reviewer. Maybe more comments if I have time before merging, but I support the idea in general.
existing_params = tm_extension.params["metrics_kwargs"]
# checks that existing metrics were calculated using the same params
if existing_params != metrics_kwargs_:
    warnings.warn(
I think if @h-mayorquin has time he could comment on the value of this particular warning. Not to slow down this PR if it is ready otherwise.
In defence of the warning:
Suppose a user is going through a large set of sorting_analyzers that they computed over a long time span. They want to compute a new template_metric that they've heard about. They know it takes a long time to recompute everything, so they just specify the new metric to compute. Sometimes all the other metrics are deleted! Sometimes they are not!! What is going on?? This warning would have told them. Plus, it's quite rare to get the warning: the default behaviour (compute everything) will never trigger it.
Another option: if the kwargs are different, raise an error telling the user to either use the same kwargs as before or to include all the previous metrics in their compute (quite a complicated error).
I am not familiar enough with the sorting analyzer workflow to have a useful opinion on this one.
I think it would be better if you had the parameters (the new and the old) in the warning string, to give the user better context.
Something to think about as well is whether this will be called in a context where the warning will make no sense at all for the user. Say, somebody wraps a sorting analyzer and keeps recomputing; then the user gets a warning that they don't know anything about. For those cases, it is good to have an option to turn off such a warning.
But I don't really know how this is used, to judge whether this last concern is valid or not.
Ooh yes, I like the idea of printing the old and new parameters!
(I'll add that tomorrow morning!)
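A minimal sketch of what including both parameter sets in the warning could look like; the variable names follow the snippet above, the example values are purely illustrative, and the exact wording in the PR may differ:

```python
import warnings

# Illustrative stand-ins for the stored kwargs and the newly requested ones.
existing_params = {"peak_sign": "neg", "upsampling_factor": 10}
metrics_kwargs_ = {"peak_sign": "pos", "upsampling_factor": 10}

# Sketch: surface both parameter sets so the user can see exactly what changed.
if existing_params != metrics_kwargs_:
    warnings.warn(
        "Template metrics already exist but were computed with different parameters.\n"
        f"  Previously used: {existing_params}\n"
        f"  Now requested:   {metrics_kwargs_}\n"
        "Either pass the same parameters as before, or delete the existing metrics "
        "to recompute everything with the new parameters."
    )
```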
delete_existing_metrics = self.params["delete_existing_metrics"]
metrics_to_compute = self.params["metrics_to_compute"]
@chrishalcrow one last comment. IMO the metrics_to_compute is not really a param, but rather a run-time thing. Would it make sense to move the logic here? As a user, seeing that metrics_to_compute parameter in the json file would make me feel weird!
Hello! You are logically correct, but run cannot receive any additional arguments, and params is the only way we can pass on the information (also for some technical reason about extensions being reloaded? @samuelgarcia).
So we can either repeat this code in run, or pass this information on using params.
Yes, what I'm saying is: since the metrics to compute matter when it's time to run, why not just move the code here, to avoid having to save the metrics_to_compute parameter? We have access here to the metric names and delete existing, so we have everything we need!
I had a go, but it's not so simple. We need to set the metric_names in set_params, since this is used for the metadata and in other places (like in get_data). So the params dict needs to know all the metrics which are going to be contained in the extension: everything that has been and will be computed.
So we have access to what's already been computed, in the previous extension's params, and the complete list of metrics in self.params["metric_names"]. Using this, we can't distinguish between:
- a metric which the user included in metric_names and which was already computed. We want to recompute this, because the user has explicitly specified it.
- a metric which the user has not specified in metric_names and which was already computed. We don't want to recompute this, because it was already computed.
So we can either:
- Change the behaviour so that we don't re-compute metrics that the user specifies and that already exist (I think this is bad: the user might be re-computing because a metric was updated/fixed, and this wouldn't overwrite the old result, even when explicitly included by the user).
- Give metrics_to_compute a better name in params, something like a catchier version of "metrics_computed_on_most_recent_run".
- Keep it as is.
- Be cleverer than I've been so far???
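To make the ambiguity concrete, here is a rough, simplified sketch of the set-params-time decision; the function and variable names are approximations, not the PR's exact code:

```python
# Rough sketch of the logic described above. At set_params time we still know
# which metrics were explicitly requested, so we can record them; once only the
# merged metric_names list is stored, that distinction is lost.
def decide_metrics(requested_metric_names, existing_extension, delete_existing_metrics):
    if delete_existing_metrics or existing_extension is None:
        # nothing to preserve: compute exactly what was asked for
        metrics_to_compute = list(requested_metric_names)
        all_metric_names = list(requested_metric_names)
    else:
        existing_metric_names = existing_extension.params["metric_names"]
        # explicitly requested metrics are always (re)computed, even if already present
        metrics_to_compute = list(requested_metric_names)
        # metric_names must describe everything the extension will contain:
        # the newly computed metrics plus the old ones we keep
        all_metric_names = metrics_to_compute + [
            name for name in existing_metric_names if name not in requested_metric_names
        ]
    # both end up in params: all_metric_names for metadata/get_data,
    # metrics_to_compute so that _run knows what actually needs computing
    return metrics_to_compute, all_metric_names
```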
I see! Makes total sense. Let's leave it as is then and thanks for explaining.
From my comment I didn't consider the option that the user is passing a metric name that is already computed in order to recompute...
"nn_noise_overlap": ["nn_noise_overlap"], | ||
"silhouette": ["silhouette"], | ||
"silhouette_full": ["silhouette_full"], | ||
} |
We should maybe have a class approach one day...
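Purely as a thought experiment on that "class approach" (entirely hypothetical, not part of this PR), one metric's function, output columns, and defaults could be bundled like this:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QualityMetric:
    """Hypothetical bundle of everything the framework needs to know about one metric."""

    name: str                # computation name, e.g. "nearest_neighbor"
    func: Callable           # function that computes the metric
    column_names: list[str]  # column(s) it adds to the metrics table
    default_params: dict = field(default_factory=dict)
```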
Co-authored-by: Garcia Samuel <[email protected]>
Thanks, Chris!
Possible solution to #3256 and some problems in #3251
This PR alters the way quality_metrics are computed, so that newly computed metrics are appended to the existing extension data rather than replacing it. This is done by using a new initialisation of the quality_metrics extension.
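As a usage sketch of the intended behaviour, assuming an existing SortingAnalyzer with the prerequisite extensions computed (the metric names are examples; delete_existing_metrics is the flag discussed in the review):

```python
# Compute a first batch of quality metrics.
sorting_analyzer.compute("quality_metrics", metric_names=["snr", "firing_rate"])

# Later, request only a new metric. Previously this replaced the extension data;
# with this PR the existing columns are kept and the new one is appended.
sorting_analyzer.compute("quality_metrics", metric_names=["amplitude_cutoff"])

qm = sorting_analyzer.get_extension("quality_metrics").get_data()
assert {"snr", "firing_rate", "amplitude_cutoff"} <= set(qm.columns)

# To opt out of appending and start from scratch:
sorting_analyzer.compute(
    "quality_metrics", metric_names=["snr"], delete_existing_metrics=True
)
```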
This might be controversial. Some points in its favour:
Happy to see other solutions to this problem: @samuelgarcia suggested a new method called append (or something similar), but I couldn't figure out nice syntax for this.