From 8b7ac1e2f6af923416acc67dd5bd016d87a80698 Mon Sep 17 00:00:00 2001 From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Mon, 24 Jun 2024 17:24:02 +0100 Subject: [PATCH 1/7] Unify compute_isi_violation docs and add UMS citation --- doc/modules/qualitymetrics/isi_violations.rst | 47 +++++++++---------- doc/references.rst | 5 +- 2 files changed, 27 insertions(+), 25 deletions(-) diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst index e30a2334d5..dc347c614f 100644 --- a/doc/modules/qualitymetrics/isi_violations.rst +++ b/doc/modules/qualitymetrics/isi_violations.rst @@ -10,40 +10,37 @@ Calculation Neurons have a refractory period after a spiking event during which they cannot fire again. Inter-spike-interval (ISI) violations refers to the rate of refractory period violations (as described by [Hill]_). +We aim to calculate the contamination rate :math:`C`, measuring the ratio of isi violations in the spike-train of a unit. The calculation works under the assumption that the contaminant events happen randomly or come from another neuron that is not correlated with our unit. A correlation will lead to an overestimation of the contamination, whereas an anti-correlation will lead to an underestimation. -Different formulas have been developed over the years. +Different formulas have been developed, but all require: -Calculation from the [Hill]_ paper ----------------------------------- +- :math:`T` the duration of the recording. +- :math:`N` the number of spikes in the unit's spike train. +- :math:`t_r` the duration of the unit's refractory period. +- :math:`n_v` the number of violations of the refractory period. -The following quantities are required: +Calculation from the [UMS]_ package +----------------------------------- -- :math:`ISI_t` : biological threshold for ISI violation. -- :math:`ISI_{min}`: minimum ISI threshold enforced by the data recording system used. -- :math:`ISI_s` : the array of ISI violations which are observed in the unit's spike train. -- :math:`\#`: denotes count. +Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory +period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly +be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible +threshold. -The threshold for ISI violations is the biological ISI threshold, :math:`ISI_t`, minus the minimum ISI threshold, :math:`ISI_{min}` enforced by the data recording system used. -The array of inter-spike-intervals observed in the unit's spike train, :math:`ISI_s`, is used to identify the count (:math:`\#`) of observed ISI's below this threshold. -For a recording with a duration of :math:`T_r` seconds, and a unit with :math:`N_s` spikes, the rate of ISI violations is: +The contamination rate is calculated to be .. math:: - \textrm{ISI violations} = \frac{ \#( ISI_s < ISI_t) T_r }{ 2 N_s^2 (ISI_t - ISI_{min}) } + C = \frac{ n_v T }{ 2 N^2 t_r } Calculation from the [Llobet]_ paper ------------------------------------ -The following quantities are required: - -- :math:`T` the duration of the recording. -- :math:`N` the number of spikes in the unit's spike train. -- :math:`t_r` the duration of the unit's refractory period. -- :math:`n_v` the number of violations of the refractory period. - -The estimated contamination :math:`C` can be calculated with 2 extreme scenarios. In the first one, the contaminant spikes are completely random (or come from an infinite number of other neurons). In the second one, the contaminant spikes come from a single other neuron: +The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes + are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes + come from a single other neuron. In these scenarios, the contamination rate is .. math:: @@ -58,7 +55,9 @@ Expectation and use ------------------- ISI violations identifies unit contamination - a high value indicates a highly contaminated unit. -Despite being a ratio, ISI violations can exceed 1 (or become a complex number in the [Llobet]_ formula). This is usually due to the contaminant events being correlated with our neuron, and their number is greater than a purely random spike train. +Despite being a ratio, the contamination can exceed 1 (or become a complex number in the [Llobet]_ formula). +This is usually due to the contaminant events being correlated with our neuron, and their number is +greater than a purely random spike train. Example code ------------ @@ -86,8 +85,8 @@ With SpikeInterface: References ---------- -Hill implementation (:code:`isi_violation`) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +UMS implementation (:code:`isi_violation`) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_isi_violations @@ -160,5 +159,5 @@ Links to original implementations Literature ---------- -Introduced by [Hill]_ (2011). +Introduced in UltraMegaSort2000 [UMS]_ (2011). Also described by [Llobet]_ (2022) diff --git a/doc/references.rst b/doc/references.rst index ace51db951..48b7cd44f6 100644 --- a/doc/references.rst +++ b/doc/references.rst @@ -50,9 +50,10 @@ If you use the :code:`qualitymetrics` module, i.e. you use the :code:`analyzer.c or :code:`compute_quality_metrics()` methods, please include the citations for the :code:`metric_names` that were particularly important for your research: -- :code:`amplitude_cutoff` or :code:`isi_violation` [Hill]_ +- :code:`amplitude_cutoff` [Hill]_ - :code:`amplitude_median` or :code:`sliding_rp_violation` [IBL]_ - :code:`drift` [Siegle]_ +- :code:`isi_violation` [UMS]_ - :code:`rp_violation` [Llobet]_ - :code:`sd_ratio` [Pouzat]_ - :code:`snr` [Lemon]_ [Jackson]_ @@ -122,6 +123,8 @@ References .. [Siegle] `Survey of Spiking in the Mouse Visual System Reveals Functional Hierarchy. 2021. `_ +.. [UMS] `UltraMegaSort2000 - Spike sorting and quality metrics for extracellular spike data. 2011. `_ + .. [Varol] `Decentralized Motion Inference and Registration of Neuropixel Data. 2021. `_ .. [Windolf] `Robust Online Multiband Drift Estimation in Electrophysiology Data. 2022. `_ From 364193cc32df16301bf5a902f2048033b2aef951 Mon Sep 17 00:00:00 2001 From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Mon, 24 Jun 2024 17:42:46 +0100 Subject: [PATCH 2/7] Update compete_isi_violations docstring --- .../qualitymetrics/misc_metrics.py | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py index cbb55aeb8b..b3cf84cf9c 100644 --- a/src/spikeinterface/qualitymetrics/misc_metrics.py +++ b/src/spikeinterface/qualitymetrics/misc_metrics.py @@ -241,7 +241,7 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0, It computes several metrics related to isi violations: * isi_violations_ratio: the relative firing rate of the hypothetical neurons that are - generating the ISI violations. Described in [Hill]_. See Notes. + generating the ISI violations. See Notes. * isi_violation_count: number of ISI violations Parameters @@ -261,22 +261,23 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0, Returns ------- isi_violations_ratio : dict - The isi violation ratio described in [Hill]_. + The isi violation ratio. isi_violation_count : dict Number of violations. Notes ----- - You can interpret an ISI violations ratio value of 0.5 as meaning that contaminating spikes are - occurring at roughly half the rate of "true" spikes for that unit. - In cases of highly contaminated units, the ISI violations ratio can sometimes be greater than 1. + The returned ISI violations ratio measures the approximate fraction of spikes in each + unit which are contaminted. This interpretation is good when the ratio is small, and + becomes worse as it grows. In cases of highly contaminated units, the ISI violations + ratio can sometimes be greater than 1. References ---------- - Based on metrics described in [Hill]_ + Based on metrics originally implemented in [UMS]_ - Originally written in Matlab by Nick Steinmetz (https://github.com/cortex-lab/sortingQuality) - and converted to Python by Daniel Denman. + This implementation is based on one written in Matlab by Nick Steinmetz + (https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman. """ res = namedtuple("isi_violation", ["isi_violations_ratio", "isi_violations_count"]) @@ -324,7 +325,7 @@ def compute_refrac_period_violations( Calculate the number of refractory period violations. This is similar (but slightly different) to the ISI violations. - The key difference being that the violations are not only computed on consecutive spikes. + The key differences being that the violations are not only computed on consecutive spikes. This is required for some formulas (e.g. the ones from Llobet & Wyngaard 2022). From 0108227c911d7a8e47bd445de6e5b1f1b27113f2 Mon Sep 17 00:00:00 2001 From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Fri, 28 Jun 2024 13:47:29 +0100 Subject: [PATCH 3/7] Respond to review --- doc/modules/qualitymetrics/isi_violations.rst | 10 ++++++---- src/spikeinterface/qualitymetrics/misc_metrics.py | 14 ++++++++++++-- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst index dc347c614f..10e1934537 100644 --- a/doc/modules/qualitymetrics/isi_violations.rst +++ b/doc/modules/qualitymetrics/isi_violations.rst @@ -16,14 +16,16 @@ A correlation will lead to an overestimation of the contamination, whereas an an Different formulas have been developed, but all require: -- :math:`T` the duration of the recording. +- :math:`T` the duration of the recording in seconds. - :math:`N` the number of spikes in the unit's spike train. -- :math:`t_r` the duration of the unit's refractory period. +- :math:`t_r` the duration of the unit's refractory period in seconds. - :math:`n_v` the number of violations of the refractory period. Calculation from the [UMS]_ package ----------------------------------- +Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package ``_. + Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible @@ -39,8 +41,8 @@ Calculation from the [Llobet]_ paper ------------------------------------ The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes - are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes - come from a single other neuron. In these scenarios, the contamination rate is +are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes +come from a single other neuron. In these scenarios, the contamination rate is .. math:: diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py index b3cf84cf9c..f809ad11ec 100644 --- a/src/spikeinterface/qualitymetrics/misc_metrics.py +++ b/src/spikeinterface/qualitymetrics/misc_metrics.py @@ -272,9 +272,14 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0, becomes worse as it grows. In cases of highly contaminated units, the ISI violations ratio can sometimes be greater than 1. + This method counts the number of spikes whose isi is violated. If there are three + spikes within `isi_threshold_ms`, the first and second are violated. Hence there are two + spikes which have been violated. This is is contrast to `compute_refrac_period_violations`, + which counts the number of violations. + References ---------- - Based on metrics originally implemented in [UMS]_ + Based on metrics originally implemented in Ultra Mega Sort [UMS]_. This implementation is based on one written in Matlab by Nick Steinmetz (https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman. @@ -325,7 +330,6 @@ def compute_refrac_period_violations( Calculate the number of refractory period violations. This is similar (but slightly different) to the ISI violations. - The key differences being that the violations are not only computed on consecutive spikes. This is required for some formulas (e.g. the ones from Llobet & Wyngaard 2022). @@ -352,6 +356,12 @@ def compute_refrac_period_violations( ----- Requires "numba" package + This method counts the number of violations which occur during the refactory period. + If there are three spikes within `refractory_period_ms`, the second and third spikes + violate the first spikes and the third spike violates the second spike. Hence there + are three violations. This is in contrast to `compute_isi_violations`, which + computes the number of spikes which have been violated. + References ---------- Based on metrics described in [Llobet]_ From c590ed03c83cd8cc81dfe9b8c2b3a8964c0f6345 Mon Sep 17 00:00:00 2001 From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Fri, 28 Jun 2024 13:52:16 +0100 Subject: [PATCH 4/7] add a colon --- doc/modules/qualitymetrics/isi_violations.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst index 10e1934537..2002ec6582 100644 --- a/doc/modules/qualitymetrics/isi_violations.rst +++ b/doc/modules/qualitymetrics/isi_violations.rst @@ -24,7 +24,7 @@ Different formulas have been developed, but all require: Calculation from the [UMS]_ package ----------------------------------- -Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package ``_. +Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package: ``_. Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly From 4e543010a7ff58dbc8ffa1c4f0aa5e2e7867725c Mon Sep 17 00:00:00 2001 From: Chris Halcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Mon, 1 Jul 2024 09:04:19 +0200 Subject: [PATCH 5/7] Apply suggestions from code review Co-authored-by: Joe Ziminski <55797454+JoeZiminski@users.noreply.github.com> --- src/spikeinterface/qualitymetrics/misc_metrics.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py index f809ad11ec..c2dff554e3 100644 --- a/src/spikeinterface/qualitymetrics/misc_metrics.py +++ b/src/spikeinterface/qualitymetrics/misc_metrics.py @@ -357,8 +357,8 @@ def compute_refrac_period_violations( Requires "numba" package This method counts the number of violations which occur during the refactory period. - If there are three spikes within `refractory_period_ms`, the second and third spikes - violate the first spikes and the third spike violates the second spike. Hence there + For example, if there are three spikes within `refractory_period_ms`, the second and third spikes + violate the first spike and the third spike violates the second spike. Hence there are three violations. This is in contrast to `compute_isi_violations`, which computes the number of spikes which have been violated. From 2bd0633071e3fd34c61b14554f6c4c9b8d1d0733 Mon Sep 17 00:00:00 2001 From: Chris Halcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Mon, 1 Jul 2024 14:42:03 +0100 Subject: [PATCH 6/7] Update src/spikeinterface/qualitymetrics/misc_metrics.py Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com> --- src/spikeinterface/qualitymetrics/misc_metrics.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py index c2dff554e3..9e4685b5bd 100644 --- a/src/spikeinterface/qualitymetrics/misc_metrics.py +++ b/src/spikeinterface/qualitymetrics/misc_metrics.py @@ -281,7 +281,7 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0, ---------- Based on metrics originally implemented in Ultra Mega Sort [UMS]_. - This implementation is based on one written in Matlab by Nick Steinmetz + This implementation is based on one of the original implementations written in Matlab by Nick Steinmetz (https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman. """ res = namedtuple("isi_violation", ["isi_violations_ratio", "isi_violations_count"]) From d1ea2158dc8bd2de5f2d69b69ed0c1b0ed8adc87 Mon Sep 17 00:00:00 2001 From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com> Date: Tue, 2 Jul 2024 09:50:34 +0100 Subject: [PATCH 7/7] Respond to review --- doc/modules/qualitymetrics/isi_violations.rst | 9 ++++++--- doc/references.rst | 3 ++- src/spikeinterface/qualitymetrics/misc_metrics.py | 9 +++++---- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst index 2002ec6582..4527cdffe9 100644 --- a/doc/modules/qualitymetrics/isi_violations.rst +++ b/doc/modules/qualitymetrics/isi_violations.rst @@ -19,13 +19,14 @@ Different formulas have been developed, but all require: - :math:`T` the duration of the recording in seconds. - :math:`N` the number of spikes in the unit's spike train. - :math:`t_r` the duration of the unit's refractory period in seconds. -- :math:`n_v` the number of violations of the refractory period. Calculation from the [UMS]_ package ----------------------------------- Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package: ``_. +In this method the number of spikes whose refractory period are violated, denoted :math:`n_v`, is used. + Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible @@ -40,6 +41,8 @@ The contamination rate is calculated to be Calculation from the [Llobet]_ paper ------------------------------------ +In this method the number spikes which violate other spikes' refractory periods, denoted :math:`\tilde{n}_v`, is used. + The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes come from a single other neuron. In these scenarios, the contamination rate is @@ -47,8 +50,8 @@ come from a single other neuron. In these scenarios, the contamination rate is .. math:: C = \frac{FP}{TP + FP} \approx \begin{cases} - 1 - \sqrt{1 - \frac{n_v T}{N^2 t_r}} \text{ for the case of random contamination} \\ - \frac{1}{2} \left( 1 - \sqrt{1 - \frac{2 n_v T}{N^2 t_r}} \right) \text{ for the case of 1 contaminant neuron} + 1 - \sqrt{1 - \frac{\tilde{n}_v T}{N^2 t_r}} \text{ for the case of random contamination} \\ + \frac{1}{2} \left( 1 - \sqrt{1 - \frac{2 \tilde{n}_v T}{N^2 t_r}} \right) \text{ for the case of 1 contaminant neuron} \end{cases} Where :math:`TP` is the number of true positives (detected spikes that come from the neuron) and :math:`FP` is the number of false positives (detected spikes that don't come from the neuron). diff --git a/doc/references.rst b/doc/references.rst index 48b7cd44f6..5fbcbecb63 100644 --- a/doc/references.rst +++ b/doc/references.rst @@ -51,7 +51,8 @@ or :code:`compute_quality_metrics()` methods, please include the citations for t important for your research: - :code:`amplitude_cutoff` [Hill]_ -- :code:`amplitude_median` or :code:`sliding_rp_violation` [IBL]_ +- :code:`amplitude_median` [IBL]_ +- :code:`sliding_rp_violation` [IBL]_ - :code:`drift` [Siegle]_ - :code:`isi_violation` [UMS]_ - :code:`rp_violation` [Llobet]_ diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py index 9e4685b5bd..31a3efb2b6 100644 --- a/src/spikeinterface/qualitymetrics/misc_metrics.py +++ b/src/spikeinterface/qualitymetrics/misc_metrics.py @@ -267,10 +267,11 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0, Notes ----- - The returned ISI violations ratio measures the approximate fraction of spikes in each - unit which are contaminted. This interpretation is good when the ratio is small, and - becomes worse as it grows. In cases of highly contaminated units, the ISI violations - ratio can sometimes be greater than 1. + The returned ISI violations ratio approximates the fraction of spikes in each + unit which are contaminted. The formulation assumes that the contaminating spikes + are statistically independent from the other spikes in that cluster. This + approximation can break down in reality, especially for highly contaminated units. + See the discussion in Section 4.1 of [Llobet]_ for more details. This method counts the number of spikes whose isi is violated. If there are three spikes within `isi_threshold_ms`, the first and second are violated. Hence there are two