From 8b7ac1e2f6af923416acc67dd5bd016d87a80698 Mon Sep 17 00:00:00 2001
From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Mon, 24 Jun 2024 17:24:02 +0100
Subject: [PATCH 1/7] Unify compute_isi_violation docs and add UMS citation
---
doc/modules/qualitymetrics/isi_violations.rst | 47 +++++++++----------
doc/references.rst | 5 +-
2 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst
index e30a2334d5..dc347c614f 100644
--- a/doc/modules/qualitymetrics/isi_violations.rst
+++ b/doc/modules/qualitymetrics/isi_violations.rst
@@ -10,40 +10,37 @@ Calculation
Neurons have a refractory period after a spiking event during which they cannot fire again.
Inter-spike-interval (ISI) violations refers to the rate of refractory period violations (as described by [Hill]_).
+We aim to calculate the contamination rate :math:`C`, measuring the ratio of isi violations in the spike-train of a unit.
The calculation works under the assumption that the contaminant events happen randomly or come from another neuron that is not correlated with our unit.
A correlation will lead to an overestimation of the contamination, whereas an anti-correlation will lead to an underestimation.
-Different formulas have been developed over the years.
+Different formulas have been developed, but all require:
-Calculation from the [Hill]_ paper
-----------------------------------
+- :math:`T` the duration of the recording.
+- :math:`N` the number of spikes in the unit's spike train.
+- :math:`t_r` the duration of the unit's refractory period.
+- :math:`n_v` the number of violations of the refractory period.
-The following quantities are required:
+Calculation from the [UMS]_ package
+-----------------------------------
-- :math:`ISI_t` : biological threshold for ISI violation.
-- :math:`ISI_{min}`: minimum ISI threshold enforced by the data recording system used.
-- :math:`ISI_s` : the array of ISI violations which are observed in the unit's spike train.
-- :math:`\#`: denotes count.
+Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory
+period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly
+be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible
+threshold.
-The threshold for ISI violations is the biological ISI threshold, :math:`ISI_t`, minus the minimum ISI threshold, :math:`ISI_{min}` enforced by the data recording system used.
-The array of inter-spike-intervals observed in the unit's spike train, :math:`ISI_s`, is used to identify the count (:math:`\#`) of observed ISI's below this threshold.
-For a recording with a duration of :math:`T_r` seconds, and a unit with :math:`N_s` spikes, the rate of ISI violations is:
+The contamination rate is calculated to be
.. math::
- \textrm{ISI violations} = \frac{ \#( ISI_s < ISI_t) T_r }{ 2 N_s^2 (ISI_t - ISI_{min}) }
+ C = \frac{ n_v T }{ 2 N^2 t_r }
Calculation from the [Llobet]_ paper
------------------------------------
-The following quantities are required:
-
-- :math:`T` the duration of the recording.
-- :math:`N` the number of spikes in the unit's spike train.
-- :math:`t_r` the duration of the unit's refractory period.
-- :math:`n_v` the number of violations of the refractory period.
-
-The estimated contamination :math:`C` can be calculated with 2 extreme scenarios. In the first one, the contaminant spikes are completely random (or come from an infinite number of other neurons). In the second one, the contaminant spikes come from a single other neuron:
+The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes
+ are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes
+ come from a single other neuron. In these scenarios, the contamination rate is
.. math::
@@ -58,7 +55,9 @@ Expectation and use
-------------------
ISI violations identifies unit contamination - a high value indicates a highly contaminated unit.
-Despite being a ratio, ISI violations can exceed 1 (or become a complex number in the [Llobet]_ formula). This is usually due to the contaminant events being correlated with our neuron, and their number is greater than a purely random spike train.
+Despite being a ratio, the contamination can exceed 1 (or become a complex number in the [Llobet]_ formula).
+This is usually due to the contaminant events being correlated with our neuron, and their number is
+greater than a purely random spike train.
Example code
------------
@@ -86,8 +85,8 @@ With SpikeInterface:
References
----------
-Hill implementation (:code:`isi_violation`)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+UMS implementation (:code:`isi_violation`)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_isi_violations
@@ -160,5 +159,5 @@ Links to original implementations
Literature
----------
-Introduced by [Hill]_ (2011).
+Introduced in UltraMegaSort2000 [UMS]_ (2011).
Also described by [Llobet]_ (2022)
diff --git a/doc/references.rst b/doc/references.rst
index ace51db951..48b7cd44f6 100644
--- a/doc/references.rst
+++ b/doc/references.rst
@@ -50,9 +50,10 @@ If you use the :code:`qualitymetrics` module, i.e. you use the :code:`analyzer.c
or :code:`compute_quality_metrics()` methods, please include the citations for the :code:`metric_names` that were particularly
important for your research:
-- :code:`amplitude_cutoff` or :code:`isi_violation` [Hill]_
+- :code:`amplitude_cutoff` [Hill]_
- :code:`amplitude_median` or :code:`sliding_rp_violation` [IBL]_
- :code:`drift` [Siegle]_
+- :code:`isi_violation` [UMS]_
- :code:`rp_violation` [Llobet]_
- :code:`sd_ratio` [Pouzat]_
- :code:`snr` [Lemon]_ [Jackson]_
@@ -122,6 +123,8 @@ References
.. [Siegle] `Survey of Spiking in the Mouse Visual System Reveals Functional Hierarchy. 2021. `_
+.. [UMS] `UltraMegaSort2000 - Spike sorting and quality metrics for extracellular spike data. 2011. `_
+
.. [Varol] `Decentralized Motion Inference and Registration of Neuropixel Data. 2021. `_
.. [Windolf] `Robust Online Multiband Drift Estimation in Electrophysiology Data. 2022. `_
From 364193cc32df16301bf5a902f2048033b2aef951 Mon Sep 17 00:00:00 2001
From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Mon, 24 Jun 2024 17:42:46 +0100
Subject: [PATCH 2/7] Update compete_isi_violations docstring
---
.../qualitymetrics/misc_metrics.py | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
index cbb55aeb8b..b3cf84cf9c 100644
--- a/src/spikeinterface/qualitymetrics/misc_metrics.py
+++ b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -241,7 +241,7 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
It computes several metrics related to isi violations:
* isi_violations_ratio: the relative firing rate of the hypothetical neurons that are
- generating the ISI violations. Described in [Hill]_. See Notes.
+ generating the ISI violations. See Notes.
* isi_violation_count: number of ISI violations
Parameters
@@ -261,22 +261,23 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
Returns
-------
isi_violations_ratio : dict
- The isi violation ratio described in [Hill]_.
+ The isi violation ratio.
isi_violation_count : dict
Number of violations.
Notes
-----
- You can interpret an ISI violations ratio value of 0.5 as meaning that contaminating spikes are
- occurring at roughly half the rate of "true" spikes for that unit.
- In cases of highly contaminated units, the ISI violations ratio can sometimes be greater than 1.
+ The returned ISI violations ratio measures the approximate fraction of spikes in each
+ unit which are contaminted. This interpretation is good when the ratio is small, and
+ becomes worse as it grows. In cases of highly contaminated units, the ISI violations
+ ratio can sometimes be greater than 1.
References
----------
- Based on metrics described in [Hill]_
+ Based on metrics originally implemented in [UMS]_
- Originally written in Matlab by Nick Steinmetz (https://github.com/cortex-lab/sortingQuality)
- and converted to Python by Daniel Denman.
+ This implementation is based on one written in Matlab by Nick Steinmetz
+ (https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman.
"""
res = namedtuple("isi_violation", ["isi_violations_ratio", "isi_violations_count"])
@@ -324,7 +325,7 @@ def compute_refrac_period_violations(
Calculate the number of refractory period violations.
This is similar (but slightly different) to the ISI violations.
- The key difference being that the violations are not only computed on consecutive spikes.
+ The key differences being that the violations are not only computed on consecutive spikes.
This is required for some formulas (e.g. the ones from Llobet & Wyngaard 2022).
From 0108227c911d7a8e47bd445de6e5b1f1b27113f2 Mon Sep 17 00:00:00 2001
From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Fri, 28 Jun 2024 13:47:29 +0100
Subject: [PATCH 3/7] Respond to review
---
doc/modules/qualitymetrics/isi_violations.rst | 10 ++++++----
src/spikeinterface/qualitymetrics/misc_metrics.py | 14 ++++++++++++--
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst
index dc347c614f..10e1934537 100644
--- a/doc/modules/qualitymetrics/isi_violations.rst
+++ b/doc/modules/qualitymetrics/isi_violations.rst
@@ -16,14 +16,16 @@ A correlation will lead to an overestimation of the contamination, whereas an an
Different formulas have been developed, but all require:
-- :math:`T` the duration of the recording.
+- :math:`T` the duration of the recording in seconds.
- :math:`N` the number of spikes in the unit's spike train.
-- :math:`t_r` the duration of the unit's refractory period.
+- :math:`t_r` the duration of the unit's refractory period in seconds.
- :math:`n_v` the number of violations of the refractory period.
Calculation from the [UMS]_ package
-----------------------------------
+Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package ``_.
+
Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory
period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly
be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible
@@ -39,8 +41,8 @@ Calculation from the [Llobet]_ paper
------------------------------------
The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes
- are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes
- come from a single other neuron. In these scenarios, the contamination rate is
+are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes
+come from a single other neuron. In these scenarios, the contamination rate is
.. math::
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
index b3cf84cf9c..f809ad11ec 100644
--- a/src/spikeinterface/qualitymetrics/misc_metrics.py
+++ b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -272,9 +272,14 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
becomes worse as it grows. In cases of highly contaminated units, the ISI violations
ratio can sometimes be greater than 1.
+ This method counts the number of spikes whose isi is violated. If there are three
+ spikes within `isi_threshold_ms`, the first and second are violated. Hence there are two
+ spikes which have been violated. This is is contrast to `compute_refrac_period_violations`,
+ which counts the number of violations.
+
References
----------
- Based on metrics originally implemented in [UMS]_
+ Based on metrics originally implemented in Ultra Mega Sort [UMS]_.
This implementation is based on one written in Matlab by Nick Steinmetz
(https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman.
@@ -325,7 +330,6 @@ def compute_refrac_period_violations(
Calculate the number of refractory period violations.
This is similar (but slightly different) to the ISI violations.
- The key differences being that the violations are not only computed on consecutive spikes.
This is required for some formulas (e.g. the ones from Llobet & Wyngaard 2022).
@@ -352,6 +356,12 @@ def compute_refrac_period_violations(
-----
Requires "numba" package
+ This method counts the number of violations which occur during the refactory period.
+ If there are three spikes within `refractory_period_ms`, the second and third spikes
+ violate the first spikes and the third spike violates the second spike. Hence there
+ are three violations. This is in contrast to `compute_isi_violations`, which
+ computes the number of spikes which have been violated.
+
References
----------
Based on metrics described in [Llobet]_
From c590ed03c83cd8cc81dfe9b8c2b3a8964c0f6345 Mon Sep 17 00:00:00 2001
From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Fri, 28 Jun 2024 13:52:16 +0100
Subject: [PATCH 4/7] add a colon
---
doc/modules/qualitymetrics/isi_violations.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst
index 10e1934537..2002ec6582 100644
--- a/doc/modules/qualitymetrics/isi_violations.rst
+++ b/doc/modules/qualitymetrics/isi_violations.rst
@@ -24,7 +24,7 @@ Different formulas have been developed, but all require:
Calculation from the [UMS]_ package
-----------------------------------
-Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package ``_.
+Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package: ``_.
Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory
period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly
From 4e543010a7ff58dbc8ffa1c4f0aa5e2e7867725c Mon Sep 17 00:00:00 2001
From: Chris Halcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Mon, 1 Jul 2024 09:04:19 +0200
Subject: [PATCH 5/7] Apply suggestions from code review
Co-authored-by: Joe Ziminski <55797454+JoeZiminski@users.noreply.github.com>
---
src/spikeinterface/qualitymetrics/misc_metrics.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
index f809ad11ec..c2dff554e3 100644
--- a/src/spikeinterface/qualitymetrics/misc_metrics.py
+++ b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -357,8 +357,8 @@ def compute_refrac_period_violations(
Requires "numba" package
This method counts the number of violations which occur during the refactory period.
- If there are three spikes within `refractory_period_ms`, the second and third spikes
- violate the first spikes and the third spike violates the second spike. Hence there
+ For example, if there are three spikes within `refractory_period_ms`, the second and third spikes
+ violate the first spike and the third spike violates the second spike. Hence there
are three violations. This is in contrast to `compute_isi_violations`, which
computes the number of spikes which have been violated.
From 2bd0633071e3fd34c61b14554f6c4c9b8d1d0733 Mon Sep 17 00:00:00 2001
From: Chris Halcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Mon, 1 Jul 2024 14:42:03 +0100
Subject: [PATCH 6/7] Update src/spikeinterface/qualitymetrics/misc_metrics.py
Co-authored-by: Zach McKenzie <92116279+zm711@users.noreply.github.com>
---
src/spikeinterface/qualitymetrics/misc_metrics.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
index c2dff554e3..9e4685b5bd 100644
--- a/src/spikeinterface/qualitymetrics/misc_metrics.py
+++ b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -281,7 +281,7 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
----------
Based on metrics originally implemented in Ultra Mega Sort [UMS]_.
- This implementation is based on one written in Matlab by Nick Steinmetz
+ This implementation is based on one of the original implementations written in Matlab by Nick Steinmetz
(https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman.
"""
res = namedtuple("isi_violation", ["isi_violations_ratio", "isi_violations_count"])
From d1ea2158dc8bd2de5f2d69b69ed0c1b0ed8adc87 Mon Sep 17 00:00:00 2001
From: chrishalcrow <57948917+chrishalcrow@users.noreply.github.com>
Date: Tue, 2 Jul 2024 09:50:34 +0100
Subject: [PATCH 7/7] Respond to review
---
doc/modules/qualitymetrics/isi_violations.rst | 9 ++++++---
doc/references.rst | 3 ++-
src/spikeinterface/qualitymetrics/misc_metrics.py | 9 +++++----
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst
index 2002ec6582..4527cdffe9 100644
--- a/doc/modules/qualitymetrics/isi_violations.rst
+++ b/doc/modules/qualitymetrics/isi_violations.rst
@@ -19,13 +19,14 @@ Different formulas have been developed, but all require:
- :math:`T` the duration of the recording in seconds.
- :math:`N` the number of spikes in the unit's spike train.
- :math:`t_r` the duration of the unit's refractory period in seconds.
-- :math:`n_v` the number of violations of the refractory period.
Calculation from the [UMS]_ package
-----------------------------------
Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package: ``_.
+In this method the number of spikes whose refractory period are violated, denoted :math:`n_v`, is used.
+
Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory
period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly
be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible
@@ -40,6 +41,8 @@ The contamination rate is calculated to be
Calculation from the [Llobet]_ paper
------------------------------------
+In this method the number spikes which violate other spikes' refractory periods, denoted :math:`\tilde{n}_v`, is used.
+
The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes
are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes
come from a single other neuron. In these scenarios, the contamination rate is
@@ -47,8 +50,8 @@ come from a single other neuron. In these scenarios, the contamination rate is
.. math::
C = \frac{FP}{TP + FP} \approx \begin{cases}
- 1 - \sqrt{1 - \frac{n_v T}{N^2 t_r}} \text{ for the case of random contamination} \\
- \frac{1}{2} \left( 1 - \sqrt{1 - \frac{2 n_v T}{N^2 t_r}} \right) \text{ for the case of 1 contaminant neuron}
+ 1 - \sqrt{1 - \frac{\tilde{n}_v T}{N^2 t_r}} \text{ for the case of random contamination} \\
+ \frac{1}{2} \left( 1 - \sqrt{1 - \frac{2 \tilde{n}_v T}{N^2 t_r}} \right) \text{ for the case of 1 contaminant neuron}
\end{cases}
Where :math:`TP` is the number of true positives (detected spikes that come from the neuron) and :math:`FP` is the number of false positives (detected spikes that don't come from the neuron).
diff --git a/doc/references.rst b/doc/references.rst
index 48b7cd44f6..5fbcbecb63 100644
--- a/doc/references.rst
+++ b/doc/references.rst
@@ -51,7 +51,8 @@ or :code:`compute_quality_metrics()` methods, please include the citations for t
important for your research:
- :code:`amplitude_cutoff` [Hill]_
-- :code:`amplitude_median` or :code:`sliding_rp_violation` [IBL]_
+- :code:`amplitude_median` [IBL]_
+- :code:`sliding_rp_violation` [IBL]_
- :code:`drift` [Siegle]_
- :code:`isi_violation` [UMS]_
- :code:`rp_violation` [Llobet]_
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
index 9e4685b5bd..31a3efb2b6 100644
--- a/src/spikeinterface/qualitymetrics/misc_metrics.py
+++ b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -267,10 +267,11 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
Notes
-----
- The returned ISI violations ratio measures the approximate fraction of spikes in each
- unit which are contaminted. This interpretation is good when the ratio is small, and
- becomes worse as it grows. In cases of highly contaminated units, the ISI violations
- ratio can sometimes be greater than 1.
+ The returned ISI violations ratio approximates the fraction of spikes in each
+ unit which are contaminted. The formulation assumes that the contaminating spikes
+ are statistically independent from the other spikes in that cluster. This
+ approximation can break down in reality, especially for highly contaminated units.
+ See the discussion in Section 4.1 of [Llobet]_ for more details.
This method counts the number of spikes whose isi is violated. If there are three
spikes within `isi_threshold_ms`, the first and second are violated. Hence there are two