Allow precomputing spike trains #2175

Merged: 42 commits, Nov 22, 2023

Conversation

@DradeAW (Contributor) commented Nov 6, 2023

Allows for fast pre-computation of spike trains

@DradeAW (Contributor, Author) commented Nov 6, 2023

Wow, to my surprise this is not faster!
I don't understand why; maybe I made a mistake somewhere...

import time
import spikeinterface.core as si


t1 = time.perf_counter()

# old way: fetch each unit's spike train one by one
sorting = si.load_extractor("/mnt/raid0/data/MEArec/1h_3000cells/analyses/ks2_5_pj7-3/sorting")
for unit_id in sorting.unit_ids:
    sorting.get_unit_spike_train(unit_id)

t2 = time.perf_counter()

# new way: precompute all spike trains at once, then fetch from the cache
sorting = si.load_extractor("/mnt/raid0/data/MEArec/1h_3000cells/analyses/ks2_5_pj7-3/sorting")
sorting.precompute_spike_trains()
for unit_id in sorting.unit_ids:
    sorting.get_unit_spike_train(unit_id)

t3 = time.perf_counter()

print(f"Old way: {t2-t1:.1f} s")
print(f"New way: {t3-t2:.1f} s")
Old way: 16.5 s
New way: 18.3 s

@DradeAW (Contributor, Author) commented Nov 6, 2023

Ok this is way better with the latest commit!

import time
import numpy as np
import spikeinterface.core as si



sorting1 = si.load_extractor("/mnt/raid0/data/MEArec/1h_3000cells/analyses/ks2_5_pj7-3/sorting")
sorting2 = si.load_extractor("/mnt/raid0/data/MEArec/1h_3000cells/analyses/ks2_5_pj7-3/sorting")

t1 = time.perf_counter()

for unit_id in sorting1.unit_ids:
	sorting1.get_unit_spike_train(unit_id)

t2 = time.perf_counter()

sorting2.precompute_spike_trains()
for unit_id in sorting2.unit_ids:
	sorting2.get_unit_spike_train(unit_id)

t3 = time.perf_counter()

print(f"Old way: {t2-t1:.1f} s")
print(f"New way: {t3-t2:.1f} s")

for unit_id in sorting1.unit_ids:
	assert np.all(sorting1.get_unit_spike_train(unit_id) == sorting2.get_unit_spike_train(unit_id))
Old way: 16.3 s
New way: 1.4 s

@DradeAW (Contributor, Author) commented Nov 6, 2023

For reference, my sorting dataset contains 531 units for a total of 11,298,565 spikes (1h recording)

@alejoe91 added the "core: Changes to core module" label on Nov 6, 2023
@zm711 (Collaborator) left a comment:

Just a question and suggestion @DradeAW

Review thread on src/spikeinterface/core/core_tools.py (outdated, resolved)
@@ -424,6 +427,31 @@ def get_all_spike_trains(self, outputs="unit_id"):
spikes.append((spike_times, spike_labels))
return spikes

def precompute_spike_trains(self, from_spike_vector: bool = True):
Collaborator:

Since this is in core, where numba is not automatically installed, wouldn't it be safer to have from_spike_vector default to False, so that a user who has only installed core doesn't instantly hit an assertion error?

Contributor (Author):

I added the parameter for completeness, but I actually don't know when it would be useful to precompute spike trains from anything other than the spike vector.
Can you think of a use case?

Collaborator:

Not off the top of my head. But I'll think about it :)

@samuelgarcia (Member) commented:

Hi Aurelien,
thanks for this.

I won't have time to give feedback on this very soon.
But globally:

  • Thanks a lot for this speedup; this was on my todo list.
  • I'm not sure I like the idea of having numba in core... let's see. It can be used optionally if installed, but then we need it for testing.
  • I'm not sure I like the semantics of one function. In short, many Sorting classes are already spike-train centric (most of them) and some are spike-vector first (NumpySorting, ...). We need this function for two clearly distinct cases:
    1. when spike-train centric, to preload spike trains into memory
    2. when spike-vector centric, to make the conversion all at once. The caching used to be spike train by spike train, which is also a valid scenario, I think.
    The semantics have to be very clear for these two cases. This is now done with one function and one option.
  • I like the idea of explicitly calling this function.

@DradeAW (Contributor, Author) commented Nov 7, 2023

not sure I like the semantics of one function [...] We need this function for two clearly distinct cases

@samuelgarcia Would you prefer to split this into two functions with no parameters?

@DradeAW (Contributor, Author) commented Nov 7, 2023

not sure I like the idea of having numba in core... let's see. It can be used optionally if installed, but then we need it for testing.

I removed the need for numba (there is now a fallback if it's not installed).
For the tests, I've added a skip if numba is not installed, so it won't make the tests fail.

@h-mayorquin (Collaborator) left a comment:

Some comments added.

Two big questions:

How does this interact with the caching option in to_spike_vector and get_unit_spike_train? Isn't it duplicating behavior?

How would this play out with multiprocessing? All this pre-computing is lost, isn't it?

Four review threads on src/spikeinterface/core/core_tools.py (outdated, resolved)
@DradeAW (Contributor, Author) commented Nov 9, 2023

How does this interact with the caching option in to_spike_vector and get_unit_spike_train?

It overrides it (forces a recompute).

Isn't it duplicating behavior?

No, because it computes all spike trains at once rather than unit by unit, which (as I showed above) is much faster.

How would this play out with multiprocessing? All this pre-computing is lost, isn't it?

I'm not sure, but I believe that if you get the spike train, it should find it in the cache in all cases.
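The "all at once" idea can be sketched in plain numpy (the function name below is ours, not the actual spikeinterface implementation): a spike vector stores time-sorted (sample_index, unit_index) pairs, and one stable argsort on the unit indices demixes it into all per-unit spike trains in a single pass, instead of masking the full vector once per unit.

```python
import numpy as np

def demix_spike_vector(sample_indices, unit_indices, num_units):
    """Split one time-sorted spike vector into per-unit spike trains.

    A stable sort on unit_indices groups spikes by unit while keeping
    temporal order within each unit; np.split then cuts the grouped
    array at the unit boundaries. Cost is one O(n log n) pass instead
    of O(num_units * n) with a boolean mask per unit.
    """
    order = np.argsort(unit_indices, kind="stable")
    grouped = sample_indices[order]
    counts = np.bincount(unit_indices, minlength=num_units)
    boundaries = np.cumsum(counts)[:-1]  # split points between units
    return np.split(grouped, boundaries)

# toy spike vector: 6 spikes from 3 units, sorted by time
samples = np.array([10, 12, 15, 20, 21, 30])
units = np.array([1, 0, 2, 0, 1, 0])
trains = demix_spike_vector(samples, units, num_units=3)
# trains[0] -> [12, 20, 30], trains[1] -> [10, 21], trains[2] -> [15]
```

This is only a sketch of why one grouped pass beats a per-unit mask loop; the PR's actual implementation uses a numba kernel for the same demixing.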

@h-mayorquin (Collaborator) commented:

No, because it computes all spike trains at once rather than unit by unit, which (as I showed above) is much faster.

So are the functions introduced here doing a better job at calculating the spike_vector than to_spike_vector?

@@ -449,6 +451,33 @@ def get_all_spike_trains(self, outputs="unit_id"):
spikes.append((spike_times, spike_labels))
return spikes

def precompute_spike_trains(self, from_spike_vector=None):
Member:

I don't like this method name too much.
Maybe cache_all_spike_trains() would be more explicit.

Contributor (Author):

To me, a function named cache_all_spike_trains would be expected not to perform any computation, but just to move the result of a computation into the cache.
But I agree the name could be improved.

@DradeAW (Contributor, Author) commented Nov 17, 2023

There are probably a few places in core that could be improved by calling this function, because they need the spike trains of all unit ids (maybe when extracting waveforms?)

@zm711 (Collaborator) left a comment:

Cleanup of docstrings.

Four review threads on src/spikeinterface/core/sorting_tools.py (resolved; three outdated)
@samuelgarcia (Member) commented:

There are probably a few places in core that could be improved by calling this function, because they need the spike trains of all unit ids (maybe when extracting waveforms?)

Waveform extraction is based on the spike vector.
But maybe elsewhere.

@DradeAW (Contributor, Author) commented Nov 17, 2023

Latest benchmark: even faster.

~200x faster on a very big sorting:

Old way: 634.3 s
New way: 2.9 s

~35x faster on a sorting I typically use:

Old way: 24.9 s
New way: 0.7 s

@h-mayorquin (Collaborator) commented Nov 18, 2023

Thanks @samuelgarcia. I introduced some machinery to test this:

#2227

I want to decouple the performance from the MEArec memory-model limitations.

I will give it a test as soon as I can.

@h-mayorquin (Collaborator) commented:

I found no difference (as I expected):

import time
import numpy as np
import spikeinterface.core as si
from spikeinterface.core.generate import SortingGenerator


num_units = 1000
durations = [10 * 60 * 60.0]
seed = 25

sorting1 = SortingGenerator(num_units=num_units, durations=durations, seed=seed)
sorting2 = SortingGenerator(num_units=num_units, durations=durations, seed=seed)

t1 = time.perf_counter()

for unit_id in sorting1.unit_ids:
	sorting1.get_unit_spike_train(unit_id)

t2 = time.perf_counter()

sorting2.precompute_spike_trains()
for unit_id in sorting2.unit_ids:
	sorting2.get_unit_spike_train(unit_id)

t3 = time.perf_counter()


print(f"Good old : {t2-t1:.1f} s")
print(f"Pre-computing: {t3-t2:.1f} s")

for unit_id in sorting1.unit_ids:
	assert np.all(sorting1.get_unit_spike_train(unit_id) == sorting2.get_unit_spike_train(unit_id))

Good old : 1.6 s
Pre-computing: 1.6 s

This might be a phenomenon exclusive to MEArec. Do you suspect that it happens for any other format? Can you cProfile your code to see where the time is spent?

It is very strange to me that we calculate the spike trains through the spike_vector, which is a terrible representation for that: you mix your spike trains into one long vector (with a sort as the cost) and then you use a numba function to demix it because, well, it was not made for that.

I think a function to transform a spike_vector into spike trains through numba makes a lot of sense in NumpySorting. That is where it should be most useful, and it makes thematic sense. Especially if this is a phenomenon exclusive to MEArec, I don't think it is worth adding more complexity to the core just for this case.

The design principle is to corral complexity away from the core. Implementation details of specific formats should not leak upward as general methods. All the current functionality can be corralled into NumpySorting, and currently baserecording has a .to_numpy_sorting method that enables this easily. That is the place to handle functionality that is useful when you have all the data in memory. The core is for lazy operations.

Now, if most formats are like this, I think it would make sense, but I really want to keep the core simple. I think the bar should be high.

@DradeAW (Contributor, Author) commented Nov 20, 2023

Yes, pre-computing only makes things faster if you come from a spike vector (in other scenarios it's going to be the same speed).

However, NumpySorting is not the only sorting using a spike vector (e.g. PhySortingExtractor).
Plus, Sam wanted to focus on making sortings more spike-vector centered.
Plus, NumpySorting is the default save method, so it might be used a lot.

Currently baserecording has a .to_numpy_sorting

I believe there is a typo here ^^

Thanks for the input!
This should not be taken lightly, indeed. However, I believe that retrieving spike trains is a core functionality of a sorting object and should therefore be optimized (I posted the massive gain for huge sorting objects).

@h-mayorquin (Collaborator) commented:

@DradeAW You are correct. My scenario of performance above is wrong. Here is the corrected scenario:

import time
import numpy as np
import spikeinterface.core as si
from spikeinterface.core.generate import SortingGenerator


num_units = 1000
durations = [10 * 60 * 60.0]
seed = 25

sorting1 = SortingGenerator(num_units=num_units, durations=durations, seed=seed)
sorting2 = SortingGenerator(num_units=num_units, durations=durations, seed=seed)
sorting2.to_spike_vector()

t1 = time.perf_counter()

for unit_id in sorting1.unit_ids:
	sorting1.get_unit_spike_train(unit_id)

t2 = time.perf_counter()
print(f"Good old : {t2-t1:.1f} s")


t3 = time.perf_counter()
sorting2.precompute_spike_trains()
for unit_id in sorting2.unit_ids:
	sorting2.get_unit_spike_train(unit_id)

t4 = time.perf_counter()


print(f"Pre-computing: {t4-t3:.1f} s")

for unit_id in sorting1.unit_ids:
	assert np.all(sorting1.get_unit_spike_train(unit_id) == sorting2.get_unit_spike_train(unit_id))

Good old : 1.6 s
Using numba to compute spike trains
Pre-computing: 1.9 s
After caching:
Good old : 1.6 s
Using numba to compute spike trains
Pre-computing: 1.4 s

So, it is a bit slower on the first run, but after caching it becomes faster. I would label this as no gain but also no loss, even in the case with really fast IO (that is, the case of my generator). So I think this supports your case for adding it, as even in this extreme case it performs relatively well (or at least not badly). So, yes, empiricism works; I am now on your side: this should work OK in most cases. I still don't like the cache option and would like a more readable version of the numba implementation, but that should be easy. Thanks for bearing with me.

There is an important point, though, that should not hold up this PR. Without numba installed, this is terribly slow:

not using numba
Good old : 1.8 s
Pre-computing: 76.9 s

In light of this, my current position is the following:

  • If we are going to have this, we should have numba in core. Otherwise, we are adding a very slow option that will get triggered without the user knowing about it. Given that the point of adding this is better performance, I would not like to run the super-slow function silently. Another option is to throw an assertion letting the user know that this method is not available if they don't have numba. But at that point, I don't really see why not just add numba to core. It is not heavy, it is fast to import, it would enable other improvements that I have in mind, and it would avoid all the annoyance of importing numba in a special way.

As I was writing this, I realized the numba question is a better fit for another issue.

@DradeAW (Contributor, Author) commented Nov 20, 2023

Without numba installed this is terribly slow:

Then there is a bug, because this should not happen: it should be doing the same operation (a for loop getting each unit's spike train). This should be fixed before merging!

I still don't like the cache option and would like to have a more readable version of the numba implementation but that should be easy.

I'll add this on my todo list :)

@h-mayorquin (Collaborator) commented:

Without numba installed this is terribly slow:

Then there is a bug, because this should not happen: it should be doing the same operation (a for loop getting each unit's spike train). This should be fixed before merging!

Don't you believe that numba is just way faster than looping? Can you elaborate on why this has to be a bug in the numpy version?

@DradeAW (Contributor, Author) commented Nov 20, 2023

Numba is faster, but if numba isn't installed, then it should just do it the good old way.

So if numba isn't installed, the behaviour before and after this PR shouldn't change!

    # the trick here is to have a function getter
    vector_to_list_of_spiketrain = get_numba_vector_to_list_of_spiketrain()
else:
    vector_to_list_of_spiketrain = vector_to_list_of_spiketrain_numpy
Collaborator:

Here, if you don't have numba you use the numpy version.

Member:

This is exactly the "old" way, of course.

return spike_trains


def vector_to_list_of_spiketrain_numpy(sample_indices, unit_indices, num_units):
@h-mayorquin (Collaborator) commented Nov 20, 2023:

@samuelgarcia @DradeAW

If by the "old way" you mean using the numpy version, then that is what is super slow, and I think it should be avoided.

If by the "old way" you mean just calling get_unit_spike_train for every unit (which is what I meant), then I think it is indeed a bug, because the implementation is calling this function.

Contributor (Author):

In my version, it was calling get_unit_spike_train(unit_id).

But the function Sam wrote should be doing the same thing NumpySorting does when you call get_unit_spike_train... so it should have the same performance.

@h-mayorquin (Collaborator) commented Nov 20, 2023:

If the sorting is a NumpySorting, that would be correct, and it makes sense: NumpySorting is slow at extracting spike trains because, well, the representation is just bad for that. That's what it is.

But if your sorting is not as bad as NumpySorting at calling get_unit_spike_train, then the current implementation is terrible without numba.

That is, if you have a spike_vector cached in something that is not as bad as NumpySorting at extracting spikes, and you don't have numba, then you will silently be calling a terribly slow function.

Contributor (Author):

Ah, I see the problem!

You are speaking about a (corner) case where the get_unit_spike_train function is very fast, yet for some reason the spike trains are not cached but the spike vector is.
Indeed, you are correct: in this case it would be terribly slow.

Collaborator:

I take issue with you calling that a corner case. Why are you so confident that sorting to build the spike_vector and then masking the resulting structured array is faster than just calling get_unit_spike_train?

My original appeal was to restrict this to formats that already have the spike_vector pre-calculated, like NumpySorting and, as you were saying, maybe Phy. There, you can be confident that calling this is unlikely to be bad. But then I tested this method with numba and it works OK even for those cases with kind-of-fast IO, so that seems fine.

But for formats that don't have the spike_vector pre-calculated, this seems like a corner case:

  1. You already called get_unit_spike_train to get your spike_vector (so your data should be cached already; no need for this).
  2. Then, you already called get_unit_spike_train, but somehow it is better to mask a very long vector (generating a spike vector in memory) and extract the trains from there.

So again, I think this works great for formats that already have the spike_vector pre-calculated, and it works OK even for other cases (if you have numba). But if you don't have it? Or if you don't have numba? Are you sure that get_unit_spike_train is that bad in general?

Contributor (Author):

Ah, I read the function again and realized it does not check whether the spike-train cache is already full (that is in another PR). This should be added.

The corner case I was talking about is when get_unit_spike_train is very fast, yet the spike vector is cached but the spike trains are not (since the spike vector is computed from the spike trains, they should normally be cached).
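A simplified sketch of the getter pattern under discussion (the names mirror the quoted diff, but the bodies are our assumptions, not the actual spikeinterface code): the numba kernel is compiled lazily inside the getter so that merely importing the module never requires numba, and the plain-numpy masking version is the silent fallback whose cost this thread is debating.

```python
import numpy as np

def vector_to_list_of_spiketrain_numpy(sample_indices, unit_indices, num_units):
    """Plain-numpy fallback: one boolean mask over the full vector per unit.
    O(num_units * num_spikes), which is what gets slow with many units."""
    return [sample_indices[unit_indices == u] for u in range(num_units)]

def get_vector_to_list_of_spiketrain():
    """Return the fastest available demixer. The numba kernel is defined
    inside the getter, so importing this module never requires numba."""
    try:
        import numba
    except ImportError:
        return vector_to_list_of_spiketrain_numpy

    @numba.njit(nogil=True)
    def vector_to_list_of_spiketrain_numba(sample_indices, unit_indices, num_units):
        # first pass counts spikes per unit, second pass fills the trains
        counts = np.zeros(num_units, dtype=np.int64)
        for u in unit_indices:
            counts[u] += 1
        trains = [np.empty(counts[u], dtype=np.int64) for u in range(num_units)]
        cursors = np.zeros(num_units, dtype=np.int64)
        for i in range(sample_indices.size):
            u = unit_indices[i]
            trains[u][cursors[u]] = sample_indices[i]
            cursors[u] += 1
        return trains

    return vector_to_list_of_spiketrain_numba
```

When numba is missing, the getter hands back the O(num_units × num_spikes) masking loop, which is exactly the slow path measured in the benchmark above.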

@samuelgarcia (Member) commented:

This PR should be closed and the discussion should happen in #2209!
#2209 includes all of this concept, rewritten.

@DradeAW (Contributor, Author) commented Nov 20, 2023

@samuelgarcia I think you are thinking of another PR.

This PR is different!

@alejoe91 merged commit 029c24a into SpikeInterface:main on Nov 22, 2023
9 checks passed