
Improvement when counting num spikes #2198

Closed
DradeAW wants to merge 2 commits

Conversation

@DradeAW (Contributor) commented Nov 13, 2023

Spike vector should be the default, as computing 1 spike train will mean the cache is available (but doesn't mean it's faster to use the cached spike trains)

@alejoe91 (Member)

Have you profiled it? In the spike trains approach, you just count arrays that are already in memory. With the spike vector approach you need a np.unique call, which sounds slower. Of course, if neither spike trains nor spike vectors are cached, the latter is probably better. But if both are cached, I suspect that spike trains is faster!
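
(For reference, the two approaches boil down to something like the sketch below. It assumes cached spike trains are plain NumPy arrays and that the spike vector is a structured array with a "unit_index" field; the toy data is purely illustrative.)

import numpy as np

# Toy stand-ins for the real cached objects (illustrative only)
unit_ids = np.array(["u0", "u1", "u2"])
cached_spike_trains = {u: np.sort(np.random.randint(0, 30000, size=100)) for u in unit_ids}
spike_vector = np.zeros(300, dtype=[("sample_index", "int64"), ("unit_index", "int64")])
spike_vector["unit_index"] = np.random.randint(0, 3, size=300)

# Spike trains approach: one len() per cached array, no scan over the spikes
counts_from_trains = {unit_id: len(train) for unit_id, train in cached_spike_trains.items()}

# Spike vector approach: a single np.unique pass over the whole vector
unit_indices, counts = np.unique(spike_vector["unit_index"], return_counts=True)
counts_from_vector = {unit_ids[idx]: n for idx, n in zip(unit_indices, counts)}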

@alejoe91 added the core (Changes to core module) and performance (Performance issues/improvements) labels on Nov 14, 2023
@DradeAW (Contributor, Author) commented Nov 14, 2023

The problem is that if you call get_unit_spike_train on one unit_id, the old version computes the spike trains of all units, which is very slow.

np.unique is very fast, even for very big vectors. I don't think a human can tell the difference!

@alejoe91 (Member)

Which "old" version? If they are all cached it should still be faster, but I agree that they might not be all cached and that unique is fast. So good for me ;)

@DradeAW (Contributor, Author) commented Nov 14, 2023

By "old" I meant current (not this PR)
Maybe instead of reversing we can check whether they are all cached or not.
But checking whether the cache exists is not sufficient.

@samuelgarcia (Member)

@DradeAW: I will build, on top of this PR, an option to return either a dict or a numpy.array when counting spikes.
In many situations the array will be easier (and maybe faster).
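
(Roughly, the difference between the two output formats, with made-up counts; the array form is assumed to be ordered like sorting.unit_ids.)

import numpy as np

unit_ids = ["u0", "u1", "u2"]
counts_array = np.array([120, 85, 240])  # hypothetical counts, ordered like unit_ids

# dict output: easy lookup by unit id
counts_dict = dict(zip(unit_ids, counts_array))

# array output: plays nicely with vectorized operations, e.g. selecting units
keep_mask = counts_array >= 100  # units with at least 100 spikes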

@samuelgarcia (Member)

See #2209, which is built on top of this one.

@DradeAW (Contributor, Author) commented Nov 15, 2023

Yep makes sense!

Another useful option would be to choose between returning counts per segment or the sum over all segments.
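
(Something like the sketch below, reusing the existing get_unit_spike_train segment_index argument on a sorting object like the one in the benchmark further down; the shapes of the return values are just a suggestion.)

# Per-segment counts: one {unit_id: count} dict per segment
counts_per_segment = [
    {unit_id: len(sorting.get_unit_spike_train(unit_id, segment_index=seg))
     for unit_id in sorting.unit_ids}
    for seg in range(sorting.get_num_segments())
]

# Sum over all segments
total_counts = {unit_id: sum(seg[unit_id] for seg in counts_per_segment)
                for unit_id in sorting.unit_ids}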

@h-mayorquin (Collaborator)

I agree with @alejoe91 that you should profile this.

> np.unique is very fast, even for very big vectors. I don't think a human can tell the difference!

Really, optimization should not be based on guesses or expected behavior.

@DradeAW (Contributor, Author) commented Nov 15, 2023

Here is a benchmark on a big sorting (a NumpyFolderSorting with 531 units, for a total of 11,298,565 spikes):

import time
import numpy as np
import spikeinterface.core as si

sorting = si.load_extractor("/mnt/raid0/data/MEArec/1h_3000cells/analyses/ks2_5_pj7-3/sorting")
N = 100
time_spike_trains = np.empty(N, dtype=np.float32)
time_spike_vector = np.empty(N, dtype=np.float32)

# Cache both the spike vector and all spike trains up front
sorting.to_spike_vector(use_cache=True)
for unit_id in sorting.unit_ids:
    sorting.get_unit_spike_train(unit_id, use_cache=True)

# Benchmark
for i in range(N):
    t1 = time.perf_counter()
    sorting.count_num_spikes_per_unit(method="spike_trains")  # I added the argument 'method' to benchmark
    t2 = time.perf_counter()
    sorting.count_num_spikes_per_unit(method="spike_vector")
    t3 = time.perf_counter()

    time_spike_trains[i] = t2 - t1
    time_spike_vector[i] = t3 - t2

print(f"Spike trains: {1e3 * np.mean(time_spike_trains):.1f} +- {1e3 * np.std(time_spike_trains):.1f} ms")
print(f"Spike vector: {1e3 * np.mean(time_spike_vector):.1f} +- {1e3 * np.std(time_spike_vector):.1f} ms")

Output:

Spike trains: 0.5 +- 0.1 ms
Spike vector: 263.8 +- 6.4 ms

So there is a clear difference, but both are clearly under a second.
Keep in mind, though, that for a fair comparison you should also add the time needed to check whether the spike train cache is complete.
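
(Such a check might look like the sketch below; the _cached_spike_trains attribute and its per-segment dict layout are an assumption about the internals, so the real name and structure may differ.)

def spike_train_cache_is_complete(sorting):
    # Assumes a per-segment dict of cached trains, i.e.
    # sorting._cached_spike_trains[segment_index][unit_id] (hypothetical layout)
    return all(
        unit_id in sorting._cached_spike_trains.get(segment_index, {})
        for segment_index in range(sorting.get_num_segments())
        for unit_id in sorting.unit_ids
    )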

@h-mayorquin (Collaborator)

The git diff made this very confusing to me.

You changed the method to use the spike_vector instead of the dictionary to count spikes when you have the cache. But you are showing us timings for when you have both the cached spike trains and the spike vector. What if you don't have the spike vector to begin with?

Also, why did you change the else branch?

@DradeAW (Contributor, Author) commented Nov 16, 2023

I didn't change the else branch, I just switched which case comes first.

The idea is that, right now, I don't know how to check whether all the spike trains have been computed, and the spike trains approach is only faster if you have all of them (as shown in my other PR, computing the spike trains can be very slow).

If you don't have the spike vector to begin with, then it defaults to looking at the spike trains, which is the correct behaviour.

@samuelgarcia (Member)

#2209, built on top of this one, now makes a better choice.

@DradeAW (Contributor, Author) commented Nov 17, 2023

Agreed, I'll close this PR :)

@DradeAW closed this on Nov 17, 2023
Labels: core (Changes to core module), performance (Performance issues/improvements)