Optimize numba cross-correlation and extend correlograms.py docstrings and tests #3017
Conversation
src/spikeinterface/postprocessing/tests/common_extension_tests.py
Did you also profile, just so we can see the speed differences, since you changed both the numpy and numba implementations? I think fixing the edge case is great, but it would be nice to see how the speed has changed too.
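Something along these lines would do it (a rough sketch; the generation parameters are arbitrary):

```python
import spikeinterface.full as si
from time import perf_counter

# Hypothetical timing comparison of the two methods, as requested;
# the synthetic sorting parameters here are arbitrary choices.
sorting = si.generate_sorting(num_units=20, durations=[300.0], seed=0)

for method in ("numpy", "numba"):
    t0 = perf_counter()
    si.compute_correlograms(sorting, window_ms=50.0, bin_ms=1.0, method=method)
    print(method, f"{perf_counter() - t0:.3f} s")
```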
bins : np.ndarray
    The bin edges in ms
window_size : int
    The window size in samples
One thing we should discuss is that we have `frame_slice` and now we are saying `samples`. I personally prefer the sample terminology, but what do you think about frames/samples vs. mixed terminology vs. just using frames, since that is already built deeply into the library, and then we explain the equivalence in the future magic glossary?
I also prefer sample. `frame` is totally historical stuff done by Alessio or Jeremy, I guess. If one day we go to version one we need to stop using "frame", I guess, but Alessio will be unhappy with this.
This is interesting, I am not very familiar with this distinction between samples and frames; I did not know the difference until I just looked it up. If I have understood correctly, I think in this instance 'samples' would be correct: for example, we have a window size of 10 samples (that together would make up 1 frame). If you had a window size of 10 frames, I think this would imply 10 sets of timeseries? It makes sense for `frame_index`, I guess, as it reads 'index of the frame you want to slice'. Although I also prefer sample, as I think `frame` is less well known.
I've linked this to #2800 for further discussion.
Amazing amount of work, Joe! I didn't read the algorithm in detail but the tests are very convincing that it's working correctly, and quickly!
Thanks @DradeAW, I agree this would be a nicer method and would speed things up. I had a play around with it now, but it leads to a lot of finicky indexing and fencepost problems that I can't quite figure out in good time. It's kind of related to the binning issue below, but also to the edge-effects issue mentioned in the dropdown: when trying to flip the bins, the bin-edge behaviour becomes mirrored in the positive vs. negative case rather than consistent across bins. Let's say by convention you are binning with ... I think this is largely solvable by consistently using floor. But then there is a related problem with the left-most and right-most bin edge, when the diff is exactly the window edge.
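A minimal sketch of the mirroring behaviour described above (illustrative values, not the PR's code):

```python
import numpy as np

# Truncating towards zero makes the bin around zero twice as wide and
# mirrors the edge behaviour for positive vs. negative diffs, whereas
# floor division treats both signs consistently.
diffs = np.array([-5, -4, 4, 5])  # spike-time differences, in samples
bin_size = 5

floor_bins = np.floor(diffs / bin_size).astype(int)  # [-1, -1, 0, 1]
trunc_bins = np.trunc(diffs / bin_size).astype(int)  # [-1,  0, 0, 1]

print(floor_bins)  # consistent bin widths across zero
print(trunc_bins)  # -4 and +4 collapse into the same middle bin
```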
Can't you just do this?
There might be a more optimal method without doing the division twice, but I don't think that the division is that computationally expensive :)
The consistent use of floor does fix the first issue described, but not some other observed issues that may or may not be related. I think where I was stuck next was on the middle bin for cross-correlation, which was not consistently filled between the forward and backward direction in the case that ...
I understand ahah.
Thanks a lot for the excellent reimplementation. A main (and historical) reason for the slow implementation was the confusion between unit_index and unit_id in the implementation. Now the implementation uses unit_index and this is of course crazy faster. I would modify variable names to take this into account, to avoid confusion for a future reader of this code.
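To illustrate the difference (a hypothetical sketch; the variable names here are illustrative):

```python
import numpy as np

# unit ids are arbitrary labels, so each spike needs a search to find its
# row, while unit_index addresses the output array directly.
unit_ids = np.array([101, 205, 999])
spike_unit_ids = np.array([101, 999, 205, 101])   # label per spike (slow)
spike_unit_indices = np.array([0, 2, 1, 0])       # index per spike (fast)

# Slow pattern: resolve each id to a row with a search on every spike.
rows = np.array([np.flatnonzero(unit_ids == u)[0] for u in spike_unit_ids])

# Fast pattern: use the index directly, e.g. to accumulate spike counts.
counts = np.zeros(len(unit_ids), dtype=np.int64)
np.add.at(counts, spike_unit_indices, 1)
print(rows, counts)  # [0 2 1 0] [2 1 1]
```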
Thanks @samuelgarcia! That makes sense, I've renamed ...
A bottle of wine is ready for you in Lyon.
Thanks for working on this @JoeZiminski !
This PR (closes #2990):

- Optimizes the numba cross-correlation implementation.
- Extends the docstrings and tests in `test_correlogram.py` and reworks some existing tests now that the numba / numpy implementations match.

I would like to hear @samuelgarcia's opinion on the `test_sortinganalyzer_correlograms()` that is supplemental to `run_extension_tests()`. It again creates the sortinganalyzer, so I think the implementation could be handled better to avoid repeating sortinganalyzer generation. A benefit is that it directly tests the sortinganalyzer output, with a lightweight approach, simply against the `sorting` implementation. The `sorting` object is tested in more detail in the rest of the tests.

This PR also makes some potentially contentious API changes that I'm happy to reverse:

- Removes `compute_autocorrelogram_from_spiketrain` and related spiketrain-level functions. These were only working for the numba implementation and were untested. My thoughts are that it is better to force users to use the central API, to restrict the surface API and help with maintenance. If people want to use correlation functions outside of the central SpikeInterface API they can delve into the private functions, but it is easier to only support a small number of entry functions.
- Makes the functions underlying `compute_correlograms` private (except `correlogram_for_one_segment`, which is called elsewhere from within the codebase).

TODO: As far as I can tell the docs are giving the wrong default argument values for `compute_correlograms()`; could someone help me track down where these are set?

More details can be found in the dropdowns below.
Details on Numba optimisation
The existing numba correlogram version had not yet been optimised with the new spike vector approach. It did not scale well when increasing the number of units.
With the new implementation it is much faster (benchmark plot omitted), and the same holds with a very large recording, for fun (plot omitted).
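For orientation, a schematic of the spike-vector idea (not the PR's exact code; the function name and signature here are illustrative):

```python
import numpy as np
from numba import njit

# Spikes from all units are merged and time-sorted, so for each spike we
# only scan forward until the time difference leaves the window.
@njit(cache=True)
def _correlogram_counts(spike_times, spike_unit_indices, num_units,
                        window_size, bin_size):
    num_half_bins = window_size // bin_size
    correlograms = np.zeros(
        (num_units, num_units, 2 * num_half_bins), dtype=np.int64
    )
    for i in range(len(spike_times)):
        for j in range(i + 1, len(spike_times)):
            diff = spike_times[j] - spike_times[i]
            if diff >= window_size:
                break  # times are sorted: no later spike can be in window
            bin_idx = diff // bin_size
            u_i, u_j = spike_unit_indices[i], spike_unit_indices[j]
            # fill both mirrored comparisons explicitly (brute force)
            correlograms[u_j, u_i, num_half_bins + bin_idx] += 1
            correlograms[u_i, u_j, num_half_bins - bin_idx - 1] += 1
    return correlograms
```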
Details on fixing the tests
Below are some specific details on the steps taken to fix the tests (i.e. address the left-bin problem in the numpy implementation and make the numpy and numba implementations match). It's quite fiddly and only really put here for posterity; it is not necessary reading.
Addressing why the previous numba and numpy versions did not match
The previous `numba` implementation also did not exactly match the numpy version, in part due to a difference in handling an edge case. When the difference between two spike times divides perfectly into the `bin_size`, the two methods handled the case differently. I don't think there is a correct way to handle this; technically the spike can go in the bin to the left or the right (e.g. if the diff is 5 ms and the `bin_size` is 5 ms, it could be convention to go in 0-5 or 5-10). In the numpy implementation, all spike differences are positive, then the `sign` is iterated over to assign to mirrored unit comparisons (e.g. unit 1 vs. 0 and unit 0 vs. 1). Say we have 15 bins, and `diff // bin_size` is `12`. The numpy version will assign this to the bins `num_bins - 12 = 3` and `num_bins + 12 = 27`. In contrast, the numba version calculated everything in one direction (e.g. unit 1 vs. 0) and then flipped the bins. This means that when the difference is negative, the bin would be `num_bins - 12 = 3` and stored in the 4th position of the correlogram. When flipped, this becomes the 26th position (instead of the 27th).

In the new implementation, due to the spike vector, the handling of mirrored bins is just brute-forced through (i.e. both 1 vs. 0 and 0 vs. 1 are looped over), so the result handles this case the same way as the numpy version. There are probably more optimisations that could be done on the numba version to avoid this duplication, but as a) this version is fast, b) it matches numpy, and c) it is simple to read and understand, I don't think this is necessary. Below are some plots showing the improvement as compared to the numpy function (plots omitted).
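The off-by-one can be reproduced with plain arithmetic (illustrative, using the numbers above with 0-based indices):

```python
num_bins = 15            # half-bins per side; 30 bins in total
d = 12                   # diff // bin_size

# numpy-style: assign both mirrored bins directly.
neg_bin = num_bins - d   # 3
pos_bin = num_bins + d   # 27

# old numba-style: compute one side, then flip the whole histogram.
stored = num_bins - d                    # index 3 (the "4th position")
flipped = (2 * num_bins - 1) - stored    # 26, not 27 -> the mismatch
print(neg_bin, pos_bin, flipped)
```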
Addressing the numpy first-bin problem
Another reason the numba implementation was not matching the numpy implementation is that the numpy implementation was also missing some counts in the very first bin in a certain edge case. Imagine that you have two units, and a spike time in one unit that is the same time-difference from both another spike on that unit and a spike on a different unit. Also, this time difference happens to be exactly the window edge, e.g. `spike_times = [10, 15, 15]`, `spike_labels = [0, 0, 1]` with `num_half_bins = 5` (total bins is `10`). At the start, the algorithm shifts the spike times by 1 index so that the spike diff is `15 - 10 = 5`. The way the bins are assigned means that when `sign == -1`, this case should go in the first bin, i.e. `diff * sign = -5`, which equals `-num_half_bins`, so in the `np.ravel_multi_index` this is assigned to the first bin, index zero.

In contrast, for `sign == 1`, `diff = 4` (rather than `diff = 5`) goes into the last bin (`diff + num_half_bins = 9`, i.e. the last index of the bins). When `sign == 1`, all spike times with `diff == num_half_bins` are masked and no longer counted. However, this means that for the next spike time in our example, spike time 10 (unit 0) is never compared with spike time 15 (unit 1). As such, this edge case is missed from the first bin (but not from the last bin). To summarize: the first bin misses repeated counts of delays when `delay == num_half_bins`. Unmasking this case at the end of the `sign == 1` loop, prior to the next `sign == -1` iteration, recovers these lost bins.
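A rough numpy illustration of the masking behaviour (a simplification of the real implementation, using the example values above):

```python
import numpy as np

spike_times = np.array([10, 15, 15])
spike_labels = np.array([0, 0, 1])
num_half_bins = 5

# Shift by one index: diffs between each spike and the next one.
diff = spike_times[1:] - spike_times[:-1]  # [5, 0]

# For sign == -1, diff * sign == -num_half_bins maps to the first bin (0).
first_bin_hits = (diff == num_half_bins)   # [True, False]

# But for sign == 1 these diffs were masked out before the next shift, so
# spike 10 (unit 0) vs spike 15 (unit 1) was never compared, losing a
# count from the first bin only. Keeping the window-edge diffs unmasked
# for the next sign == -1 pass recovers the missing count.
mask = diff < num_half_bins                # [False, True]
print(diff, first_bin_hits, mask)
```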