
Optimize numba cross-correlation and extend correlograms.py docstrings and tests #3017

Merged

Conversation

@JoeZiminski (Collaborator) commented Jun 11, 2024

This PR (closes #2990):

  1. Optimizes the numba cross-correlation implementation; thanks to @DradeAW for advice on how best to optimise it.
  2. Extends the docstrings in correlograms.py and test_correlograms.py.
  3. Fixes the numpy implementation of the correlogram computation to handle a left-bin edge case that was giving wrong results.
  4. Adds some additional tests to test_correlograms.py and reworks some existing tests now that the numba and numpy implementations match.

I would like to hear @samuelgarcia's opinion on test_sortinganalyzer_correlograms(), which is supplemental to run_extension_tests(). It creates the sortinganalyzer again, so I think the implementation could be handled better to avoid repeating sortinganalyzer generation. A benefit is that it directly tests the SortingAnalyzer output with a lightweight approach, simply comparing against the sorting implementation. The sorting object is tested in more detail in the rest of the tests.

This PR also makes some potentially contentious API changes that I'm happy to reverse:

  • Removes compute_autocorrelogram_from_spiketrain. These functions only worked for the numba implementation and were untested. My view is that it is better to force users to use the central API, to restrict the API surface and help with maintenance. If people want to use correlation functions outside of the central SpikeInterface API they can delve into the private functions, but it is easier to support only a small number of entry points.
  • Makes every function except compute_correlograms private (apart from correlogram_for_one_segment, which is called elsewhere in the codebase); a minimal usage sketch of the public entry point is shown below.
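
For orientation, here is a minimal, hypothetical usage sketch of the public entry point (the argument values are illustrative rather than the library defaults, and `sorting_analyzer` stands in for an existing SortingAnalyzer or Sorting object):

```python
from spikeinterface.postprocessing import compute_correlograms

# Illustrative values only; see the TODO below about the documented defaults.
correlograms, bins = compute_correlograms(
    sorting_analyzer,   # a SortingAnalyzer (or Sorting) created elsewhere
    window_ms=50.0,     # full correlogram window, in ms
    bin_ms=1.0,         # bin width, in ms
    method="numba",     # or "numpy" / "auto"
)
# correlograms: (num_units, num_units, num_bins) array of counts
# bins: the bin edges in ms
```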

TODO: As far as I can tell, the docs are giving the wrong default argument values for compute_correlograms(); could someone help me track down where these are set?

More details can be found in the dropdowns below.

Details on Numba optimisation

The existing numba correlogram version had not yet been optimised to use the new spike vector approach, and it did not scale well as the number of units increased.

[Benchmark plots: old numpy vs numba implementation as the number of units and the recording duration increase]

With the new implementation it is much faster:

[Benchmark plots: new numba vs numpy implementation across recording durations and unit counts]
and, for fun, with a very large recording:
[Benchmark plot: new numba vs numpy for a very long recording with many units]

Details on fixing the tests

Below are some specific details on the steps taken to fix the tests (i.e. address the left-bin problem in the numpy implementation and make the numpy and numba implementations match). It's quite fiddly and is only really put here for posterity; it is not necessary reading.

Addressing why the previous numba and numpy versions did not match

The previous numba implementation also did not exactly match the numpy version, in part due to a difference in handling an edge case. When the difference between two spike times divides perfectly by the bin_size, the two methods handled the case differently. I don't think there is a single correct way to handle this; technically the spike can go in the bin to the left or the right (e.g. if the diff is 5 ms and the bin_size is 5 ms, it could by convention go in 0-5 or 5-10). In the numpy implementation, all spike differences are positive, and the sign is then iterated over to assign counts to mirrored unit comparisons (e.g. unit 1 vs. 0 and unit 0 vs. 1). Say we have 15 bins and diff // bin_size is 12. The numpy version will assign this to the bins num_bins - 12 = 3 and num_bins + 12 = 27. In contrast, the numba version calculated everything in one direction (e.g. unit 1 vs. 0) and then flipped the bins. This means that when the difference is negative, the bin would be num_bins - 12 = 3 and stored in the 4th position of the correlogram; when flipped, this becomes the 26th position (instead of the 27th).
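
A hypothetical worked example of that arithmetic (this follows the description above, not the actual source):

```python
num_bins = 15
shift = 12                    # diff // bin_size
total_bins = 2 * num_bins     # 30 bins, indices 0-29

# numpy-style: assign both mirrored comparisons directly
numpy_bins = (num_bins - shift, num_bins + shift)   # (3, 27)

# old numba-style: compute one direction only, then flip the whole bin axis
one_direction = num_bins - shift                    # 3, i.e. the 4th position
flipped = (total_bins - 1) - one_direction          # 26, not 27
print(numpy_bins, flipped)
```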

In the new implementation, due to the spike vector, the handling of mirrored bins is simply brute-forced (i.e. both 1 vs. 0 and 0 vs. 1 are looped over), so this case is handled the same way as in the numpy version. There are probably further optimisations that could avoid this duplication, but as a) this version is fast, b) it matches numpy and c) it is simple to read and understand, I don't think they are necessary. Plots showing the improvement compared to the numpy function are in the dropdown above.
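
For illustration, a minimal sketch of this brute-force spike-vector approach (this is not the exact merged implementation; the argument names, the floor-based binning and the window-edge convention are assumptions):

```python
import numpy as np
import numba


@numba.njit(cache=True)
def correlograms_sketch(spike_times, spike_unit_indices, num_units, window_size, bin_size):
    # spike_times: sorted spike times in samples; spike_unit_indices: unit index per spike
    num_half_bins = window_size // bin_size
    correlograms = np.zeros((num_units, num_units, 2 * num_half_bins), dtype=np.int64)

    start_j = 0
    for i in range(spike_times.size):
        # spike_times is sorted, so spikes more than a window before spike i
        # can be skipped for this and every later i
        while spike_times[i] - spike_times[start_j] >= window_size:
            start_j += 1
        for j in range(start_j, spike_times.size):
            if i == j:
                continue
            diff = spike_times[i] - spike_times[j]
            if diff < -window_size:
                break  # all later spikes are even further away
            # floor division keeps the left-edge binning convention for negative diffs
            bin_index = num_half_bins + diff // bin_size
            # both orderings (unit_i vs unit_j and unit_j vs unit_i) are visited
            # explicitly, so no post-hoc flipping of the bin axis is needed
            correlograms[spike_unit_indices[i], spike_unit_indices[j], bin_index] += 1
    return correlograms
```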

Addressing the numpy first-bin problem

Another reason the numba implementation was not matching the numpy implementation is that the numpy implementation was also missing some counts in the very first bin in a certain edge case. Imagine you have two units, and a spike time in one unit has the same time difference from both another spike on the same unit and a spike on the other unit, and this time difference happens to be exactly the window edge, e.g. spike_times=[10, 15, 15], spike_labels=[0, 0, 1] with num_half_bins=5 (10 bins in total). At the start, the algorithm shifts the spike times by 1 index so that the spike diff is 15 - 10 = 5. The way the bins are assigned means that when sign==-1 this case should go in the first bin, i.e. diff*sign=-5, which equals -num_half_bins, so in np.ravel_multi_index this is assigned to the first bin, index zero.
In contrast, for sign==1, diff=4 (rather than diff=5) goes into the last bin (diff + num_half_bins = 9, i.e. the last index of the bins). When sign==1, all spike times with diff == num_half_bins are masked and no longer counted. However, this means that for the next spike time in our example, spike time 10 (unit 0) is never compared with spike time 15 (unit 1). As such, this edge case is missed from the first bin (but not from the last bin). To summarize: the first bin misses repeated counts of delays when delay == num_half_bins. Unmasking this case at the end of the sign==1 loop, prior to the next sign==-1 iteration, recovers these lost counts.
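
A toy numeric illustration of this case (assuming a bin_size of 1 sample so that diff // bin_size is just diff; this mirrors the description above rather than the exact numpy code):

```python
import numpy as np

spike_times = np.array([10, 15, 15])
spike_labels = np.array([0, 0, 1])
num_half_bins = 5                        # 10 bins in total, indices 0-9

diff = spike_times[1] - spike_times[0]   # 5, exactly num_half_bins

# sign == -1: the count lands in the very first bin
print(-diff + num_half_bins)             # 0

# sign == +1: the bin would be past the last valid index (9), so the spike is
# masked; before the fix the mask was never undone, so spike time 10 (unit 0)
# was never compared with spike time 15 (unit 1) and the first bin lost a count
print(diff + num_half_bins)              # 10 -> masked
```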

@alejoe91 alejoe91 added postprocessing Related to postprocessing module labels Jun 12, 2024
@samuelgarcia samuelgarcia added this to the 0.101.0 milestone Jun 12, 2024
@JoeZiminski JoeZiminski force-pushed the optimize_numba_correlogram branch from 4ecddef to f906020 Compare June 19, 2024 19:13
@JoeZiminski JoeZiminski force-pushed the optimize_numba_correlogram branch from ba9dc38 to 73c7290 Compare June 19, 2024 21:08
@JoeZiminski JoeZiminski marked this pull request as ready for review June 21, 2024 09:06
@JoeZiminski JoeZiminski requested a review from DradeAW June 21, 2024 09:06
@zm711 (Collaborator) left a comment:

Did you also profile, just so we can see the speed differences, since you changed both the numpy and numba implementations? I think fixing the edge case is great, but it would be nice to see how the speed has changed too.

bins : np.ndarray
The bins edges in ms
window_size : int
The window size in samples
A collaborator commented:

One thing we should discuss is that we have frame_slice and now we are saying samples. I personally prefer the sample terminology, but what do you think about frames/samples vs mixed terminology vs just using frames (since that is already built deeply into the library), and then we explain the equivalence in the future magic glossary?

A member commented:

I also prefer sample.
frame is totally a historical thing done by Alessio or Jeremy I guess.
If one day we go to version one we need to stop using "frame" I guess, but Alessio will be unhappy with this.

@JoeZiminski (Collaborator Author) commented:

This is interesting, I am not very familiar with this distinction between samples and frames; I did not know the difference until I just looked it up. If I have understood correctly, I think in this instance 'samples' would be correct, for example we have a window size of 10 samples (that together would make up 1 frame). If you had a window size of 10 frames I think this would imply 10 sets of timeseries? It makes sense for frame_index I guess, as it reads 'index of the frame you want to slice'? Although I also prefer sample, as I think frame is less well known.

@JoeZiminski (Collaborator Author) commented Jul 2, 2024:

I've linked to #2800 for further discussion.

src/spikeinterface/postprocessing/correlograms.py: outdated review thread (resolved)
@DradeAW (Contributor) commented Jun 27, 2024:

Since correlogram[A, B, :] = correlogram[B, A, ::-1], and since the numba method now computes all pairs at once, I think there is a way to make it even faster (around twice as fast).

Right now we start at start_j, and we are going through all pairs twice.
We could make the second loop start at i + 1, and update the correlograms of [A, B] and [B, A] at the same time.

Wouldn't that work?

@chrishalcrow (Collaborator) left a comment:

Amazing amount of work, Joe! I didn't read the algorithm in detail but the tests are very convincing that it's working correctly, and quickly!

@JoeZiminski (Collaborator Author) commented:

Thanks @DradeAW, I agree this would be a nicer method and would speed things up. I had a play around with it now, but it leads to a lot of finicky indexing and fencepost problems that I can't quite figure out in good time. It's kind of related to the binning issue below, but also to the edge-effects issue mentioned in the dropdown:

When trying to flip the bins, the bin-edge behaviour becomes mirrored between the positive and negative cases rather than consistent across bins. Let's say by convention you are binning with np.floor, so all spikes are binned to the left edge. Say you have bins of 5 ms and a spike-time difference of 17.5 ms that needs to go in the 15-20 ms bin. In the negative case diff / bin_size = -3.5; you take the floor and it goes to bin -4 (indexed relative to the middle, 0th bin), so your left bin edge is -20 and you are in the -20 to -15 bin. In the positive case the floor should be 3 and your left bin edge is 15 (the 15-20 bin). If, however, you flip the bin, or you round in the same way, you get positive bin 4, whose left bin edge is 20 (the 20-25 bin).
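
A quick worked example with those numbers (a hypothetical illustration of the bin-edge mismatch, using floor binning and left bin edges):

```python
import numpy as np

bin_size = 5.0
diff = 17.5  # ms; should land in the 15-20 ms bin

neg_bin = np.floor(-diff / bin_size)  # -4.0 -> left edge -20: the -20 to -15 bin (consistent)
pos_bin = np.floor(diff / bin_size)   #  3.0 -> left edge  15: the 15 to 20 bin (consistent)
flipped = -neg_bin                    #  4.0 -> left edge  20: the 20 to 25 bin (mismatch)

print(neg_bin, pos_bin, flipped)      # -4.0 3.0 4.0
```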

I think this is largely solvable by consistently using floor. But then there is a related problem with the left-most and right-most bin edges when the diff is exactly the bin_size, and I think these problems interact somewhat, making it hard to solve neatly. I benchmarked a version starting the second loop at i and indexing twice per loop into correlograms, and the speedup is about 20% (0.0837 vs 0.104 and 3.427 vs 2.722 seconds in the slow and fast cases above), so I guess the loop itself might not be dominating the runtime, and we are not missing out on too much speedup with this double looping. What do you think?

@DradeAW (Contributor) commented Jun 27, 2024:

Can't you just do this?

bin1 = diff // bin_size
bin2 = (-diff) // bin_size
correlograms[spike_labels[i], spike_labels[j], num_half_bins + bin1] += 1
correlograms[spike_labels[j], spike_labels[i], num_half_bins + bin2] += 1

There might be a more optimal method that avoids doing the division twice, but I don't think that the division is that computationally expensive :)

JoeZiminski and others added 2 commits June 27, 2024 18:43
@JoeZiminski (Collaborator Author) commented Jun 27, 2024:

The consistent use of floor does fix the first issue described, but not some other observed issues that may or may not be related. I think where I got stuck next was the middle bin of the cross-correlation, which was not consistently filled between the forward and backward directions when diff == bin_size. Attempting to fix this led to some left- and right-bin issues (i.e. the left or right bin is empty or has too many counts). Please feel free to take a look, it would be cool to resolve this and implement an even faster option. But I don't think I can take any more of trying to fix these weird bin edge cases for now 😄

@DradeAW (Contributor) commented Jun 27, 2024:

But I don't think I can take any more of trying to fix these weird bin edge cases for now

I understand ahah,
If I find the time I might try things,
But conceptually, my code above should do EXACTLY the same as what the function did previously

@samuelgarcia (Member) commented:

Thanks a lot for the excellent reimplementation.
This was a real need since the spike_vector story.

A main (and historical) reason for the slow implementation was the confusion between unit_index and unit_id in the implementation. Now the implementation uses unit_index and this is of course crazy faster.

I would modify the variable names to take this into account, to avoid confusion for a future reader of this code.
What do you think?

@JoeZiminski (Collaborator Author) commented:

Thanks @samuelgarcia! That makes sense; I've renamed spike_labels -> spike_unit_indices. The new spike_vector approach is fantastic!

@samuelgarcia (Member) commented:

A bottle of wine is ready for you in Lyon.

@samuelgarcia samuelgarcia merged commit 203f778 into SpikeInterface:main Jul 3, 2024
17 checks passed
@h-mayorquin (Collaborator) commented:

Thanks for working on this @JoeZiminski !

Labels
postprocessing Related to postprocessing module
Development

Successfully merging this pull request may close these issues.

Numpy and Numba do not give exactly the same output in unit test environment
7 participants