Optimize numba cross-correlation and extend correlograms.py docstrings and tests #3017
Conversation
src/spikeinterface/postprocessing/tests/common_extension_tests.py
Did you also profile, just so we can see the speed differences, since you changed both the numpy and numba implementations? I think fixing the edge case is great, but it would be nice to see how the speed has changed too.
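Something along these lines would do it (a rough sketch; the generation parameters are arbitrary):

```python
import spikeinterface.full as si
from time import perf_counter

# Hypothetical timing comparison of the two methods, as requested;
# the synthetic sorting parameters here are arbitrary choices.
sorting = si.generate_sorting(num_units=20, durations=[300.0], seed=0)

for method in ("numpy", "numba"):
    t0 = perf_counter()
    si.compute_correlograms(sorting, window_ms=50.0, bin_ms=1.0, method=method)
    print(method, f"{perf_counter() - t0:.3f} s")
```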
bins : np.ndarray
    The bin edges in ms
window_size : int
    The window size in samples
One thing we should discuss is that we have `frame_slice` and now we are saying `samples`. I personally prefer the sample terminology, but what do you think about frames/samples vs. mixed terminology vs. just using frames, since that is already built deeply into the library, and then we explain the equivalence in the future magic glossary?
I also prefer sample. `frame` is totally historical stuff done by Alessio or Jeremy, I guess. If one day we go to version one we need to stop using "frame", I guess, but Alessio will be unhappy with this.
This is interesting, I am not very familiar with this distinction between samples and frames; I did not know the difference until I just looked it up. If I have understood correctly, I think in this instance 'samples' would be correct: for example, we have a window size of 10 samples (that together would make up 1 frame). If you had a window size of 10 frames, I think this would imply 10 sets of timeseries? It makes sense for `frame_index`, I guess, as it reads 'index of the frame you want to slice'. Although I also prefer sample, as I think `frame` is less well known.
I've linked this to #2800 for further discussion.
Amazing amount of work, Joe! I didn't read the algorithm in detail but the tests are very convincing that it's working correctly, and quickly!
Thanks @DradeAW, I agree this would be a nicer method and would speed things up. I had a play around with it now, but it leads to a lot of finicky indexing and fencepost problems that I can't quite figure out in good time. It's kind of related to the binning issue below, but also to the edge-effects issue mentioned in the dropdown: when trying to flip the bins, the bin-edge behaviour becomes mirrored in the positive vs. negative case rather than consistent across bins. Let's say by convention you are binning with ... I think this is largely solvable by consistently using floor. But then there is a related problem with the left-most and right-most bin edge, when the diff is exactly the window edge.
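A minimal sketch of the mirroring behaviour described above (illustrative values, not the PR's code):

```python
import numpy as np

# Truncating towards zero makes the bin around zero twice as wide and
# mirrors the edge behaviour for positive vs. negative diffs, whereas
# floor division treats both signs consistently.
diffs = np.array([-5, -4, 4, 5])  # spike-time differences, in samples
bin_size = 5

floor_bins = np.floor(diffs / bin_size).astype(int)  # [-1, -1, 0, 1]
trunc_bins = np.trunc(diffs / bin_size).astype(int)  # [-1,  0, 0, 1]

print(floor_bins)  # consistent bin widths across zero
print(trunc_bins)  # -4 and +4 collapse into the same middle bin
```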
Can't you just do this?
There might be a more optimal method without doing the division twice, but I don't think that the division is that computationally expensive :)
The consistent use of floor does fix the first issue described, but not some other observed issues that may or may not be related. I think where I was stuck next was on the middle bin for cross-correlation, which was not consistently filled between the forward and backward direction in the case that ...
I understand ahah.
Thanks a lot for the excellent reimplementation. A main (and historical) reason for the slow implementation was the confusion between unit_index and unit_id in the implementation. Now the implementation uses unit_index and this is of course crazy faster. I would modify variable names to take this into account, to avoid confusion for a future reader of this code.
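To illustrate the difference (a hypothetical sketch; the variable names here are illustrative):

```python
import numpy as np

# unit ids are arbitrary labels, so each spike needs a search to find its
# row, while unit_index addresses the output array directly.
unit_ids = np.array([101, 205, 999])
spike_unit_ids = np.array([101, 999, 205, 101])   # label per spike (slow)
spike_unit_indices = np.array([0, 2, 1, 0])       # index per spike (fast)

# Slow pattern: resolve each id to a row with a search on every spike.
rows = np.array([np.flatnonzero(unit_ids == u)[0] for u in spike_unit_ids])

# Fast pattern: use the index directly, e.g. to accumulate spike counts.
counts = np.zeros(len(unit_ids), dtype=np.int64)
np.add.at(counts, spike_unit_indices, 1)
print(rows, counts)  # [0 2 1 0] [2 1 1]
```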
Thanks @samuelgarcia! That makes sense, I've renamed ...
A bottle of wine is ready for you in Lyon.
Thanks for working on this @JoeZiminski !
This PR (closes #2990):

- Optimizes the numba cross-correlation implementation.
- Extends the docstrings and tests in `test_correlogram.py` and reworks some existing tests now that the numba / numpy implementations match.

I would like to hear @samuelgarcia's opinion on the `test_sortinganalyzer_correlograms()` that is supplemental to `run_extension_tests()`. It again creates the sortinganalyzer, so I think the implementation could be handled better to avoid repeating sortinganalyzer generation. A benefit is that it directly tests the sortinganalyzer output, with a lightweight approach, simply against the `sorting` implementation. The `sorting` object is tested in more detail in the rest of the tests.

This PR also makes some potentially contentious API changes that I'm happy to reverse:

- Removes `compute_autocorrelogram_from_spiketrain` and related spiketrain-level functions. These were only working for the numba implementation and were untested. My thoughts are that it is better to force users to use the central API, to restrict the surface API and help with maintenance. If people want to use correlation functions outside of the central SpikeInterface API they can delve into the private functions, but it is easier to only support a small number of entry functions.
- Makes the functions underlying `compute_correlograms` private (except `correlogram_for_one_segment`, which is called elsewhere from within the codebase).

TODO: As far as I can tell the docs are giving the wrong default argument values for `compute_correlograms()`; could someone help me track down where these are set?

More details can be found in the dropdowns below.
Details on Numba optimisation
The existing numba correlogram version had not yet been optimised with the new spike vector approach. It did not scale well when increasing the number of units.
With the new implementation it is much faster (benchmark plot omitted), and the same holds with a very large recording, for fun (plot omitted).
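For orientation, a schematic of the spike-vector idea (not the PR's exact code; the function name and signature here are illustrative):

```python
import numpy as np
from numba import njit

# Spikes from all units are merged and time-sorted, so for each spike we
# only scan forward until the time difference leaves the window.
@njit(cache=True)
def _correlogram_counts(spike_times, spike_unit_indices, num_units,
                        window_size, bin_size):
    num_half_bins = window_size // bin_size
    correlograms = np.zeros(
        (num_units, num_units, 2 * num_half_bins), dtype=np.int64
    )
    for i in range(len(spike_times)):
        for j in range(i + 1, len(spike_times)):
            diff = spike_times[j] - spike_times[i]
            if diff >= window_size:
                break  # times are sorted: no later spike can be in window
            bin_idx = diff // bin_size
            u_i, u_j = spike_unit_indices[i], spike_unit_indices[j]
            # fill both mirrored comparisons explicitly (brute force)
            correlograms[u_j, u_i, num_half_bins + bin_idx] += 1
            correlograms[u_i, u_j, num_half_bins - bin_idx - 1] += 1
    return correlograms
```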
Details on fixing the tests
Below are some specific details on the steps taken to fix the tests (i.e. address the left-bin problem in the numpy implementation and make the numpy and numba implementations match). It's quite fiddly and only really put here for posterity; it is not necessary reading.
Addressing why the previous numba and numpy versions did not match
The previous `numba` implementation also did not exactly match the numpy version, in part due to a difference in handling an edge case. When the difference between two spike times divides perfectly into the `bin_size`, the two methods handled the case differently. I don't think there is a correct way to handle this; technically the spike can go in the bin to the left or the right (e.g. if the diff is 5 ms and the `bin_size` is 5 ms, it could be convention to go in 0-5 or 5-10). In the numpy implementation, all spike differences are positive, then the `sign` is iterated over to assign to mirrored unit comparisons (e.g. unit 1 vs. 0 and unit 0 vs. 1). Say we have 15 bins, and `diff // bin_size` is `12`. The numpy version will assign this to the bins `num_bins - 12 = 3` and `num_bins + 12 = 27`. In contrast, the numba version calculated everything in one direction (e.g. unit 1 vs. 0) and then flipped the bins. This means that when the difference is negative, the bin would be `num_bins - 12 = 3` and stored in the 4th position of the correlogram. When flipped, this becomes the 26th position (instead of the 27th).

In the new implementation, due to the spike vector, the handling of mirrored bins is just brute-forced through (i.e. both 1 vs. 0 and 0 vs. 1 are looped over), so the result handles this case the same way as the numpy version. There are probably more optimisations that could be done on the numba version to avoid this duplication, but as a) this version is fast, b) it matches numpy, and c) it is simple to read and understand, I don't think this is necessary. Below are some plots showing the improvement as compared to the numpy function (plots omitted).
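The off-by-one can be reproduced with plain arithmetic (illustrative, using the numbers above with 0-based indices):

```python
num_bins = 15            # half-bins per side; 30 bins in total
d = 12                   # diff // bin_size

# numpy-style: assign both mirrored bins directly.
neg_bin = num_bins - d   # 3
pos_bin = num_bins + d   # 27

# old numba-style: compute one side, then flip the whole histogram.
stored = num_bins - d                    # index 3 (the "4th position")
flipped = (2 * num_bins - 1) - stored    # 26, not 27 -> the mismatch
print(neg_bin, pos_bin, flipped)
```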
Addressing the numpy first-bin problem
Another reason the numba implementation was not matching the numpy implementation is that the numpy implementation was also missing some counts in the very first bin in a certain edge case. Imagine that you have two units, and a spike time in one unit that is the same time-difference from both another spike on that unit and a spike on a different unit. Also, this time difference happens to be exactly the window edge, e.g. `spike_times = [10, 15, 15]`, `spike_labels = [0, 0, 1]` with `num_half_bins = 5` (total bins is `10`). At the start, the algorithm shifts the spike times by 1 index so that the spike diff is `15 - 10 = 5`. The way the bins are assigned means that when `sign == -1`, this case should go in the first bin, i.e. `diff * sign = -5`, which equals `-num_half_bins`, so in the `np.ravel_multi_index` this is assigned to the first bin, index zero.

In contrast, for `sign == 1`, `diff = 4` (rather than `diff = 5`) goes into the last bin (`diff + num_half_bins = 9`, i.e. the last index of the bins). When `sign == 1`, all spike times with `diff == num_half_bins` are masked and no longer counted. However, this means that for the next spike time in our example, spike time 10 (unit 0) is never compared with spike time 15 (unit 1). As such, this edge case is missed from the first bin (but not from the last bin). To summarize: the first bin misses repeated counts of delays when `delay == num_half_bins`. Unmasking this case at the end of the `sign == 1` loop, prior to the next `sign == -1` iteration, recovers these lost bins.
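A rough numpy illustration of the masking behaviour (a simplification of the real implementation, using the example values above):

```python
import numpy as np

spike_times = np.array([10, 15, 15])
spike_labels = np.array([0, 0, 1])
num_half_bins = 5

# Shift by one index: diffs between each spike and the next one.
diff = spike_times[1:] - spike_times[:-1]  # [5, 0]

# For sign == -1, diff * sign == -num_half_bins maps to the first bin (0).
first_bin_hits = (diff == num_half_bins)   # [True, False]

# But for sign == 1 these diffs were masked out before the next shift, so
# spike 10 (unit 0) vs spike 15 (unit 1) was never compared, losing a
# count from the first bin only. Keeping the window-edge diffs unmasked
# for the next sign == -1 pass recovers the missing count.
mask = diff < num_half_bins                # [False, True]
print(diff, first_bin_hits, mask)
```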