Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods to sparsify and densify waveforms to ChannelSparsity #1985

Merged

Conversation

h-mayorquin
Copy link
Collaborator

I have been wanting to add something like this for a while and it will be useful in the context of #1982.

The idea is that if we have channel sparsity, it should be possible to use the class to compute a sparse or dense representation of a waveform or a template. Now the ChannelSparsity object is part of WaveformExtractor but it does not have a way of changing the representation at a waveform at will in the spot.

Instead, we do sparsifying operations all over the place directly with the mask:

https://github.com/h-mayorquin/spikeinterface/blob/aa5248454462084dcd7a05296ef82c0bb717a285/src/spikeinterface/core/waveform_extractor.py#L946-L950

https://github.com/h-mayorquin/spikeinterface/blob/aa5248454462084dcd7a05296ef82c0bb717a285/src/spikeinterface/core/waveform_extractor.py#L893-L909

I think it is better to centralize this operation where it semantically make sense (a sparsity class should sparsify!) and test it appropriately.

@h-mayorquin h-mayorquin added the core Changes to core module label Sep 13, 2023
@h-mayorquin h-mayorquin self-assigned this Sep 13, 2023
@h-mayorquin h-mayorquin marked this pull request as ready for review September 13, 2023 11:21
def __repr__(self):
ratio = np.mean(self.mask)
txt = f"ChannelSparsity - units: {self.unit_ids.size} - channels: {self.channel_ids.size} - ratio: {ratio:0.2f}"
sparsity = 1 - np.mean(self.mask)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here you rather have ratio = np.mean(self.mask) as it was before and just add the clarification that P(x=1), right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -119,6 +125,65 @@ def unit_id_to_channel_indices(self):
self._unit_id_to_channel_indices[unit_id] = channel_inds
return self._unit_id_to_channel_indices

def sparsify_waveforms(self, waveforms: np.ndarray, unit_index: int) -> np.ndarray:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some checks for dimensions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 172 to 173
waveforms : np.array
The sparsified waveforms array of shape (num_units, num_samples, num_active_channels).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is waveforms_or_template, right? It can be 3d (waveforms) or 2d (templates) (as in the tests). Maybe we should change the docstring and variable names accordingly. What do you think @h-mayorquin ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good suggesiton. I think on templates as central / canonical / representative forms of waveforms so I would like to keep the name of the variable waveforms as it is the general type. But I think that the docstring should clarify this and make it explicit.

Copy link
Collaborator Author

@h-mayorquin h-mayorquin Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to the docstring. Let me know what you think.

If you think that templates should be a variable name it would still prefer to have instead a method called sparsify_template and densify_template that take care of the 2 dimensional case and separate the functions. Maybe that's even better ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok for now! I agree that templates are "mean" waveforms. Thanks for clarifying the docstring!

@h-mayorquin h-mayorquin merged commit 468396a into SpikeInterface:main Sep 20, 2023
@h-mayorquin h-mayorquin deleted the sparsity_method_to_densify branch September 20, 2023 14:05
@@ -119,6 +125,85 @@ def unit_id_to_channel_indices(self):
self._unit_id_to_channel_indices[unit_id] = channel_inds
return self._unit_id_to_channel_indices

def sparsify_waveforms(self, waveforms: np.ndarray, unit_id: str) -> np.ndarray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit_id is not always a str can be int

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be usefull also to have unit_index entry somtimes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot to do this. In fact, mea culpa, I kind of new it when I did it but I ... wished that unit_ids were only string and intentionally forgot : P

I will push a small fix for this.

I would not like to do have a method with two different type of inputs, then 1/4 of the code is used to validate the input with if cases.


return sparsified_waveforms

def densify_waveforms(self, waveforms: np.ndarray, unit_id: str) -> np.ndarray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Comment on lines +187 to +191
assert_msg = (
"Waveforms do not seem to be be in the sparsity shape of this unit_id. The number of active channels is "
f"{len(non_zero_indices)} but the waveform has {waveforms.shape[-1]} active channels."
)
assert self.are_waveforms_sparse(waveforms=waveforms, unit_id=unit_id), assert_msg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very little detail as a genral partice: here you format the message even before you made the assert.
But if the assert pass there is no need to make the text formatting. And so putting the message after the assert ..., is better no ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel that waste of computation but I prefer this for readability. This should not have any really impact on performance anywhere.

As an aside to your aside, it is usually considered a bad practice to use asserts the way that we do here (but most people, doing this is very popular!) because you can dissable assertions in optimize code:

https://discuss.python.org/t/mismatch-between-asserts-semantics-and-how-its-used-o-oo-disable/29282?page=2

The recommended way to do this would be to do and if statement and then raise the appropiate exception but I think that most people does not like this? What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a science vs industry thing. Every piece of science code I've ever read uses assert instead of if-exception. I think with the assumption that scientists aren't going to shut off assertions to increase speed. And using the asserts does empower the user such that if they don't want the protections designed by the developer they can shut them off to get the speed boost. But just my two cents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants