-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "how to" section on how to load matlab data #2018
Add "how to" section on how to load matlab data #2018
Conversation
Here is the link to the relevant section: https://spikeinterface--2018.org.readthedocs.build/en/2018/how_to/load_matalb_data.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful to make sure people can transition. Matlab to python can feel like a pretty big jump, so giving some tips is great. Maybe link the numpy matlab to python doc at the bottom too?
https://numpy.org/doc/stable/user/numpy-for-matlab-users.html
doc/how_to/load_matalb_data.rst
Outdated
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin") | |
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin") | |
# or for Windows | |
# file_path = Path(r"c:\path\to\your\data\your_data_as_a_binary.bin") |
I tend to recommend just warning windows users ahead of time so that they don't come with a bunch of path issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
What do you think of the "path/to/your/data/" visual trick to let them know that it is a path? In another package we have similar tutorials:
https://neuroconv.readthedocs.io/en/main/conversion_examples_gallery/recording/spikeglx.html
And we opted for something more concise there (just a variable in caps). I would like to hear if you have any ideas on how to make this clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the path to variable is super tricky. Because some users will actually put path/to/variable
thinking that it is a hard coded path. And then adding in windows confusion with the opposite slashes. But as far as strategies I still think it is best despite being a bit longer and less concise.
Only issue with the concise variable style is that for other users they might not realize what's behind the variable. So that's why I prefer the path/to/variable
style although it is not perfect. For Linux users I think they tend to be more comfortable with the path notation. But Windows has a right+click copy path option that will copy with x\y\z
instead, so even though they could put in a Path(c:/my/data
) they almost never do. And then since Windows has backslashes with escapes if we don't warn them about using the raw string instead they will get pathway errors due to the escaping and get frustrated.
In summary, I would leave it as you have it, but add in a the comment for windows users so when they copy the path from the computer they know that they likely need the r.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your input.
doc/how_to/load_matalb_data.rst
Outdated
recording = BinaryRecordingExtractor(file_path, sampling_frequency=sampling_frequency, | ||
num_channels=num_channels, dtype=dtype, gain_to_uV=1, offset_to_uV=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to warn them about gain_to_uV
and offset_to_uV
. Maybe just a comment saying that these are used to convert to the actual voltages in case their data came from a reader that returned "proprietary units" that haven't been converted yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alejoe91 @zm711
My own preference here would be to hide this complexity from matlab users who I presume are less likely to be experienced. I want to write docs on how to load binary data that will indluce details such as gains and the time_axis=1
point that @alejoe91 mentions below.
I am thinking that within the Daniele Procida
this falls squarely into the "how-to/goal-oriented" type of documentation. For that type, I don't want to burden the user with extra details that are not 100 % related to the task at hand. This also how I like my how-to guides to be. I don't want asides.
In fact, I was thinking on not having this arguments at all (that is, leaving them at the default values of None
) but I was concerned that some methods of pre-processing or spike-sorting might not work without this. Is that correct, @alejoe91 ? Would it there be any downsides on having a recording without gains or offsets?
If there are no downsides, I would rather omit these and then mention at the end that there is a more complete how-to of read_binary
somewhere else that they can go once is available.
How do you guys think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the use case could be that the user have their data in int16
. In that case, gains and offsets are needed to correctly convert to uV. I think it's an important concept to spend a couple of words on and it is related to the task!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of having a super goal oriented tutorial to get the job done. But in this case those two values are (at least in my opinion) important for the task at hand. As @alejoe91 said in the case the data was stored as an int16
and needs to converted into a voltage. I think at a minimum a comment saying that not all data are in the correct format for all sorters so gain_to_uV
and offset_to_uV
will give you this fine control and then a link to relevant documentation that explains how these options work. Although in this case I tend toward educating (or maybe you'd argue over-educating) the user rather than overwhelming the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gain_to_uv
and offset_to_uv
are an implementation details of data extraction. My fear is that most users will not be familiar with them and that they already will have the data in the right units (as they are using that in MATLAB probably to analyze). Introducing these concepts to new users -as I was once- is likely to raise more questions, confuse and derail. I don't think we can get away with just a few comments.
We can meet in the middle. I added a specific section at the end dealing with integer typed traces, this separate the information streams and does not get in the way of new users and gives us more space to introduce the necessary context for using gains and offsets. Could you check it @alejoe91 and @zm711 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that section. It fits with the design I like (hive off the extra info for those that need it, but give an off-ramp for those who don't). Thanks @h-mayorquin :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Do check the video linked above if you haven't before. I think is very good by the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now bookmarked. I'll check it out this afternoon :)
doc/how_to/load_matalb_data.rst
Outdated
Common Pitfalls & Tips | ||
---------------------- | ||
|
||
1. **Data Shape**: Always ensure that your MATLAB data matrix's first dimension corresponds to samples/time and the second to channels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can mention that if the data is in the other shape, one can use time_axis=1
: https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/binaryrecordingextractor.py#L66
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this.
Thinking about this, it makes me feel that the order argument in the get_traces is kind of redundant:
What do you think?
Co-authored-by: Zach McKenzie <[email protected]>
Sorry just one more comment and then looks good to me. |
Thanks for all your feedback. No worries if you want to do more suggestions. I think your feedback has improved the PR a lot. |
I did not have time to read this yet. Sorry. In short slow notebook are done locally in |
Co-authored-by: Zach McKenzie <[email protected]>
@samuelgarcia |
I'll be honest this is my confusion in general. |
I agree. It's not a rule that all how-tos should be a python script first. Like in this case, there is no point whatsoever to make a python script |
I was not aware that the "how to" was mean to be that. Sam just explained that to me in a call. |
Well, wherever this ends up in the toctree it looks good to me! |
Same for me! @samuelgarcia wanna take a final look? |
Co-authored-by: Alessio Buccino <[email protected]>
We have a couple of users that came to ask us how to load data from matlab. I think that this is a good scenario for writing a how to so we can help new users without having to repeat ourselves.
I am hesistant to go all the way to make a specific extractor from matlab as @alejoe91 suggested because:
Because of aspects two I feel that is just easier to give users general instructions of how to transform their data in matlab to a general format. In this case, binary should work well and we should then be able to process the data lazily using the
BinaryRecordingExtractor
. This tutorial does that. If we eventually decide to move all the way to build an extractor for matlab, then we can deprecate this section, meanwhile I think it can be useful to offer support to users.