Add "how to" section on how to load matlab data #2018

Merged
1 change: 1 addition & 0 deletions doc/how_to/index.rst
@@ -7,3 +7,4 @@ How to guides
get_started
analyse_neuropixels
handle_drift
load_matalb_data
66 changes: 66 additions & 0 deletions doc/how_to/load_matalb_data.rst
@@ -0,0 +1,66 @@
Exporting MATLAB Data to Binary & Loading in SpikeInterface
===========================================================

This guide shows how to export your data from MATLAB as a raw binary file and then load that file with SpikeInterface in Python. The steps are broken down below.

Exporting Data from MATLAB
--------------------------

First, ensure your data is structured correctly. The data matrix should be organized such that the first dimension corresponds to samples/time and the second dimension to channels.

.. code-block:: matlab

    % Define the size of your data
    num_samples = 1000;
    num_channels = 384;

    % Generate random data as an example
    data = rand(num_samples, num_channels);

    % Write the data to a binary file.
    % fwrite stores arrays in column order, so the matrix is transposed to keep
    % all channels of each sample contiguous, as the Python reader below expects.
    fileID = fopen('your_data_as_a_binary.bin', 'wb');
    fwrite(fileID, data', 'double');
    fclose(fileID);

.. note::

    In a real-world scenario, replace the random data generation with your actual data.
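
MATLAB's ``fwrite`` emits array elements in column order, while a Python reader that expects a ``(num_samples, num_channels)`` array assumes row-major order. The following NumPy sketch emulates both ways of writing a samples-by-channels matrix and checks which one round-trips. The tiny array sizes and the helper name ``read_as_samples_by_channels`` are illustrative assumptions, and mapping the last check onto the ``time_axis=1`` option mentioned in the review is also an assumption about how that option behaves.

```python
import numpy as np

num_samples, num_channels = 5, 3

# Stand-in for the MATLAB matrix: rows are samples, columns are channels.
data = np.arange(num_samples * num_channels, dtype="float64").reshape(num_samples, num_channels)

# Emulate fwrite(fileID, data, 'double'): elements leave in column order,
# so every sample of channel 0 comes first, then every sample of channel 1, ...
written_directly = data.ravel(order="F").tobytes()

# Emulate fwrite(fileID, data', 'double'): the transpose is written in column order,
# so all channels of sample 0 come first, then all channels of sample 1, ...
written_transposed = data.T.ravel(order="F").tobytes()


def read_as_samples_by_channels(raw_bytes):
    """Interpret a flat buffer as a row-major (num_samples, num_channels) array."""
    return np.frombuffer(raw_bytes, dtype="float64").reshape(num_samples, num_channels)


direct_round_trips = np.array_equal(read_as_samples_by_channels(written_directly), data)
transposed_round_trips = np.array_equal(read_as_samples_by_channels(written_transposed), data)

# The direct write is still recoverable if it is read as (num_channels, num_samples)
# and then transposed, which is the layout a time_axis=1 style option would describe.
direct_read_channel_major = (
    np.frombuffer(written_directly, dtype="float64").reshape(num_channels, num_samples).T
)
direct_recovered = np.array_equal(direct_read_channel_major, data)

print(direct_round_trips, transposed_round_trips, direct_recovered)
```

A shape check alone cannot catch this kind of mix-up, because both layouts contain the same number of values; comparing a few known samples against the MATLAB matrix is a more reliable test.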

Loading Data in SpikeInterface
------------------------------

This should produce a binary file called ``your_data_as_a_binary.bin`` in your current MATLAB directory.
You will need the complete path (i.e. its location on your computer) to load it in Python.

Once you have your data in a binary format, you can seamlessly load it into SpikeInterface using the following script:

.. code-block:: python

    from pathlib import Path

    from spikeinterface.core.binaryrecordingextractor import BinaryRecordingExtractor

    # Define the path to your binary file
    file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")

Collaborator

Suggested change
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
# or for Windows
# file_path = Path(r"c:\path\to\your\data\your_data_as_a_binary.bin")

I tend to recommend just warning Windows users ahead of time so that they don't come back with a bunch of path issues.

Collaborator Author

Good point.
What do you think of the "path/to/your/data/" visual trick to let them know that it is a path? In another package we have similar tutorials:

https://neuroconv.readthedocs.io/en/main/conversion_examples_gallery/recording/spikeglx.html

And we opted for something more concise there (just a variable in caps). I would like to hear if you have any ideas on how to make this clearer.

Collaborator

I think the path-to-variable approach is super tricky, because some users will actually type path/to/variable thinking that it is a hard-coded path, and then Windows adds confusion with its opposite slashes. But as far as strategies go, I still think it is best despite being a bit longer and less concise.

The only issue with the concise variable style is that other users might not realize what's behind the variable. That's why I prefer the path/to/variable style, although it is not perfect. Linux users tend to be more comfortable with path notation. But Windows has a right-click "copy path" option that copies the path as x\y\z, so even though they could write Path("c:/my/data") they almost never do. And since backslashes start escape sequences in Python strings, if we don't warn them to use a raw string they will get path errors due to the escaping and get frustrated.

In summary, I would leave it as you have it, but add the comment for Windows users so that when they copy the path from their computer they know they likely need the r prefix.

Collaborator Author

Thanks for your input.

    # Ensure the file exists
    assert file_path.is_file()

    # Specify the parameters of your recording
    sampling_frequency = 30_000.0  # in Hz, adjust as per your MATLAB dataset
    num_channels = 384  # adjust as per your MATLAB dataset
    dtype = "float64"  # matches the 'double' precision passed to fwrite in MATLAB

    # Load the data using SpikeInterface
    recording = BinaryRecordingExtractor(file_path, sampling_frequency=sampling_frequency,
                                         num_channels=num_channels, dtype=dtype, gain_to_uV=1, offset_to_uV=0)

Collaborator

I wonder if we need to warn them about gain_to_uV and offset_to_uV. Maybe just a comment saying that these are used to convert to the actual voltages in case their data came from a reader that returned "proprietary units" that haven't been converted yet?

Collaborator Author

@alejoe91 @zm711
My own preference here would be to hide this complexity from MATLAB users, who I presume are less likely to be experienced. I want to write docs on how to load binary data that will include details such as gains and the time_axis=1 point that @alejoe91 mentions below.

I am thinking that within Daniele Procida's documentation framework this falls squarely into the "how-to/goal-oriented" type of documentation. For that type, I don't want to burden the user with extra details that are not 100% related to the task at hand. This is also how I like my how-to guides to be: I don't want asides.

In fact, I was thinking of not having these arguments at all (that is, leaving them at the default values of None), but I was concerned that some pre-processing or spike-sorting methods might not work without them. Is that correct, @alejoe91? Would there be any downsides to having a recording without gains or offsets?

If there are no downsides, I would rather omit these and then mention at the end that there is a more complete how-to for read_binary somewhere else that they can go to once it is available.

What do you think about this?

Member

Well, the use case could be that the user has their data in int16. In that case, gains and offsets are needed to correctly convert to uV. I think it's an important concept to spend a couple of words on, and it is related to the task!

Collaborator

I like the idea of having a super goal-oriented tutorial to get the job done. But in this case those two values are (at least in my opinion) important for the task at hand. As @alejoe91 said, this matters when the data was stored as int16 and needs to be converted into a voltage. I think at a minimum we need a comment saying that not all data are in the correct format for all sorters, so gain_to_uV and offset_to_uV give you this fine control, and then a link to relevant documentation that explains how these options work. Although in this case I tend toward educating (or maybe you'd argue over-educating) the user rather than overwhelming the user.

Collaborator Author (@h-mayorquin, Sep 20, 2023)

gain_to_uV and offset_to_uV are implementation details of data extraction. My fear is that most users will not be familiar with them and that they will already have the data in the right units (as they are probably using it in MATLAB for analysis). Introducing these concepts to new users, as I once was, is likely to raise more questions, confuse, and derail. I don't think we can get away with just a few comments.

We can meet in the middle. I added a specific section at the end dealing with integer-typed traces. This separates the information streams, does not get in the way of new users, and gives us more space to introduce the necessary context for using gains and offsets. Could you check it, @alejoe91 and @zm711?

Collaborator

I'm fine with that section. It fits with the design I like (hive off the extra info for those that need it, but give an off-ramp for those who don't). Thanks @h-mayorquin :)

Collaborator Author

Great. Do check the video linked above if you haven't before. I think it is very good, by the way.

Collaborator

It is now bookmarked. I'll check it out this afternoon :)

    # Verify the shape of your data (num_samples is 1000 in the MATLAB example)
    num_samples = 1000
    assert recording.get_traces().shape == (num_samples, num_channels)

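
The review thread on the file path raises the Windows backslash problem. As a minimal sketch of why the ``r`` prefix matters, the snippet below compares a plain string, a raw string, and a forward-slash string for the same location. The folder ``c:\temp`` and the file name ``new_recording.bin`` are hypothetical and chosen only because ``\t`` and ``\n`` are escape sequences; ``PureWindowsPath`` is used so the comparison behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical location, typed the way a path copied from Windows would look.
# In a plain string, "\t" becomes a tab and "\n" a newline, corrupting the path.
plain = "c:\temp\new_recording.bin"

# The r prefix keeps every backslash literal.
raw = r"c:\temp\new_recording.bin"

# Forward slashes are also accepted for Windows paths and need no prefix.
forward = "c:/temp/new_recording.bin"

raw_path = PureWindowsPath(raw)
forward_path = PureWindowsPath(forward)

# Show the hidden control characters in the plain string.
print(repr(plain))
# The raw and forward-slash spellings point at the same file.
print(raw_path == forward_path, raw_path.name)
```

Printing ``repr`` of a path string before using it is a quick way to spot escape sequences that were interpreted by accident.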
Common Pitfalls & Tips
----------------------

1. **Data Shape**: Always ensure that your MATLAB data matrix's first dimension corresponds to samples/time and the second to channels.
Member

You can mention that if the data is in the other shape, one can use time_axis=1: https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/binaryrecordingextractor.py#L66

Collaborator Author

I added this.

Thinking about this, it makes me feel that the order argument in the get_traces is kind of redundant:

https://github.com/catalystneuro/spikeinterface/blob/1ead6a33e658bf5a0365d21506a90dd9bd32e67c/src/spikeinterface/core/baserecording.py#L260-L261

What do you think?

2. **File Path**: Double-check the file path in Python to ensure you're pointing to the right directory.
3. **Data Type**: When moving data between MATLAB and Python, it's crucial to keep the data type consistent. In our example, we used ``double`` in MATLAB, which corresponds to ``float64`` in Python.
4. **Sampling Frequency**: Ensure you set the correct sampling frequency when loading data into SpikeInterface.
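
The data type tip also applies to integer recordings, the case raised in the review for ``gain_to_uV`` and ``offset_to_uV``. The sketch below assumes the usual linear convention, scaled = raw * gain + offset, and uses NumPy only; the gain of 0.195 microvolts per step and the sample values are hypothetical and should be replaced with the figures from your acquisition system.

```python
import numpy as np

# Hypothetical acquisition values, for illustration only.
gain_to_uV = 0.195  # microvolts per integer step
offset_to_uV = 0.0  # microvolts added after scaling

# Stand-in for int16 traces: 4 samples x 2 channels.
raw_traces = np.array(
    [[0, 100], [-100, 200], [32767, -32768], [10, -10]],
    dtype="int16",
)

# Cast explicitly so the output dtype is a deliberate choice, then apply
# the assumed linear conversion to microvolts.
traces_uV = raw_traces.astype("float32") * gain_to_uV + offset_to_uV

print(traces_uV.dtype, traces_uV.shape)
```

If the MATLAB data is already in microvolts and stored as ``double``, a gain of 1 and an offset of 0, as in the loading script, leave the values unchanged.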