Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

device names in NWB broken #21

Open
bjhardcastle opened this issue Sep 21, 2024 · 12 comments
Open

device names in NWB broken #21

bjhardcastle opened this issue Sep 21, 2024 · 12 comments

Comments

@bjhardcastle
Copy link
Member

bjhardcastle commented Sep 21, 2024

@alejoe91

Up until this week, the units table included device names like this:

>>> s.nwb_zarr["units/device_name"][:]
array(['ProbeA', 'ProbeA', 'ProbeA', ..., 'ProbeF', 'ProbeF', 'ProbeF'],
      dtype=object)

Now they've changed to this:

>>> s.nwb_zarr["units/device_name"][:]
array(['Neuropixels 1.0', 'Neuropixels 1.0', 'Neuropixels 1.0', ...,
       'Neuropixels 1.0', 'Neuropixels 1.0', 'Neuropixels 1.0'],
      dtype=object)

...and the names in the devices table look like this:

>>> tuple(s.nwb_zarr["general/devices"].keys()) 
('Neuropixels 1.0', 'Neuropixels 1.0-1', 'Neuropixels 1.0-2', 'Neuropixels 1.0-3', 'Neuropixels 1.0-4', 'Neuropixels 1.0-5')

This broke some downstream processing for us (AllenInstitute/npc_sessions#127) and makes it really difficult to link units to probes. Could you please either fix the names of the devices or, if this was necessary, consider adding an electrode_group_name column to the units table.

Could you also please help us re-run the nwb-generation step of the pipeline for affected sessions?

@alejoe91
Copy link
Collaborator

Sure! It was definitely an unintended change! I'll fix it on Monday

@alejoe91
Copy link
Collaborator

@bjhardcastle I checked the latest asset (this one) ran through the new pipeline and it correctly used the probe name:

>>> nwbfile.units["device_name"][:]
array(['ProbeA', 'ProbeA', 'ProbeA', ..., 'ProbeF', 'ProbeF', 'ProbeF'],
      dtype=object)

Could you point me to an asset with the issue described above? So I can check its provenance. Anyways, running the problematic assets through the current pipeline should produce results with the correct device names!

@bjhardcastle
Copy link
Member Author

Weird, here's one from last week that has the issue: ecephys_644864_2023-02-01_00-00-00_sorted_2024-09-18_07-11-48

I also noticed this one from July, so maybe it's not due to a recent change: ecephys_668755_2023-08-28_13-06-40_sorted_2024-07-14_09-15-26

@alejoe91
Copy link
Collaborator

Thanks. So this is not related to the pipeline, but rather to the way that the probe is loaded by probeinterface at the time of compression + upload. I can get these two raw assets fixed and re-trigger the new pipeline on them.

@bjhardcastle
Copy link
Member Author

bjhardcastle commented Sep 23, 2024

So the raw data assets have a different organization because of this?

I don't think it's limited to these two - they're just the first two I pulled up today...
If I know what to look for in the raw data asset I can get a full list of affected sessions.

@bjhardcastle
Copy link
Member Author

@alejoe91 could you please let us know how to identify raw data assets with this issue? Thanks

@alejoe91
Copy link
Collaborator

yes, so you should traverse all sessions on s3, open a .zarr compressed folder with rec = si.read_zarr(...), and check the rec.get_annotation("probes_info"). This will be a list of 1 element with a dict name and probe_model. name should be ProbeA/B/....

If you give me a few days, I can take care of this. Not sure how urgent it is on your side though

@bjhardcastle
Copy link
Member Author

It's always somewhat urgent, but a few days is ok, and I can help.

I checked all the uploaded ecephys data and put together a table of paths that need updating, with correct probe names - working is shown in the notebooks here:
https://codeocean.allenneuraldynamics.org/capsule/4671738/tree

It's quite a lot that have the model name or None as their probe name (~1/3 of all probes). Seems like we should have noticed this affecting more than a couple of sessions... Do you sometimes fallback to parsing the path name instead?

@alejoe91
Copy link
Collaborator

@bjhardcastle thanks for compiling the liet. This is due to probeinterface and was fixe by this release. In fact I don't see any issues of this type after November 10th 2023, which is probably when the new probeinterface release was deployed to the transfer service.

What we can do is:

  • Fix the probe information of the zarr files on s3
  • Rerun the wrong sessions through the new pipeline

Let me know if that sounds ok

@bjhardcastle
Copy link
Member Author

  • fixing the names in zarr.attrs would be good. Since it touches the raw data and breaks provenance for existing assets we should run this by @dyf + others affected
  • re-running just the NWB step for all sessions would be good. If that's not possible, or if you meant re-run the whole pipeline: I wonder if it's necessary right now? I can re-run the few sessions I'm having issues with, but many don't seem to be causing issues for people. And if we switch the default sorter to ks4 in the near future, we might to re-run everything on that instead..

@alejoe91
Copy link
Collaborator

We have fixed probe info and other stuff in the zarr s3 a few times in the past :) Changes happen!

@bjhardcastle
Copy link
Member Author

Hi @alejoe91, there are currently 423 zarr files that need updating - would it be possible to fix them soon?

to_be_updated.csv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants