Neurophysiology Pipeline
- Clone the repository and enter it:

  ```
  git clone git@github.com:theunissenlab/songephys.git
  cd songephys
  ```

- `mkdir data` (or symlink to a data directory)

- `virtualenv env -p python3` (using Python 3.5 seems to be the most stable)

- `source bin/activate` (activates the virtualenv and sets the path correctly), then:

  ```
  pip install -r requirements.txt
  ```
- `pip install statsmodels` (because of a bug, it has to be installed after the other requirements)

- You will also need the spike sorting code in solid-garbanzo (suss). In your Code folder:

  ```
  git clone git@github.com:theunissenlab/solid-garbanzo.git
  ```

  Then make a soft link to that folder from songephys (that is how the PYTHONPATH is set up in `activate`). If you are in songephys you can type:

  ```
  ln -s ~/Code/solid-garbanzo solid-garbanzo
  ```

  Finally, make sure all of its requirements are also installed; see its requirements.txt file.
After this, use the command `source bin/activate` whenever you want to enter this python environment to run the pipeline. Note that if you are not using virtualenv (for example, if you use conda environments), you should still run the sh commands in `bin/activate` that set the paths correctly.
Some comments from installing on MacOS:
- The module `setup.py` in pipeline conflicts with other `setup.py` files from a Conda installation; in my version, I renamed it to `file_setup.py`. That means you also need to change the other python files in pipeline that `import setup` (change them to `import file_setup`).
- To use `cmake` on Catalina you will have to install Xcode. The Xcode for Catalina is 11.0, and it installs with command line tools that give you system libraries version 10.15 (MacOSX10.15.sdk). You should find this in /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/. You will need to install version 10.13, which you can get from https://github.com/phracker/MacOSX-SDKs/releases. Download that version, put it in the same folder, and create a soft link to it:

  ```
  ln -s MacOSX10.13.sdk MacOSX.sdk
  ```
After the environment is set up and activated, run the following script to set up the automated Google Drive access needed for the data pipeline:

```
python scripts/setup_gdrive.py
```

Follow the printed instructions. Make sure you are creating the API access in the right Google Drive account (probably your Berkeley one, which has access to the `Intan Ephys Data` folder). The `Intan Ephys Data` folder must be accessible from the top level of your Google Drive!

To run the following steps, you must be authenticated for API access through Google Drive.
- Create a site directory in the subject's `sites` directory. By convention, we name a site by the name of the first session at that site:

  ```
  mkdir data/birds/GreYel3594M/sites/GreYel3594M___170919_094646
  ```
- Make a file called `filelist` in the site directory you just created. The first line should indicate the "lab", for now used only to distinguish differences in data organization between data collected in Berkeley ("theunissen") and Seewiesen ("seewiesen"). Subsequent lines should list all the rhd file names and csv file names, in any order. The rhd and csv files for all the sessions must be uploaded to Google Drive.

  Example of `data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/filelist`:

  ```
  theunissen
  GreYel3594M___170919_094646.rhd
  GreYel3594M___170919_111646.rhd
  GreYel3594M___170919_124646.rhd
  GreYel3594M_trialdata_20170919094951.csv
  GreYel3594M_trialdata_20170919102348.csv
  GreYel3594M_trialdata_20170919125344.csv
  GreYel3594M_trialdata_20170919140635.csv
  ```
- Run the SiteConfig task with the following command, which will automatically generate a yaml configuration file for the site:

  ```
  luigi --module file_setup SiteConfig \
      --site-dir data/birds/GreYel3594M/sites/GreYel3594M___170919_094646 \
      --local-scheduler
  ```

  This should create a yaml file in the site directory that defines the names of the sessions, the pyoperant files associated with each session, and the start time of each session relative to the first one (a quick way to sanity-check those offsets is sketched after the example below).

  Example of `data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/sessions.yaml`:

  ```yaml
  lab: theunissen
  sessions:
  - id: GreYel3594M___170919_094646
    playbacks:
    - GreYel3594M_trialdata_20170919094951
    - GreYel3594M_trialdata_20170919102348
    t: 0.0
  - id: GreYel3594M___170919_111646
    t: 5400.0
  - id: GreYel3594M___170919_124646
    playbacks:
    - GreYel3594M_trialdata_20170919125344
    - GreYel3594M_trialdata_20170919140635
    t: 10800.0
  ```
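The `t` offsets should match the gaps between the session-name timestamps. A minimal sketch for sanity-checking them, assuming session names end in a YYMMDD_HHMMSS timestamp as in the example above:

```python
from datetime import datetime

# Session names end in a YYMMDD_HHMMSS timestamp; the `t` field in
# sessions.yaml is the start time in seconds relative to the first session.
names = [
    "GreYel3594M___170919_094646",
    "GreYel3594M___170919_111646",
    "GreYel3594M___170919_124646",
]

starts = [datetime.strptime(n.split("___")[-1], "%y%m%d_%H%M%S") for n in names]
for name, start in zip(names, starts):
    print(name, (start - starts[0]).total_seconds())  # expect 0.0, 5400.0, 10800.0
```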
- From the top level of the songephys directory, run the pipeline:

  ```
  luigi --module playback_categories RunSiteCategories \
      --site-dir data/birds/GreYel3594M/sites/GreYel3594M___170919_094646 --local-scheduler
  ```

  If you are running Mac OS X, you might have to install python.app and use `pythonw`, as in:

  ```
  pythonw -m luigi --module playback_categories RunSiteCategories \
      --site-dir data/birds/GreYel3594M/sites/GreYel3594M___170919_094646 --local-scheduler
  ```

  This will download all files and pre-process them up to the manual spike sorting point.
- Do spike sorting! This involves cloning https://github.com/theunissenlab/solid-garbanzo and installing its requirements.txt file as well. It is probably best to do this in a separate python environment. You can then modify `solid-garbanzo/suss/gui/config.py` with the path to your spikes (e.g. `data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/sorted_spikes`) for easier access. For each of the files in `sorted_spikes` (there should be 16 of them), sort the spikes and save the sorted pkl file in `data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/manually_curated`. Then repeat the luigi command from the previous step to finish the pipeline.
Often there are errors in intermediate steps of the pipeline, and it can be difficult to figure out what the problem is.

- Check that the sessions.yaml file is correct. Possible errors include mistyped session names or time offsets that were not computed correctly.
- If the error occurs early in the pipeline, check that the rhd download was not interrupted and that the rhd to nix conversion was not interrupted. These steps can leave an incomplete file that looks done but contains corrupted data (a quick check of a converted file is sketched below).
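For the second case, one quick check is to try reading the converted file back with neo. This is a minimal sketch, assuming neo's NixIO backend (and the nixio package) is installed; the path is the example site used throughout this page, and the exact location of the .nix file under your site directory may differ:

```python
from neo.io import NixIO

# Example path based on the site directory used above.
nix_path = ("data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/"
            "segmentation/GreYel3594M___170919_094646.nix")

io = NixIO(filename=nix_path, mode="ro")
try:
    block = io.read_block()
    # A complete conversion should contain one segment spanning the whole session.
    print("segments:", len(block.segments))
    for sig in block.segments[0].analogsignals:
        print(sig.name, sig.shape, sig.sampling_rate)
finally:
    io.close()
```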
Diagram of the data pipeline. See below for a detailed description of these functions.
This task locates and downloads the rhd file, pyoperant files, and stimulus wav files associated with the specified recording session.
- RHD file: e.g. `raw/GreYel3594M___170919_094646.rhd`
  - This is the raw data file
- One or more pyoperant files: `raw/___________.csv`
  - These files are generated by pyoperant and contain stimulus presentation times, trial numbers, etc., for a set of stimuli played
- Stimuli directories at `stimuli/*` containing all .wav files referenced by the pyoperant files
This task converts the rhd file into a neo block stored as a .nix file.
- NIX file: e.g. `segmentation/GreYel3594M___170919_094646.nix`
  - 1 segment spanning the entire session
  - 3 channel indices (2-channel mic, 1 digital pulse for stimulus playbacks, 16-channel electrode)
This task combines one or more pyoperant csvs into a single pandas dataframe. It also loads the digital signals of the neo block to determine which trials were present in the current session. (A quick way to inspect the result is sketched after the list below.)

- Combined playback data in a pickle file: e.g. `segmentation/playbacks.pkl`
  - A pandas dataframe with each row corresponding to one trial played during the session.
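A minimal way to inspect that dataframe, assuming the pickle sits under the site directory as in the example paths above (the exact columns depend on the pyoperant output):

```python
import pandas as pd

# Load the combined playback table produced by this task.
playbacks = pd.read_pickle(
    "data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/"
    "segmentation/playbacks.pkl"
)

print(len(playbacks), "trials")      # one row per trial played during the session
print(playbacks.columns.tolist())    # stimulus names, presentation times, trial numbers, ...
print(playbacks.head())
```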
This task can be split up into two separate tasks. First, it detects periods of vocal activity using the microphone channels and combines them with stimulus playback information to define the times during which either a live or playback sound occurred. Second, it splits the main segment of the neo block into several 5 second segments. (A sketch of loading the resulting dictionary follows the list below.)

- Segments defined in a dictionary: e.g. `segmentation/GreYel3594M___170919_094646-segments.npy`
  - numpy file containing a dictionary with keys "live", "playback", and "consolidated". "live" contains all live sound periods, "playback" contains all playback periods, and "consolidated" contains non-overlapping live and playback periods, keeping only live periods that do not overlap with a playback
- Neo block split into 5 s segments: e.g. `segmentation/GreYel3594M___170919_094646-segmented.pkl`
  - Up to 1080 segments (5 s each) spanning the entire session
  - 3 channel indices (2-channel mic, 1 digital pulse for stimulus playbacks, 16-channel electrode)
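A minimal sketch for loading the segments dictionary, assuming it was written with np.save (so the dict comes back as a 0-d object array) and lives under the site directory as in the example path above:

```python
import numpy as np

seg_path = ("data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/"
            "segmentation/GreYel3594M___170919_094646-segments.npy")

# np.save stores the dict inside a 0-d object array; .item() unwraps it.
periods = np.load(seg_path, allow_pickle=True).item()

for key in ("live", "playback", "consolidated"):
    print(key, len(periods[key]), "periods")
```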
Filter electrode signals using a method that attempts to remove correlations within and across electrodes to reduce motion artifacts and other correlated noise. It does this by attempting to model the amplitude at one electrode from the signals on all other electrodes in a preceding time window. (A conceptual sketch of this idea follows the list below.)

- Filtered Neo block: e.g. `segmentation/GreYel3594M___170919_094646-filtered.pkl`
  - Up to 1080 segments (5 s each) spanning the entire session
  - 19 channel indices (2-channel mic, 1 digital pulse for stimulus playbacks, 1 channel index for each electrode)
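The pipeline's actual filtering code lives in the repository; the following is only a conceptual sketch of the idea, using ordinary least squares to predict one channel from lagged samples of the other channels and keeping the unexplained residual:

```python
import numpy as np

def decorrelate_channel(signals, target, lag_samples=32):
    """Conceptual sketch (not the pipeline's implementation): predict channel
    `target` from the preceding `lag_samples` samples of every other channel,
    then return what that prediction cannot explain.

    signals: array of shape (n_samples, n_channels)
    """
    n_samples, n_channels = signals.shape
    others = [c for c in range(n_channels) if c != target]

    # Design matrix: one column of lagged samples per (other channel, lag),
    # aligned with the prediction targets.
    X = np.column_stack([
        signals[lag_samples - lag:n_samples - lag, c]
        for c in others
        for lag in range(1, lag_samples + 1)
    ])
    y = signals[lag_samples:, target]

    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coefs  # residual: activity not shared with the other channels
```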
Threshold spikes with a channel- and segment-dependent threshold. (A common form of such a threshold is sketched after the list below.)

- Thresholded Neo block: e.g. `spikes/GreYel3594M___170919_094646-thresholded.pkl`
  - Up to 1080 segments (5 s each) spanning the entire session
  - 19 channel indices (2-channel mic, 1 digital pulse for stimulus playbacks, 1 channel index per electrode with 1 spiketrain per segment)
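The exact threshold rule is defined in the pipeline code; a common channel- and segment-dependent choice, shown here only as a hypothetical sketch, is a multiple of a robust noise estimate computed per channel within each 5 s segment:

```python
import numpy as np

def segment_threshold(trace, k=4.5):
    """Hypothetical sketch of a channel- and segment-dependent threshold:
    a multiple of the robust (MAD-based) noise estimate of one channel's
    trace within one segment. The pipeline's actual rule may differ."""
    sigma = np.median(np.abs(trace - np.median(trace))) / 0.6745
    return k * sigma

# Samples crossing the threshold (e.g. falling below -segment_threshold(trace)
# for negative-going spikes) would mark putative spike times.
```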
Merges segment information for live and playback periods from multiple sessions into one file. Offsets the timestamps of each session by its start time relative to the first.

- Merged vocal period dict in numpy format: e.g. `vocal_periods.npy`
Merges neo blocks with thresholded spikes from multiple sessions into one file. Offsets the timestamps of each session by its start time relative to the first.

- Merged Neo block: e.g. `thresholded.pkl`
  - Up to 1080 segments (5 s each) spanning the entire session
  - 19 channel indices (2-channel mic, 1 digital pulse for stimulus playbacks, 1 channel index per electrode with 1 spiketrain per segment)
- One pickle file for each electrode channel: e.g. `spikes/spike_waveforms-e0.pkl`
  - Has a SpikeDataset object with keys for "times" and "waveforms"
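To peek at one of these files you can unpickle it. This is a sketch that assumes the suss package (solid-garbanzo) is importable, since the pickle contains a suss SpikeDataset, and that "times" and "waveforms" are reachable as attributes; the exact access pattern depends on the suss implementation:

```python
import pickle

path = ("data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/"
        "spikes/spike_waveforms-e0.pkl")

# Unpickling requires the suss package on the path, since the file
# contains a suss SpikeDataset object.
with open(path, "rb") as f:
    dataset = pickle.load(f)

print(len(dataset.times), "putative spikes")
print(dataset.waveforms.shape)  # presumably (n_spikes, n_waveform_samples)
```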
Hierarchical spike sorting over time.
- One pickle file for each electrode channel: e.g. `sorted/sorted-e0.pkl`
  - Has a ClusterDataset object representing each cluster. Several of these need to be merged in the following manual curation step.
Manual step using a GUI for merging and deleting clusters created in SortSpikes. Each "sorted" file is curated and then saved to the "manually_curated" directory.

- The output of SortSpikes for each electrode is generated and saved in `/<bird_name>/sites/<site_name>/sort_results` and named `sorted-e{8-24}.pkl`
- To sort, run the sorting GUI:

  ```
  python -m suss.gui.app
  ```

- Load a sorted dataset. Combine, split, and delete clusters.
- When done, right click -> tag them with any tags that make sense (e.g. Single Unit, Multiunit)
- Save the file in a directory named `/<bird_name>/sites/<site_name>/manually_curated/` with the names `curated-e{8-24}.pkl`
- One pickle file for each electrode channel: e.g. `manually_curated/curated-e0.pkl`
  - Has a ClusterDataset object representing each cluster.
Write the results of the above sorting steps into the neo block.
- Sorted Neo block: e.g. `sorted.pkl`
  - Up to 1080 segments (5 s each) spanning the entire session
  - 19 channel indices (2-channel mic, 1 digital pulse for stimulus playbacks, 1 channel index per electrode with 1 spiketrain per unit per segment)
Resegments the data, previously cut into 5 s chunks spanning the recording period, into variable-length segments corresponding to live and playback vocal periods. (A sketch of iterating over the result follows the list below.)

- Segmented Neo block: e.g. `sorted-resegmented.pkl`
  - Variable number of segments depending on the detected live and playback periods
  - 19 channel indices (2-channel mic, 1 channel index per electrode with 1 spiketrain per unit per segment)
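A minimal sketch for iterating over the resegmented block, assuming it was saved with pickle under the site directory (neo and its dependencies must be importable to unpickle the block, and the exact filename/location may differ):

```python
import pickle

path = ("data/birds/GreYel3594M/sites/GreYel3594M___170919_094646/"
        "sorted-resegmented.pkl")

with open(path, "rb") as f:
    block = pickle.load(f)

# Each segment now corresponds to one live or playback vocal period.
for seg in block.segments:
    print(seg.name, float(seg.t_start), float(seg.t_stop),
          len(seg.spiketrains), "spiketrains")
```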
Things that would be nice:

- Automatic link between rhd and csv
  - Could be done at recording time: have pyoperant see which rhd file is in progress at the start and stop of playbacks and write it to a csv / yaml file.
- Making filtering fast and verifying that filtering is working
  - Currently takes about 10 hours per full 90-minute file.
- Better algorithms to link spike clusters
  - Would make the manual curation step easier / faster
- Write the end file to NIX
  - Saving our files to nix is very slow, so they are currently still in pickle files, which can't be lazy-loaded
- Vocal categorization and motif alignment
  - Some of this can be done manually now
Our neurophysiological data pipeline is organized as a sequence of Tasks (data transformations) defined in the songephys repository. We use Luigi, a pipeline framework for python that helps to organize Tasks, Targets (data files), and their dependency structure.

The primary things the pipeline does are to process electrode data, join together continuous recordings that were split into different files, and manage spike sorting.

Our data is recorded continuously but is broken up into separate files containing 90-minute chunks (sessions) of raw electrode and microphone input. The majority of the ~12 GB file size is due to 18 channels (16 electrode, 2 microphone) sampled at 30 kHz for 90 minutes, stored as 32-bit floats (about 2.9 billion data points). This data is all in the initial .rhd file for a session.
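A back-of-envelope check of that estimate, using only the numbers quoted above:

```python
channels = 18          # 16 electrode + 2 microphone
rate_hz = 30_000       # samples per second per channel
seconds = 90 * 60      # one 90-minute session
bytes_per_sample = 4   # 32-bit float

samples = channels * rate_hz * seconds
print(samples)                                 # 2_916_000_000 data points
print(samples * bytes_per_sample / 1e9, "GB")  # ~11.7 GB, i.e. roughly 12 GB
```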
The file size grows as the data is processed; many of the intermediate steps in the pipeline take measures to limit the size of individual files and the amount of data that must be loaded into memory at one time. The file sizes shrink toward the end of the pipeline, when putative spikes have been extracted and the raw electrode channels are dropped from the file.

Multiple sessions recorded continuously at one site (one depth) are merged together once putative spikes have been extracted from the electrode data and the electrode trace can be dropped, forming a chunk of variable length (the site). One site is therefore derived from the data of multiple sessions, concatenated and appropriately offset.