A flexible image segmentation pipeline for heterogeneous multiplexed tissue images, based on pixel classification and implemented in Snakemake
The pipeline is based on CellProfiler
(http://cellprofiler.org/) for segmentation and Ilastik
(http://ilastik.org/) for pixel classification. It is streamlined by the specially developed imctools
Python package (https://github.com/BodenmillerGroup/imctools)
as well as custom CellProfiler modules (https://github.com/BodenmillerGroup/ImcPluginsCP).
This pipeline was developed in the Bodenmiller laboratory at the University of Zurich (http://www.bodenmillerlab.org/) to segment hundreds of highly multiplexed imaging mass cytometry (IMC) images. However, it has already been successfully applied to other multiplexed imaging modalities.
The PDF at 'Documentation/201709_imctools_guide.pdf' describes the conceptual basis. While still conceptually valid, the installation procedures described there are outdated.
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
To run the pipeline, the following software needs to be installed:
- conda: a reproducible package manager
- singularity: a container platform
  - Documentation: https://sylabs.io/guides/3.5/user-guide/introduction.html
  - Installation: https://sylabs.io/guides/3.5/admin-guide/installation.html
  - Note: installation via conda currently causes many issues and is thus not recommended.

Make sure these software packages work.
- Create a new github repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
- Initialize the ImcPluginsCP submodule:
git submodule update --init --recursive
Install Snakemake using conda:
conda create -c bioconda -c conda-forge -n snakemake snakemake
For installation details, see the instructions in the Snakemake documentation.
Configure the workflow according to your needs by editing the files in the config/
folder. Adjust config.yaml
to configure the workflow execution. The schema at workflow/schemas/config_pipeline.schema.yml
explains all the options.
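For orientation, a config.yaml touching only the options mentioned in this README might look roughly like the sketch below. The key nesting and all paths here are assumptions for illustration; the schema is the authoritative reference.

```yaml
# Illustrative sketch only; see workflow/schemas/config_pipeline.schema.yml
# for the authoritative option names and structure.
fn_cell_classifier: "classifiers/cell_classifier.ilp"  # trained Ilastik classifier (placeholder path)
compensation:
  fn_spillover_matrix: "config/spillover_matrix.csv"   # option 1: precomputed spillover matrix
  # folder_spillover_slide_acs: "data/spillover_acs"   # option 2: spillover acquisitions
  # col_panel_spillover: "spillover"                   # boolean column in panel.csv
```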
To enable the 'compensation' workflow according to Chevrier et al. 2018, you need to configure either:
- a spillover matrix: set its path in the config file via compensation/fn_spillover_matrix
- or spillover acquisitions acquired according to https://docs.google.com/document/d/195eViUqHoYRKrkoy_NkIdJPmyx1-OuDaSjiWQBy4weA/edit?usp=sharing and set in the config file:
  - compensation/folder_spillover_slide_acs
  - compensation/col_panel_spillover: the name of a boolean column in 'panel.csv' indicating which metals were spotted for the spillover acquisition
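As background, spillover compensation in the sense of Chevrier et al. 2018 models each observed channel vector as the true signal multiplied by a spillover matrix, and recovers the true signal by solving that linear system. The toy sketch below illustrates the idea for two channels with made-up matrix values; it is not the pipeline's implementation, and real implementations typically use more robust solvers (e.g. non-negative least squares).

```python
# Toy 2-channel spillover compensation (illustrative values only).
# Row i of the spillover matrix S gives the fraction of channel i's
# true signal measured in each channel, so: observed = true @ S.

def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def compensate(observed, spillover):
    """Recover the true signals: true = observed @ S^-1."""
    inv = invert_2x2(spillover)
    return [
        observed[0] * inv[0][0] + observed[1] * inv[1][0],
        observed[0] * inv[0][1] + observed[1] * inv[1][1],
    ]

# 5% of channel 1 leaks into channel 2, 2% of channel 2 into channel 1.
S = [[0.95, 0.05], [0.02, 0.98]]
observed = [97.0, 103.0]            # measured counts in the two channels
true_signal = compensate(observed, S)  # ≈ [100.0, 100.0]
```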
Activate the conda environment:
conda activate snakemake
Test your configuration by performing a dry-run via
snakemake --use-conda -n --use-singularity
Execute the workflow locally via
snakemake --use-conda --cores $N --use-singularity
using $N
cores or run it in a cluster environment via
snakemake --use-conda --cluster qsub --jobs 100 --use-singularity
or
snakemake --use-conda --drmaa --jobs 100 --use-singularity
The CellProfiler output will be in results/cpout
. All other folders should be considered
temporary output.
See the section 'UZH slurm cluster' for more details on how to run this on the cluster of the University of Zurich.
snakemake download_example_data --use-singularity --cores 32
snakemake get_untrained_ilastik --use-singularity --cores 32
This will generate random crops to train the Ilastik cell pixel classifier in results/ilastik_training_data
and produce an untrained classifier at: untrained.ilp
Open the classifier in Ilastik and save the trained classifier under the filename specified as:
fn_cell_classifier
in the configuration file.
To open the CellProfiler GUI at any step, run:
snakemake results/cp_{batchname}_open.sh --use-singularity --cores 32
replacing {batchname}
with the CellProfiler step name you want to inspect.
E.g., running
snakemake results/cp_segmasks_open.sh --use-singularity --cores 32
will open the segmentation step.
This will generate a script results/cp_segmasks_open.sh
that will open CellProfiler with all
paths, plugins, and the pipeline set as they would be when running the Snakemake workflow.
Note: this requires the cellprofiler
command to be installed and working.
First retrieve the GitHub repository and install the conda environment as described above.
Generate this file in the path ~/.config/snakemake/cluster_config.yml
__default__:
time: "00:15:00"
This defines the default batch run parameters.
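If individual steps need more resources, per-rule overrides can be added to the same file; the rule name below is hypothetical and the time value is only an example:

```yaml
__default__:
  time: "00:15:00"
# some_long_running_rule:   # hypothetical rule name; use the workflow's actual rule names
#   time: "04:00:00"
```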
Follow the instructions from:
https://github.com/Snakemake-Profiles/slurm
Use the following settings:
profile_name
: slurm
sbatch_defaults
:
cluster_config
: ../cluster_config.yml
advanced_argument_conversion
: 1 (Actually I have never tried this, might be worth a try)
To run the pipeline, the following modules are required and need to be loaded in this order:
module load generic
module load anaconda3
module load singularity
conda activate snakemake_imc
To run the snakemake command on the cluster, the following flags are needed:
- --profile slurm: to specify the profile
- --use-singularity: to use singularity
- --singularity-args "\-u": to use non-privileged singularity mode
- --jobs #: to have at most # concurrent jobs submitted (e.g. --jobs 50)
After the example data has been downloaded (see above), the following command would run the full pipeline:
snakemake --profile slurm --use-singularity --singularity-args "\-u" --jobs 50