by Daniel Paysan (#), Adityanarayanan Radhakrishnan (#), G.V. Shivashankar (^) and Caroline Uhler (^)
The repository contains the code for the main methodology and analyses described in our paper:
The code has been developed on a system running Ubuntu 20.04 LTS with an Intel(R) Xeon(R) W-2255 CPU at 3.70GHz, 128GB RAM and an Nvidia RTX 4000 GPU with CUDA v.11.1.74 installed. However, the demo application of our pipeline described in the following only requires a Linux system with at least 10 GB of storage and an internet connection, and thus runs on a significantly less powerful system.
To facilitate the use and testing of our pipeline, we have implemented a demo application that can be used to predict novel, unseen overexpression conditions from chromatin images and is easy to use with minimal software and storage requirements. In particular, our demo application runs (depending on the number of input images) in as little as 5 minutes and requires only roughly 10GB of storage.
When run, the demo application will perform all steps required to run our pipeline, i.e. it will
- Install a minimal software environment containing the required python version 3.8.10 and a few additional python packages.
- Download the required data to run the inference demonstration of our pipeline.
- Preprocess the chromatin images provided by the user for which the pipeline should infer the perturbed gene.
- Obtain the image and consequently the gene perturbation embedding for the test condition by encoding the images using the pretrained convolutional neural network ensemble image encoder model.
- Link the gene perturbation embeddings to their corresponding regulatory gene embeddings by training the kernel regression model.
- Output the 10 genes most likely overexpressed (in decreasing order) in the cells in the user-provided input images.
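The final ranking step in the list above can be pictured as a nearest-neighbour lookup in the gene embedding space. The following is a minimal sketch and not the implementation used in this repository. It assumes that the kernel regression has already produced a predicted regulatory gene embedding for the test condition, and it uses Euclidean distance as a stand-in similarity measure. The function name `rank_candidate_genes`, the toy gene names and the embedding values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_candidate_genes(
    predicted: Sequence[float],
    gene_embeddings: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the k genes whose embeddings lie closest to `predicted`.

    Hypothetical helper: Euclidean distance is an assumption made for
    illustration, not necessarily the measure used by the pipeline.
    """
    distances = [
        (gene, math.dist(predicted, embedding))
        for gene, embedding in gene_embeddings.items()
    ]
    # Smallest distance first, i.e. most likely gene first; ties broken by name.
    distances.sort(key=lambda item: (item[1], item[0]))
    return distances[:k]


# Toy 2-D embeddings (made-up values, not taken from the paper).
toy_embeddings = {
    "GENE_A": (0.0, 0.0),
    "GENE_B": (1.0, 0.0),
    "GENE_C": (3.0, 4.0),
}
ranking = rank_candidate_genes((0.9, 0.1), toy_embeddings, k=2)
print([gene for gene, _ in ranking])
```

With `k=10` and the full set of candidate gene embeddings, the same lookup would yield a list of ten genes ordered from most to least likely under the assumed distance.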
A Linux system is required to run the demo.
To run the commands described in this guide, you need a bash shell.
To activate a bash shell after opening a terminal (e.g. via the shortcut Ctrl+Alt+T if you are running Ubuntu, or by typing `terminal` in the application search of your system), type in
bash
If you see the output `command "bash" not found`, please install bash as described in the output of your system, e.g. via
sudo apt-get update
sudo apt-get install bash
The package manager Anaconda or Miniconda needs to be installed on your system.
To test if it is installed, open a terminal on your system and type in
conda
If the command `conda` was not found, Anaconda or Miniconda is not installed on your system.
Please open a new terminal on your system.
Then install Miniconda via:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
If you encounter any issues, please refer to the official installation guide which can be found here.
[!WARNING] You need to close the terminal and open a new one to complete the installation
Make sure conda is initialized appropriately in your shell by typing
bash
conda init bash
source ~/.bashrc
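As an optional complement to the manual checks above, the prerequisites can also be tested from Python. This is a minimal sketch under stated assumptions: the helper name `check_tools` is hypothetical, and the tool list (`bash`, `conda`, `git`) is simply taken from the commands used in this guide. It only inspects the `PATH` of the process that runs it, so a tool reported as missing may still be available in a freshly opened interactive shell.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional


def check_tools(
    tools: Iterable[str] = ("bash", "conda", "git"),
    locate: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Map each tool name to True if an executable was found on PATH.

    `locate` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return {tool: locate(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'NOT found'}")
```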
If the prerequisites are satisfied, please clone this repository by running the following command in a new terminal.
git clone https://github.com/uhlerlab/image2reg.git
We have developed three versions of our Image2Reg demo application.
- Image2Reg for test inputs: This variant runs our demo with default parameters and example inputs to quickly verify its functionality.
- Image2Reg for user-provided inputs: This variant enables the application of our pipeline to user-provided input images.
- Image2Reg for reproducibility: This variant reproduces the results of the leave-one-target-out cross-validation for five selected perturbation conditions described in our paper.
Please click on the name of the version you would like to run and follow the instructions.
Note
We recommend first running the variant of our Image2Reg pipeline that uses test inputs before running it with user-provided inputs.
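Before running the variant for user-provided inputs, it can help to confirm that every raw chromatin image has a segmentation mask with the same file name. The following is a minimal sketch of such a check and is not part of the demo itself. The two directory paths are the ones named in the error table of this README, while the function name `raw_images_without_mask` and the decision to skip hidden files are assumptions.

```python
from pathlib import Path
from typing import List

# Directories as named in this README; adjust if your layout differs.
RAW_DIR = Path("test_data/UNKNOWN/images/raw/plate")
MASK_DIR = Path("test_data/UNKNOWN/images/unet_masks/plate")


def raw_images_without_mask(raw_dir: Path, mask_dir: Path) -> List[str]:
    """List raw image file names that have no same-named file in mask_dir."""
    if not raw_dir.is_dir() or not mask_dir.is_dir():
        raise NotADirectoryError(f"Expected directories: {raw_dir}, {mask_dir}")
    mask_names = {p.name for p in mask_dir.iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in raw_dir.iterdir()
        # Hidden files (e.g. .gitkeep) are ignored; this is an assumption.
        if p.is_file() and not p.name.startswith(".") and p.name not in mask_names
    )


if __name__ == "__main__":
    if RAW_DIR.is_dir() and MASK_DIR.is_dir():
        missing = raw_images_without_mask(RAW_DIR, MASK_DIR)
        print("Raw images without a mask:", missing or "none")
    else:
        print("Input directories not found; run this from the repository root.")
```

The check only compares file names. It does not verify that a mask has the same dimensions as its raw image or that it contains integer labels.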
The table below summarizes the error messages that the demo outputs if it is not used as intended, together with their causes and how to resolve them. If you encounter any other errors, please open an issue in this repository and we will extend the list accordingly.
Table: Common Errors and Solutions
Problem | Error Message(s) | Cause | Solution |
---|---|---|---|
Empty input directory | `The directory test_data/UNKNOWN/images/raw/plate is empty.` | The demo requires the raw chromatin images for which the perturbed gene is to be predicted to be located in the specified directory. | Please deposit the raw chromatin images in the directory `test_data/UNKNOWN/images/raw/plate` and restart the demo. |
Empty nuclear mask directory | `The directory test_data/UNKNOWN/images/unet_masks/plate is empty.` | The demo requires the nuclear segmentation masks corresponding to the input raw chromatin images (i.e. the images located in `test_data/UNKNOWN/images/raw/plate`) to be located in the specified directory. | Please deposit the segmentation mask images in the directory `test_data/UNKNOWN/images/unet_masks/plate` and restart the demo. |
Missing/Wrong segmentation mask | `FileNotFoundError: [Errno 2] No such file or directory` | For each raw chromatin image located in `test_data/UNKNOWN/images/raw/plate`, the demo application requires a corresponding nuclear segmentation mask in `test_data/UNKNOWN/images/unet_masks/plate` that has the same file name as the raw chromatin image and satisfies the criteria described in the Prerequisites section. The error occurs if the corresponding mask was not found for any raw image. | Please make sure that all mask images are deposited in the aforementioned directory and restart the demo. |
Malformed mask image | `Cannot access <...>: No such file or directory` or `TypeError: Non-integer label_image types are ambiguous.` | The provided mask images need to satisfy the following criteria: a) a nuclear mask image is a single-channel (black-white) image of the same dimensions as the corresponding raw chromatin image and b) each pixel is assigned an integer value, where the background is assigned the value 0 and every other pixel is assigned the unique integer ID of the nucleus whose mask it belongs to. Such nuclear mask images are e.g. the output of the function `skimage.measure.label`. | Please make sure you provide appropriate nuclear mask images in the `test_data/UNKNOWN/images/unet_masks/plate` directory and restart the demo. |
Missing directory or files | `Cannot access <...>: No such file or directory.` | This error is most likely caused by a malformed `test_data` directory, likely due to an incomplete extraction or download of the data when the demo was run for the first time. | Please delete the `test_data` directory completely and restart the demo, which will download the directory again. To avoid this error, do not interrupt the download or extraction process; run the demo until it asks you to confirm that the input data has been deposited in the correct directories. |
Missing conda environment | `Provided conda environment not found.` | This error only occurs if the demo is run with the `--environment` argument and a non-existent conda environment is provided. | Please make sure that the conda environment you provide exists on your system, or simply run the demo without the `--environment` argument to safely install a new conda environment that contains all required software packages. |
Python module not found | `ModuleNotFoundError: No module named ''.` | This error occurs if the conda environment used to run the demo does not contain all the required Python packages. If you have run the demo by specifying the environment via the `--environment` argument, please make sure that the provided conda environment contains all packages listed in the file `requirements/demo/requirements_demo.txt`. If you ran the demo without the `--environment` argument, the newly installed conda environment contains all required packages, provided that the installation was successful and conda was appropriately initialized as described in the Prerequisites section. | Please run `conda init` in the terminal. Next, run `pip cache purge` to remove any potentially malformed cached Python packages, and then restart our demo without providing the `--environment` argument to perform a fresh install of the conda environment used to run our demo. |
No or only one nucleus found | `ValueError: Empty data passed with indices specified.` or `ValueError: Found array with 1 sample(s)[...] while a minimum of 2 is required.` | The provided input images were found to contain fewer than two nuclei. Please note that this might be due to the filter settings used in our image preprocessing. | Please ensure that your input images contain at least two nuclei and that the filters for the cell size and shape defined in the file `config/demo/preprocessing/full_image_pipeline_new_target.yml` are appropriate for the resolution and cell size of the images. Our settings were chosen for the 20x images of U2OS cells from Rohban et al. (2017) or the JUMP-CP data set. If your images/nuclei are of a different resolution or size, you might want to adjust in particular the minimal/maximum nuclear area (`min_area` and `max_area`), the maximal area of the bounding box (`max_bbarea`), the maximum eccentricity (`max_eccentricitiy`), the minimal solidity (`min_solidity`) and the minimal aspect ratio (`min_aspect_ratio`). All quantities are given in terms of pixels. For larger images of higher resolution and/or larger cells, increase e.g. the maximum values for the area and the area of the bounding box. |
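To illustrate why the nuclear filters in the last table row matter at other resolutions, the sketch below applies size and shape thresholds to a nucleus imaged at two magnifications. It is not the preprocessing code of this repository. The field names loosely mirror the parameters listed in the table (check the config file for the exact key spellings), all threshold and measurement values are made up, and the inclusive comparisons and the helper `scale_for_resolution` are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NucleusFilter:
    """Thresholds named after the parameters in the table (values are made up)."""
    min_area: float
    max_area: float
    max_bbarea: float
    max_eccentricity: float
    min_solidity: float
    min_aspect_ratio: float


@dataclass(frozen=True)
class Nucleus:
    """Per-nucleus measurements; areas are in pixels."""
    area: float
    bbarea: float
    eccentricity: float
    solidity: float
    aspect_ratio: float


def passes(nucleus: Nucleus, f: NucleusFilter) -> bool:
    """Return True if the nucleus satisfies every threshold (bounds inclusive)."""
    return (
        f.min_area <= nucleus.area <= f.max_area
        and nucleus.bbarea <= f.max_bbarea
        and nucleus.eccentricity <= f.max_eccentricity
        and nucleus.solidity >= f.min_solidity
        and nucleus.aspect_ratio >= f.min_aspect_ratio
    )


def scale_for_resolution(f: NucleusFilter, factor: float) -> NucleusFilter:
    """Scale the area thresholds by factor**2 (hypothetical helper).

    Shape descriptors are ratios and are left unchanged in this sketch.
    """
    s = factor ** 2
    return replace(
        f, min_area=f.min_area * s, max_area=f.max_area * s, max_bbarea=f.max_bbarea * s
    )


# Made-up thresholds and measurements for illustration only.
base_filter = NucleusFilter(400, 4000, 4000, 1.0, 0.8, 0.8)
nucleus_20x = Nucleus(area=1500, bbarea=2000, eccentricity=0.5, solidity=0.95, aspect_ratio=0.9)
# The same nucleus at twice the linear resolution covers four times the pixels.
nucleus_40x = replace(nucleus_20x, area=6000, bbarea=8000)

print(passes(nucleus_20x, base_filter))
print(passes(nucleus_40x, base_filter))
print(passes(nucleus_40x, scale_for_resolution(base_filter, 2.0)))
```

In this toy setting the higher-resolution nucleus is rejected by the unscaled thresholds and accepted once the area limits are scaled, which is the kind of adjustment suggested in the table.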
September 6th, 2023.
We have expanded the demo to enable running our pipeline on image data provided by the user, using the models pretrained on the imaging data from Rohban et al. (2017), to facilitate the adaptation of our pipeline to new imaging data sets.
August 18th, 2023.
We have added a new demonstration of our pipeline that can be run easily without first installing the coding environment or downloading any data. The demo runs our pipeline in inference mode: we provide a pretrained version of the pipeline and show how, given images of five selected OE conditions, it predicts the corresponding target genes out-of-sample (no information regarding these conditions was used to set up the pipeline, as described in the paper).
August 2nd, 2023.
On July 17th, 2023, the external `hdbscan` package broke due to a number of changes in the name resolution. As a consequence, no version of the package, including the version 0.8.27 used in our software package, could be installed any longer, and our installation script could no longer run to completion (see here for more information). We have updated the requirements file of our package to install the hotfix implemented in hdbscan v.0.8.33. While we could not have anticipated such an issue suddenly occurring, we apologize for the inconvenience this may have caused. We have tested the updated installation script, but please let us know if you encounter any issues with the installation on your end and/or with running our code.
If you would like to reproduce all results of the paper from scratch, please refer to this guide. Please note that this will require substantially larger computing resources and the described analyses can take over 1000 hours of computation time while generating roughly 2TB of data!
If you would like to reproduce the figures of our manuscript, please refer to this guide, which also contains instructions to download all the data we have generated during all analyses from a DOI-assigned data archive.
If you encounter any problems with setting up the software, please open an issue in this repository and we will do our very best to assist you.
If you use the code provided in the directory please also reference our work as follows:
TO BE ADDED
If you use our software with the sample input images and/or the data provided, please make sure to also reference the corresponding raw data resources described in the paper, as well as our work.