Tian 2024 Data Repository

This repository contains code for importing and processing the single-cell RNA sequencing (scRNA-seq) data from Tian et al. (2024), and documents the directory structure the pipeline produces. The study includes two similar scRNA-seq experiments conducted on human peripheral blood mononuclear cells (PBMCs) from 16 individuals.

Data from the two experiments are labeled by their SRA run accessions:

SRR31856734
SRR31856747

The pipeline includes downloading raw data, converting to FASTQ files, preprocessing with Cell Ranger, and organizing processed data for downstream analysis.
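
The stages listed above can be sketched as a dry run that only prints commands. This is an illustration under assumptions, not the contents of the repository's scripts: the function name `print_pipeline_commands` is hypothetical, the flags are common usage of each tool, and the renaming step assumes `fastq-dump --split-files` yields exactly two reads per run (`_1` and `_2`). Check `cellranger count --help` for options your installed version may additionally require.

```sh
#!/bin/sh
# Dry-run sketch: prints the commands for one run instead of executing them.
# Flags are typical usage, NOT copied from download_data.sh or process_data.sh.

# print_pipeline_commands RUN_ID DATA_DIR   (hypothetical helper)
print_pipeline_commands() (
  run="$1"
  data_dir="${2%/}"                      # drop one trailing slash, if present
  raw="$data_dir/raw/$run"
  ref="$data_dir/raw/refdata-gex-GRCh38-2020-A"

  # 1. Download the SRA archive for the run.
  echo "prefetch $run"
  # 2. Convert it to FASTQ, one file per read.
  echo "fastq-dump --split-files --outdir $raw $run"
  # 3. Rename to the file naming pattern Cell Ranger expects (assumes two reads).
  echo "mv $raw/${run}_1.fastq $raw/${run}_S1_L001_R1_001.fastq"
  echo "mv $raw/${run}_2.fastq $raw/${run}_S1_L001_R2_001.fastq"
  # 4. Run Cell Ranger from processed/ so results land in processed/RUN_ID/outs.
  echo "cd $data_dir/processed && cellranger count --id=$run --transcriptome=$ref --fastqs=$raw --sample=$run"
)
```

Calling `print_pipeline_commands SRR31856734 /path/to/tian-2024/` prints five command lines for that run; nothing is downloaded or processed.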

Directory Structure

After running the pipeline, the directory structure under the tian-2024 directory is organized as follows:

tian-2024
├── raw
│   ├── SRR31856734
│   │   ├── SRR31856734_S1_L001_R1_001.fastq
│   │   └── SRR31856734_S1_L001_R2_001.fastq
│   ├── SRR31856747
│   │   ├── SRR31856747_S1_L001_R1_001.fastq
│   │   └── SRR31856747_S1_L001_R2_001.fastq
│   └── refdata-gex-GRCh38-2020-A
└── processed
    ├── SRR31856734
    │   └── outs
    │       ├── metrics_summary.csv
    │       ├── filtered_feature_bc_matrix.h5
    │       ├── molecule_info.h5
    │       └── other_files...
    └── SRR31856747
        └── outs
            ├── metrics_summary.csv
            ├── filtered_feature_bc_matrix.h5
            ├── molecule_info.h5
            └── other_files...
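
After a run, the layout above can be checked with a small script. The following is a minimal sketch: the helper name `missing_outputs` is hypothetical, and it checks only the three output files named in the tree, not everything Cell Ranger writes.

```sh
#!/bin/sh
# missing_outputs DATA_DIR   (hypothetical helper)
# Prints the path of every expected Cell Ranger output that does not exist.
missing_outputs() (
  data_dir="${1%/}"                      # drop one trailing slash, if present
  for run in SRR31856734 SRR31856747; do
    for f in metrics_summary.csv filtered_feature_bc_matrix.h5 molecule_info.h5; do
      path="$data_dir/processed/$run/outs/$f"
      [ -f "$path" ] || echo "$path"
    done
  done
)
```

An empty result from `missing_outputs "$LOCAL_TIAN_2024_DATA_DIR"` means all six files are in place; otherwise each missing path is listed on its own line.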

Pipeline Options

You can run the pipeline either in a single step or split into two steps:

Option 1: One-Step Execution

To run the entire pipeline (downloading and processing) in one step, use:

qsub -l m_mem_free=64G integrated_pipeline.sh

Option 2: Two-Step Execution

Alternatively, run the download and processing stages as two separate jobs:

Step 1: Download Data

qsub download_data.sh

Step 2: Process Data

After the data has been downloaded, process it with:

qsub -l m_mem_free=64G process_data.sh
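
If the cluster runs a Grid Engine style scheduler (an assumption, suggested by the `m_mem_free` resource), both steps can be submitted at once by holding the second job on the first. The sketch below uses the scheduler's `-N` (job name) and `-hold_jid` (wait for a named job) options; the job names `tian_download` and `tian_process`, the function name, and the `QSUB` override are illustrative and not part of this repository.

```sh
#!/bin/sh
# Set QSUB=echo to preview the submissions without contacting the scheduler.
QSUB="${QSUB:-qsub}"

# submit_two_step   (hypothetical helper; run it from the repository root)
submit_two_step() {
  # Step 1: download job, named so that step 2 can refer to it.
  $QSUB -N tian_download download_data.sh
  # Step 2: stays on hold until the job named tian_download has finished.
  $QSUB -N tian_process -hold_jid tian_download -l m_mem_free=64G process_data.sh
}
```

Running `QSUB=echo` followed by `submit_two_step` shows the two argument lists that would be passed to `qsub`.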

How to Run

Clone the repository:

git clone git@github.com:Katsevich-Lab/import-tian-2024.git
cd import-tian-2024

Set up the environment: Ensure .research_config contains the following:
```
export LOCAL_TIAN_2024_DATA_DIR="/path/to/tian-2024/"
```
Submit the job: Use one of the pipeline options above to run the workflow.
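
Before submitting, the environment setup can be verified with a short check. This is a sketch under assumptions: the helper name `check_data_dir` is hypothetical, and the config file's location (for example `$HOME/.research_config`) is a guess, so the path is passed in explicitly.

```sh
#!/bin/sh
# check_data_dir CONFIG_FILE   (hypothetical helper)
# Sources the config in a subshell and prints LOCAL_TIAN_2024_DATA_DIR.
# Returns non-zero if the file is missing or the variable is unset or empty.
check_data_dir() (
  config="$1"
  [ -f "$config" ] || { echo "config not found: $config" >&2; exit 1; }
  . "$config"                            # pass a path containing a slash
  [ -n "${LOCAL_TIAN_2024_DATA_DIR:-}" ] || {
    echo "LOCAL_TIAN_2024_DATA_DIR is not set in $config" >&2
    exit 1
  }
  # A missing directory is only a warning here; whether the pipeline
  # creates it is not stated in this README.
  [ -d "$LOCAL_TIAN_2024_DATA_DIR" ] ||
    echo "warning: $LOCAL_TIAN_2024_DATA_DIR does not exist yet" >&2
  echo "$LOCAL_TIAN_2024_DATA_DIR"
)
```

For example, `check_data_dir "$HOME/.research_config"` prints the configured directory when the variable is set. Because the function body runs in a subshell, the check itself does not modify the calling shell's environment.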

Notes

- Ensure all dependencies are installed and accessible on your $PATH: prefetch, fastq-dump, cellranger, and wget.
- Processing the data requires sufficient memory (≥64 GB recommended) and disk space.
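
The dependency requirement can be checked up front with `command -v`, which reports whether a program is found on `$PATH`. A minimal sketch (the helper name `missing_tools` is hypothetical):

```sh
#!/bin/sh
# missing_tools TOOL...   (hypothetical helper)
# Prints the name of each tool that cannot be found on $PATH.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
)
```

Calling `missing_tools prefetch fastq-dump cellranger wget` prints nothing when all four tools are available, and otherwise lists the ones that still need to be installed or added to `$PATH`.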

Citation

If using this pipeline or data for your work, please cite:

Tian et al. (2024). DOI: 10.1038/s41588-024-02019-8
