5. Example and Test data
To test snHiC, we use a subset of the Hi-C data generated and published by San Martin et al. (JCB, 2022). These data are available in GEO under accession number GSE172099 and can be downloaded from SRA (PRJNA722011) using the following accession numbers:
Sample | Group | SRA number |
---|---|---|
DU145_rep1 | DU145 | SRR14239814 |
DU145_rep2 | DU145 | SRR14239815 |
PC3_rep1 | PC3 | SRR14239816 |
PC3_rep2 | PC3 | SRR14239817 |
However, to make the test faster, we provide a down-sampled data set (2 x 5 million reads per sample) that can be downloaded, together with the configuration file and the sample table, from the dedicated Google Drive folder.
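If you prefer to start from the full-size runs in the table above rather than the down-sampled set, the helper below sketches the fetch-and-rename steps. It is a hypothetical illustration, not part of snHiC: it assumes the SRA Toolkit (`prefetch`, `fasterq-dump`) and `gzip` are installed and that every run is paired-end, and it only prints the commands so you can review them before running anything. The function name `print_fetch_commands` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of snHiC): print, without executing, the
# commands that would fetch the full-size runs listed above and rename them
# to the <sample>_rep<X>_R<Y>.fastq.gz structure expected by snHiC.
# Assumes the SRA Toolkit (prefetch, fasterq-dump) and gzip are installed
# and that each run is paired-end.

# "<sample>_rep<X> <SRR accession>" pairs copied from the accession table
RUNS=(
  "DU145_rep1 SRR14239814"
  "DU145_rep2 SRR14239815"
  "PC3_rep1 SRR14239816"
  "PC3_rep2 SRR14239817"
)

print_fetch_commands() {
  local fastq_dir="$1" entry sample srr mate
  for entry in "${RUNS[@]}"; do
    read -r sample srr <<< "$entry"
    echo "prefetch $srr"
    echo "fasterq-dump --split-files --outdir $fastq_dir $srr"
    for mate in 1 2; do
      # fasterq-dump names the mates <SRR>_1.fastq and <SRR>_2.fastq
      echo "gzip -c $fastq_dir/${srr}_${mate}.fastq > $fastq_dir/${sample}_R${mate}.fastq.gz"
    done
  done
}

print_fetch_commands "$HOME/snHiC_test/00_fastq_test"
```

Printing instead of executing keeps the sketch safe to try: the full runs are much larger than the test set, so it is worth checking disk space and paths before piping the output to `bash`.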
To run the snHiC analyses:
- create a directory called `snHiC_test` containing another directory called `00_fastq_test`: `mkdir -p $HOME/snHiC_test/00_fastq_test`
- download the fastq files into `00_fastq_test` and rename them using the structure `<sample>_rep<X>_R<Y>.fastq.gz` (e.g., `MDAPCA2a_rep1_R1.fastq.gz`)
- download the `snHiC_test_data_config.yaml` file and the metadata table (for grouped analyses) into `$HOME/snHiC_test`
- in the config file, change the path to the human genome file (in our case, version hg19)
- activate the conda environment with `conda activate snHiC` (to avoid a wrong assignment of the lib path, first deactivate any previously loaded environment with `conda deactivate`)
- run the pipeline (add the `-n` flag for a dry-run):

```shell
snakemake \
  -s $HOME/snHiC/workflow/snHiC.snakefile \
  --configfile $HOME/snHiC_test/snHiC_test_data_config.yaml \
  --cores 10
```
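Because snHiC relies on the `<sample>_rep<X>_R<Y>.fastq.gz` naming structure, a quick check before the dry-run can catch misnamed or unpaired files early. The function below is a minimal, hypothetical sketch (the name `check_fastq_names` is not part of snHiC); it assumes paired-end data with exactly `R1` and `R2` files per replicate and returns a non-zero status if any file breaks the pattern or lacks its mate.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of snHiC): verify that every
# .fastq.gz file in a directory follows <sample>_rep<X>_R<Y>.fastq.gz and
# that each R1 file has its R2 mate (and vice versa).
check_fastq_names() {
  local dir="$1" f base mate n=0 status=0
  for f in "$dir"/*.fastq.gz; do
    [[ -e $f ]] || continue            # the glob matched nothing
    n=$((n + 1))
    base="${f##*/}"                    # strip the directory part
    if [[ ! $base =~ ^(.+)_rep([0-9]+)_R([12])\.fastq\.gz$ ]]; then
      echo "unexpected name: $base" >&2
      status=1
      continue
    fi
    # paired-end data: R1 needs its R2 mate and vice versa
    if [[ ${BASH_REMATCH[3]} == 1 ]]; then mate=2; else mate=1; fi
    mate="${BASH_REMATCH[1]}_rep${BASH_REMATCH[2]}_R${mate}.fastq.gz"
    if [[ ! -e $dir/$mate ]]; then
      echo "missing mate for $base: $mate" >&2
      status=1
    fi
  done
  if (( n == 0 )); then
    echo "no .fastq.gz files found in $dir" >&2
    return 1
  fi
  return "$status"
}

# Example usage before launching the pipeline:
# check_fastq_names "$HOME/snHiC_test/00_fastq_test" && echo "fastq names look OK"
```

The check only inspects file names; it does not open the files, so it runs instantly even on large data sets.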
The output of these analyses can be found at the following links: individual samples, grouped analyses.
Analyses have been performed on:
- Samples: 4
- Groups: 2 (2 samples per group)
- Resolutions: 10, 20, 50, 100, 1000 kb
- Cores provided: 10
- System: HPC (GNU/Linux, x86_64), 165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 (5.4.0-148-generic)
Using the custom R function `benchmark_summary.R` (available in `snHiC/resources`), we generated the following summary table from the benchmark tables written by snHiC to the `benchmarks` output folder:
Rule | N steps | Tot Running Time (min) | Tot Running Time (d H M s) | Max physical mem (GB) | Max virtual mem (GB) | Average mean.load |
---|---|---|---|---|---|---|
A_fastQC_raw | 8 | 6.5 | 6M 32s | 0.2 | 3.3 | 77.7 |
B_multiQC_raw | 1 | 0.1 | 8s | 0.1 | 0.2 | 33.1 |
C_bwa_align | 8 | 30.9 | 30M 55s | 21.6 | 37.8 | 442.5 |
D_generate_restriction_file_and_get_chrSizes | 1 | 2.5 | 2M 30s | 1.4 | 1.6 | 90.2 |
E1_interaction_matrix_and_bam_generation_at_smallest_resolution | 4 | 64.8 | 1H 4M 49s | 44.7 | 48.6 | 116.4 |
E2_multiQC_report_for_HiC_matrices | 1 | 0.1 | 7s | 0.1 | 0.3 | 62.2 |
E3_merging_interaction_matrix_bins_for_all_resolutions | 16 | 7.8 | 7M 47s | 0.5 | 0.6 | 56.4 |
F1_matrices_normalization | 5 | 1.8 | 1M 47s | 1.3 | 1.3 | 71.9 |
F2_samples_correlation | 1 | 1.5 | 1M 29s | 1 | 1 | 87 |
G1_matrices_correction__diagnosticPlot_and_MAD | 20 | 3.3 | 3M 18s | 0.5 | 0.6 | 67.6 |
G2_matrices_correction__getting_threshold_values | 20 | 0.1 | 5s | 0 | 0 | 0 |
G3_matrices_correction__correction | 20 | 14.1 | 14M 7s | 0.7 | 0.8 | 81.6 |
H1_matrices_format_conversion__cool | 1 | 1.3 | 1M 15s | 0.5 | 0.6 | 57.5 |
H2_matrices_format_conversion__hicpro | 1 | 4.1 | 4M 3s | 0.3 | 0.4 | 91 |
I_call_TADs_HiCexplorer | 20 | 63.7 | 1H 3M 39s | 6 | 6.9 | 84.5 |
J_plotting_intraChr_distances | 5 | 5.3 | 5M 20s | 0.8 | 0.8 | 77.3 |
L1_sum_matrices_by_group | 1 | 1.8 | 1M 49s | 1.2 | 10.4 | 71.7 |
L2_merging_grouped_interaction_matrix_bins_for_all_resolutions | 8 | 4.4 | 4M 21s | 0.6 | 0.6 | 49.9 |
M_grouped_matrices_normalization | 5 | 1 | 1M 0s | 0.8 | 0.8 | 72.4 |
N1_summed_matrices_correction__diagnosticPlot_and_MAD | 10 | 1.7 | 1M 42s | 0.5 | 0.6 | 74.3 |
N2_summed_matrices_correction__getting_threshold_values | 10 | 0 | 0s | 0 | 0 | 0 |
N3_summed_matrices_correction__correction | 10 | 5.4 | 5M 21s | 0.8 | 0.9 | 77.6 |
N4_summed_matrices_correction__cool_conversion | 1 | 1 | 57s | 0.8 | 0.9 | 75.9 |
N5_summed_matrices_correction__hicpro_conversion | 1 | 3.6 | 3M 35s | 0.3 | 0.4 | 89.4 |
O_call_TADs_on_summed_matrices_HiCexplorer | 10 | 30.5 | 30M 27s | 5.7 | 6.8 | 81.9 |
P_detect_loops_singleSamples_HiCexplorer | 8 | 72.7 | 1H 12M 39s | 3.2 | 5.3 | 3.7 |
Q_detect_loops_groupedSamples_HiCexplorer | 4 | 35.9 | 35M 53s | 3.7 | 5.9 | 3.9 |
R1_detect_compartments_dcHiC_singleSamples__inputFile_all_vs_all | 1 | 0 | 0s | 0.1 | 9.3 | 0 |
R2_detect_compartments_dcHiC_singleSamples__call_compartments | 1 | 258.1 | 4H 18M 7s | 2.1 | 14.7 | 11.9 |
R3_detect_compartments_dcHiC_singleSamples__bedGraphToBigWig | 1 | 0.2 | 13s | 0.1 | 0.1 | 0.8 |
R4_detect_compartments_dcHiC_singleSamples__call_compartments_combos | 1 | 10 | 9M 57s | 0.5 | 3.1 | 286.9 |
R5_detect_compartments_dcHiC_singleSamples__bedGraphToBigWig_combos | 1 | 0.2 | 13s | 0.1 | 0.1 | 0.6 |
S1_detect_compartments_dcHiC_groupedSamples__inputFile_all_vs_all | 1 | 0 | 0s | 0.1 | 9.2 | 0 |
S2_detect_compartments_dcHiC_groupedSamples__call_compartments | 1 | 148.7 | 2H 28M 42s | 0.9 | 7.1 | 3.9 |
S3_detect_compartments_dcHiC_groupedSamples__bedGraphToBigWig | 1 | 0.1 | 8s | 0.1 | 0.1 | 0.8 |
S4_detect_compartments_dcHiC_groupedSamples__call_compartments_combos | 1 | 3.1 | 3M 3s | 0.5 | 3.1 | 135.7 |
S5_detect_compartments_dcHiC_groupedSamples__bedGraphToBigWig_combos | 1 | 0.1 | 8s | 0.1 | 0.1 | 0.5 |
T_differential_contacts_SELFISH_groupedSamples | 5 | 150 | 2H 29M 57s | 62.7 | 65.5 | 38.9 |
U1_stripe_detection_STRIPPEN_singleSamples | 4 | 430.6 | 7H 10M 33s | 1.1 | 7 | 196.4 |
U2_stripe_detection_STRIPPEN_groupedSamples | 2 | 172.8 | 2H 52M 49s | 1.3 | 7 | 229.9 |
SUMMARY | 221 | 1539.8 | 1d 1H 39M 48s | 62.7 | 65.5 | 75.1 |
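For readers without R, the sketch below shows roughly how one row of such a table can be derived from Snakemake benchmark files. It is not the authors' `benchmark_summary.R` and may not reproduce its numbers exactly. It assumes Snakemake's tab-separated benchmark format (a header row with columns including `s`, `max_rss`, `max_vms` and `mean_load`, memory in MB), converts MB to GB by dividing by 1024, and takes a plain arithmetic mean of `mean_load`; all of these are assumptions. The function names `format_dhms` and `summarize_rule` are hypothetical.

```shell
#!/usr/bin/env bash
# Rough, hypothetical sketch; NOT the authors' benchmark_summary.R.
# Assumes Snakemake's tab-separated benchmark files with a header row that
# contains the columns s, max_rss, max_vms (in MB) and mean_load.

# Convert whole seconds into the "1d 1H 39M 48s" style used in the table.
format_dhms() {
  local t="$1" d h m s out=""
  d=$(( t / 86400 )); h=$(( t % 86400 / 3600 )); m=$(( t % 3600 / 60 )); s=$(( t % 60 ))
  if (( d > 0 )); then out+="${d}d "; fi
  if (( d > 0 || h > 0 )); then out+="${h}H "; fi
  if (( d > 0 || h > 0 || m > 0 )); then out+="${m}M "; fi
  echo "${out}${s}s"
}

# summarize_rule RULE FILE... : print one table row for the benchmark FILEs of RULE
summarize_rule() {
  local rule="$1"; shift
  local n secs mins rss vms load
  read -r n secs mins rss vms load < <(LC_ALL=C awk -F'\t' '
    BEGIN { rss = 0; vms = 0 }
    FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header names to columns
    {
      n++
      secs += $col["s"]
      if ($col["max_rss"] + 0 > rss) rss = $col["max_rss"] + 0
      if ($col["max_vms"] + 0 > vms) vms = $col["max_vms"] + 0
      load += $col["mean_load"]
    }
    END {
      # steps, rounded seconds, minutes, max RSS and VMS (MB to GB), average mean_load
      printf "%d %d %.1f %.1f %.1f %.1f\n", n, secs + 0.5, secs / 60, rss / 1024, vms / 1024, (n ? load / n : 0)
    }' "$@")
  printf '%s | %s | %s | %s | %s | %s | %s |\n' \
    "$rule" "$n" "$mins" "$(format_dhms "$secs")" "$rss" "$vms" "$load"
}
```

As a check on the table itself, 1539.8 min is 92388 s, and `format_dhms 92388` prints `1d 1H 39M 48s`, which matches the SUMMARY row. Small one-second differences against other rows (e.g. `64.8` min vs `1H 4M 49s`) are expected, because the minutes column is rounded from the underlying seconds.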