5. Example and Test data

5.1 Test datasets

To test snHiC, we use a subset of the Hi-C data generated and published by San Martin et al. (JCB, 2022). These data are available in GEO under accession number GSE172099 and can be downloaded from SRA (PRJNA722011) with the following run accession numbers:

Sample	Group	SRA number
DU145_rep1	DU145	SRR14239814
DU145_rep2	DU145	SRR14239815
PC3_rep1	PC3	SRR14239816
PC3_rep2	PC3	SRR14239817

However, to make the test faster, we provide a down-sampled dataset (2x5 million reads per sample) that can be downloaded, together with the configuration file and the sample table, from the dedicated Google Drive.
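For readers who prefer to work with the full runs rather than the down-sampled set, the SRA accessions listed above could be fetched with the SRA Toolkit. The following is only a sketch: it assumes `fasterq-dump` is installed, and the helper name `print_download_commands` and the output directory are hypothetical. It prints the commands instead of running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print the SRA Toolkit commands that would fetch the four full runs.
# ASSUMPTION: the SRA Toolkit (fasterq-dump) is installed and on the PATH.
# The function name and the output directory are illustrative only.

print_download_commands() {
  local outdir="$1"
  local run
  for run in SRR14239814 SRR14239815 SRR14239816 SRR14239817; do
    # --split-files writes paired reads to <run>_1.fastq and <run>_2.fastq
    echo "fasterq-dump --split-files --outdir $outdir $run"
  done
}

# Review the printed commands, then pipe them to a shell to execute them:
# print_download_commands "$HOME/snHiC_test/00_fastq_test" | bash
```

The resulting files are uncompressed and named after the run accession, so they would still need to be gzipped and renamed to the sample naming scheme expected by the pipeline.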

5.2 Running example

To run the snHiC analyses:

create a directory called snHiC_test containing another directory called 00_fastq_test: mkdir -p $HOME/snHiC_test/00_fastq_test
download the fastq files into 00_fastq_test and rename them using the structure <sample>_rep<X>_R<Y>.fastq.gz (e.g., MDAPCA2a_rep1_R1.fastq.gz)
download the snHiC_test_data_config.yaml file and the metadata table (for grouped analyses) into $HOME/snHiC_test
change the path to the human genome file (version hg19 in our case) in the config file
activate the conda environment: conda activate snHiC (to avoid a wrong assignment of the lib path, first deactivate any previously loaded environment with conda deactivate)
run the pipeline (add -n flag for a dry-run):

snakemake \
-s $HOME/snHiC/workflow/snHiC.snakefile \
--configfile $HOME/snHiC_test/snHiC_test_data_config.yaml \
--cores 10
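The renaming step in the list above can be sketched as a small helper. This is a minimal sketch, not part of snHiC: it assumes the downloaded files are named <SRR>_1.fastq.gz and <SRR>_2.fastq.gz (an assumption; adjust the pattern to however your files are actually named), and the function name `rename_fastq` is hypothetical. The run-to-sample mapping is taken from the table in section 5.1.

```shell
#!/usr/bin/env bash
# Sketch: rename paired FASTQ files to <sample>_rep<X>_R<Y>.fastq.gz.
# ASSUMPTION: input files are named <SRR>_1.fastq.gz and <SRR>_2.fastq.gz.

rename_fastq() {
  local dir="$1"
  local run sample mate
  # Each heredoc line holds: <SRA run accession> <sample name with replicate>
  while read -r run sample; do
    for mate in 1 2; do
      if [ -f "$dir/${run}_${mate}.fastq.gz" ]; then
        mv "$dir/${run}_${mate}.fastq.gz" "$dir/${sample}_R${mate}.fastq.gz"
      fi
    done
  done <<'EOF'
SRR14239814 DU145_rep1
SRR14239815 DU145_rep2
SRR14239816 PC3_rep1
SRR14239817 PC3_rep2
EOF
}

# Usage (commented out so that sourcing this file has no side effects):
# mkdir -p "$HOME/snHiC_test/00_fastq_test"
# rename_fastq "$HOME/snHiC_test/00_fastq_test"
```

Files that do not match the assumed pattern are left untouched, so the helper can be rerun safely after a partial download.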

The output of these analyses can be found at the following links: individual samples, grouped analyses.

5.3 Resources and performance

5.3.1 Data features and System specifics

Analyses have been performed on:

Samples: 4
Groups: 2 (2 samples per group)
Resolutions: 10, 20, 50, 100, 1000 kb
Cores provided: 10
System: HPC (GNU/Linux, x86_64), 165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 (5.4.0-148-generic)

5.3.2 Benchmark summary

Using the in-house R function benchmark_summary.R (available in snHiC/resources), we generated the following summary table from the benchmark tables that snHiC writes to the benchmarks output folder:

Rule	N steps	Total running time (min)	Total running time (d H M s)	Max physical mem (GB)	Max virtual mem (GB)	Average mean.load
A_fastQC_raw	8	6.5	6M 32s	0.2	3.3	77.7
B_multiQC_raw	1	0.1	8s	0.1	0.2	33.1
C_bwa_align	8	30.9	30M 55s	21.6	37.8	442.5
D_generate_restriction_file_and_get_chrSizes	1	2.5	2M 30s	1.4	1.6	90.2
E1_interaction_matrix_and_bam_generation_at_smallest_resolution	4	64.8	1H 4M 49s	44.7	48.6	116.4
E2_multiQC_report_for_HiC_matrices	1	0.1	7s	0.1	0.3	62.2
E3_merging_interaction_matrix_bins_for_all_resolutions	16	7.8	7M 47s	0.5	0.6	56.4
F1_matrices_normalization	5	1.8	1M 47s	1.3	1.3	71.9
F2_samples_correlation	1	1.5	1M 29s	1	1	87
G1_matrices_correction__diagnosticPlot_and_MAD	20	3.3	3M 18s	0.5	0.6	67.6
G2_matrices_correction__getting_threshold_values	20	0.1	5s	0	0	0
G3_matrices_correction__correction	20	14.1	14M 7s	0.7	0.8	81.6
H1_matrices_format_conversion__cool	1	1.3	1M 15s	0.5	0.6	57.5
H2_matrices_format_conversion__hicpro	1	4.1	4M 3s	0.3	0.4	91
I_call_TADs_HiCexplorer	20	63.7	1H 3M 39s	6	6.9	84.5
J_plotting_intraChr_distances	5	5.3	5M 20s	0.8	0.8	77.3
L1_sum_matrices_by_group	1	1.8	1M 49s	1.2	10.4	71.7
L2_merging_grouped_interaction_matrix_bins_for_all_resolutions	8	4.4	4M 21s	0.6	0.6	49.9
M_grouped_matrices_normalization	5	1	1M 0s	0.8	0.8	72.4
N1_summed_matrices_correction__diagnosticPlot_and_MAD	10	1.7	1M 42s	0.5	0.6	74.3
N2_summed_matrices_correction__getting_threshold_values	10	0	0s	0	0	0
N3_summed_matrices_correction__correction	10	5.4	5M 21s	0.8	0.9	77.6
N4_summed_matrices_correction__cool_conversion	1	1	57s	0.8	0.9	75.9
N5_summed_matrices_correction__hicpro_conversion	1	3.6	3M 35s	0.3	0.4	89.4
O_call_TADs_on_summed_matrices_HiCexplorer	10	30.5	30M 27s	5.7	6.8	81.9
P_detect_loops_singleSamples_HiCexplorer	8	72.7	1H 12M 39s	3.2	5.3	3.7
Q_detect_loops_groupedSamples_HiCexplorer	4	35.9	35M 53s	3.7	5.9	3.9
R1_detect_compartments_dcHiC_singleSamples__inputFile_all_vs_all	1	0	0s	0.1	9.3	0
R2_detect_compartments_dcHiC_singleSamples__call_compartments	1	258.1	4H 18M 7s	2.1	14.7	11.9
R3_detect_compartments_dcHiC_singleSamples__bedGraphToBigWig	1	0.2	13s	0.1	0.1	0.8
R4_detect_compartments_dcHiC_singleSamples__call_compartments_combos	1	10	9M 57s	0.5	3.1	286.9
R5_detect_compartments_dcHiC_singleSamples__bedGraphToBigWig_combos	1	0.2	13s	0.1	0.1	0.6
S1_detect_compartments_dcHiC_groupedSamples__inputFile_all_vs_all	1	0	0s	0.1	9.2	0
S2_detect_compartments_dcHiC_groupedSamples__call_compartments	1	148.7	2H 28M 42s	0.9	7.1	3.9
S3_detect_compartments_dcHiC_groupedSamples__bedGraphToBigWig	1	0.1	8s	0.1	0.1	0.8
S4_detect_compartments_dcHiC_groupedSamples__call_compartments_combos	1	3.1	3M 3s	0.5	3.1	135.7
S5_detect_compartments_dcHiC_groupedSamples__bedGraphToBigWig_combos	1	0.1	8s	0.1	0.1	0.5
T_differential_contacts_SELFISH_groupedSamples	5	150	2H 29M 57s	62.7	65.5	38.9
U1_stripe_detection_STRIPPEN_singleSamples	4	430.6	7H 10M 33s	1.1	7	196.4
U2_stripe_detection_STRIPPEN_groupedSamples	2	172.8	2H 52M 49s	1.3	7	229.9
SUMMARY	221	1539.8	1d 1H 39M 48s	62.7	65.5	75.1
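To illustrate how a table like the one above can be derived, the following sketch aggregates Snakemake benchmark files with awk. It is not the benchmark_summary.R function shipped with snHiC. It assumes each file is tab-separated with a header line, with run time in seconds in column 1 (`s`) and peak physical memory in MB in column 3 (`max_rss`), as in standard Snakemake benchmark output. The function names and the file layout inside the benchmarks folder are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: summarise Snakemake benchmark files passed as arguments.
# ASSUMPTION: tab-separated files with one header line; column 1 is the run
# time in seconds and column 3 is the peak physical memory (max_rss) in MB.

summarise_benchmarks() {
  awk -F'\t' '
    FNR == 1 { next }                          # skip the header of every file
    {
      total_s += $1                            # accumulate run time in seconds
      if ($3 + 0 > max_rss) max_rss = $3 + 0   # track the largest memory peak
      n++
    }
    END {
      printf "steps=%d total_min=%.1f max_rss_gb=%.1f\n", n, total_s / 60, max_rss / 1024
    }
  ' "$@"
}

# Convert a whole number of seconds to the "d H M s" layout used in the table.
format_duration() {
  local s="$1"
  printf '%dd %dH %dM %ds\n' \
    $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Usage (the glob is a placeholder for wherever the benchmark files live):
# summarise_benchmarks path/to/benchmarks/*.tsv
```

As a check of the arithmetic, the total of 1539.8 min on the SUMMARY row equals 92388 s, and `format_duration 92388` prints `1d 1H 39M 48s`.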
