You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
1. Test data set:
Here are my final phase stats (with the small test, i.e. fastq downloaded with --minSpotId 0 --maxSpotId 1000000):
# cat phased.XA.dedup.stats | head -8
total 1429506
total_unmapped 203628
total_single_sided_mapped 278449
total_mapped 947429
total_dups 60249
total_nodups 887180
cis 152576
trans 734604
# cat phased.XA.phase0.stats | head -8
total 3704
total_unmapped 0
total_single_sided_mapped 0
total_mapped 3704
total_dups 257
total_nodups 3447
cis 3304
trans 143
# cat phased.XA.phase1.stats | head -8
total 3102
total_unmapped 0
total_single_sided_mapped 0
total_mapped 3102
total_dups 279
total_nodups 2823
cis 2389
trans 434
# cat phased.XA.unphased.stats | head -8
total 1203742
total_unmapped 0
total_single_sided_mapped 264503
total_mapped 939239
total_dups 59712
total_nodups 879527
cis 125709
trans 753818
# cat phased.XA.trans-phase.stats | head -8
total 1384
total_unmapped 0
total_single_sided_mapped 0
total_mapped 1384
total_dups 1
total_nodups 1383
cis 985
trans 398
I don't get why I don't have the same stats than in your tutorial, but the ratio of cis in phase0 / cis total is about the same range, I have here 2.5% (3304/(125709+985+2389+3304)) of cis in [hase0, in your example you have 1% (535/(52294+177+394+535)).
When I run the same example with the full fastq files (stll with SRR8811373), here are my final stats:
# cat phased.XA.dedup.stats | head -35
total 6923823
total_unmapped 953932
total_single_sided_mapped 1401139
total_mapped 4568752
total_dups 816275
total_nodups 3752477
cis 3045355
trans 707122
# cat phased.XA.phase0.stats | head -20
total 17866
total_unmapped 0
total_single_sided_mapped 0
total_mapped 17866
total_dups 4639
total_nodups 13227
cis 12685
trans 542
# cat phased.XA.phase1.stats | head -20
total 14771
total_unmapped 0
total_single_sided_mapped 0
total_mapped 14771
total_dups 4201
total_nodups 10570
cis 9202
trans 1368
# cat phased.XA.trans-phase.stats | head -20
total 6848
total_unmapped 0
total_single_sided_mapped 0
total_mapped 6848
total_dups 27
total_nodups 6821
cis 4867
trans 1954
# cat phased.XA.unphased.stats | head -20
total 5857878
total_unmapped 0
total_single_sided_mapped 1328611
total_mapped 4529267
total_dups 807408
total_nodups 3721859
cis 2529823
trans 1192036
Here the ratio of cis in phase0 / cis total is 0.4% (12685/(2529823+4867+12685+9202))
Do you know why the ratio of contact mapped on phase0 haplotype drops when I use the full dataset? do you have the same? I was expecting this ratio to be the same independently of using a full dataset or a sample.
Then here the ratio of 1-1 over total (= phase0 in pairtools phase) is 6% (82362/1209391)
I guess numbers of aligned reads, detected pairs and cis-contact will vary depending on quality threshold applied and algorithm used, but I don't get why it is so different between HiCPro output and pairtools phase on the full fastq file.
Additional questions:
Can I combine pairtools phase with pairtools dedub by restriction fragment?
Can I combine pairtools phase with filterByCov?
Something like
pairtools parse -> pairtools phase -> pairtools sort -> pairtools dedup amplification duplicates -> pairtools restrict -> pairtools dedup by restriction fragment -> pairtools filterbycov -> pairtools select by phase
Thanks you for your help!
David
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1198-dirty
pairtools, version 1.0.2
The text was updated successfully, but these errors were encountered:
Hi Open2C team,
I'd like to use pairtools phase and I have some issues running it. I'd like to use it in a first time to reanalyse snHiC data from Collombet et al.,2020.
following this https://pairtools.readthedocs.io/en/latest/examples/pairtools_phase_walkthrough.html
1. Test data set:
Here are my final phase stats (with the small test, i.e. fastq downloaded with --minSpotId 0 --maxSpotId 1000000):
I don't get why I don't have the same stats than in your tutorial, but the ratio of cis in phase0 / cis total is about the same range, I have here 2.5% (3304/(125709+985+2389+3304)) of cis in [hase0, in your example you have 1% (535/(52294+177+394+535)).
When I run the same example with the full fastq files (stll with SRR8811373), here are my final stats:
Here the ratio of cis in phase0 / cis total is 0.4% (12685/(2529823+4867+12685+9202))
Do you know why the ratio of contact mapped on phase0 haplotype drops when I use the full dataset? do you have the same? I was expecting this ratio to be the same independently of using a full dataset or a sample.
2. Comparison with processed data:
So I compared with the number or ratio of pairs that they report in the original dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129029)
Then here the ratio of 1-1 over total (= phase0 in pairtools phase) is 6% (82362/1209391)
I guess numbers of aligned reads, detected pairs and cis-contact will vary depending on quality threshold applied and algorithm used, but I don't get why it is so different between HiCPro output and pairtools phase on the full fastq file.
Additional questions:
Can I combine pairtools phase with pairtools dedub by restriction fragment?
Can I combine pairtools phase with filterByCov?
Something like
pairtools parse -> pairtools phase -> pairtools sort -> pairtools dedup amplification duplicates -> pairtools restrict -> pairtools dedup by restriction fragment -> pairtools filterbycov -> pairtools select by phase
Thanks you for your help!
David
The text was updated successfully, but these errors were encountered: