A workflow to estimate telomere length from matched tumor-normal whole genome sequencing data (BAMs)
A workflow to estimate telomere length (TL) from matched tumor-normal whole genome sequencing (WGS) data from 25 childhood acute lymphoblastic leukemia cases. The WGS data were generated on an Illumina NovaSeq 6000.
Two tools, TelomereHunter and TelSeq, were applied to estimate telomere length from the matched tumor-normal WGS BAMs, which had been processed with the GATK "Data pre-processing for variant discovery" pipeline. A correlation plot was generated to compare the results from the two tools.
- paper
- software and documentation
- Telomere content was quantified with TelomereHunter, using ten telomere variant repeats: TCAGGG, TGAGGG, TTGGGG, TTCGGG, TTTGGG, ATAGGG, CATGGG, CTAGGG, GTAGGG and TAAGGG.
- paper
- github
- TL was estimated in kb using TelSeq. A threshold of 7 TTAGGG/CCCTAA repeats per read was used to classify a read as telomeric.
- step 1: run TelomereHunter
- step 2: run TelSeq
- step 3: aggregate outputs from individual BAM files
- step 4: clean aggregated tables and generate final outputs
- For TelSeq, the TL of each sample must be calculated as a weighted average over all the read groups within that sample: zd1/telseq#1
- Results from TelomereHunter and TelSeq were highly correlated (an example correlation plot generated from step 4)
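
The per-sample weighted average for TelSeq can be sketched as below. This is a minimal illustration, not the code used in step 4. It assumes the aggregated TelSeq table has one row per read group with columns named `Sample`, `Total` (read count, used here as the weight) and `LENGTH_ESTIMATE` (TL in kb). The column names, the choice of `Total` as the weight, and the helper name `weighted_tl_per_sample` are all assumptions; adjust them to match your table and the discussion in zd1/telseq#1.

```python
from collections import defaultdict


def weighted_tl_per_sample(rows, sample_col="Sample",
                           weight_col="Total", tl_col="LENGTH_ESTIMATE"):
    """Collapse read-group rows into one weighted-average TL per sample.

    rows: iterable of dicts, one per read group (e.g. from csv.DictReader).
    Column names are assumptions about the aggregated TelSeq table.
    """
    weighted_sum = defaultdict(float)   # sum of TL * weight per sample
    weight_total = defaultdict(float)   # sum of weights per sample
    for row in rows:
        try:
            sample = row[sample_col]
            tl = float(row[tl_col])
            weight = float(row[weight_col])
        except (KeyError, ValueError):
            # Skip rows with missing columns or non-numeric values.
            continue
        if weight <= 0:
            continue
        weighted_sum[sample] += tl * weight
        weight_total[sample] += weight
    return {s: weighted_sum[s] / weight_total[s] for s in weight_total}


# Toy read-group rows (made-up numbers, for illustration only).
example_rows = [
    {"Sample": "S1", "Total": "100", "LENGTH_ESTIMATE": "4.0"},
    {"Sample": "S1", "Total": "300", "LENGTH_ESTIMATE": "6.0"},
    {"Sample": "S2", "Total": "50", "LENGTH_ESTIMATE": "3.0"},
]
example_result = weighted_tl_per_sample(example_rows)
print(example_result)
```

In the toy input, sample `S1` has two read groups, and the larger one (300 reads) pulls the sample-level estimate toward its own value instead of the plain mean of the two.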