This repository contains scripts for data processing and analysis used in the manuscript:
Wei W, Schon K, Elgar G, Orioli A, Tanguy M, Giess A, Tischkowitz M, Caulfield M, Chinnery PF. Nuclear-embedded mitochondrial DNA sequences in 66,083 human genomes. Nature 2022:in press.
- NUMT and breakpoint detection
  - NUMTs_detection.sh
  - searchBreakpoint_fromblatoutputs.py
  - searchNumtCluster_fromDiscordantReads.py
  - groupNumtCluster_fromMultipleSamples.py
- Enrichment analysis
  - enrichment_creatingRefgenome.py
  - enrichment_simulation.py
- mtDNA variant calling
  - mtVariantCalling.sh
  - mtVariantCalling_MToolBox.conf
- NUMT methylation detection
  - nanopolish_methylationDetection.sh
- NUMT variant calling
  - VarDetection_fromDiscSplitReads.sh
  - generateVariantTable.Human.py
  - generateVariantTable.HumanChimp.py
- Circos plots
  - circos_allNUMTs.conf
  - confs
Whole-genome sequence data supporting the findings of this study can be analysed in the Genomics England data warehouse; see https://www.genomicsengland.co.uk/understanding-genomics/data/