From 5ea4fba77d1a7e2c0cea43ce15b9d3e5adb8e556 Mon Sep 17 00:00:00 2001
From: Arda Soylev
Date: Thu, 24 Aug 2023 13:55:42 +0200
Subject: [PATCH] Update README.md

---
 README.md | 74 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 2ff4e9d..a4cd1bd 100644
--- a/README.md
+++ b/README.md
@@ -82,32 +82,6 @@ To test, simply type:
 
 	sh ./test_tardis.sh
 
-SONIC file (annotations container)
-==================================
-
-SONIC files for some human and mouse genome reference versions are available at external repo: https://github.com/BilkentCompGen/sonic-prebuilt
-
- * human_g1k_v37.sonic: SONIC file for Human Reference Genome GRCh37 (1000 Genomes Project version)
-	* Also download the reference genome at: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz.
- * ucsc_hg19.sonic: SONIC file for the human reference genome, UCSC version build hg19.
-	* Also download the reference genome at: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
- * ucsc_hg38.sonic: SONIC file for the human reference genome build 38.
-	* Also download the reference genome at: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
- * mm9.sonic: SONIC file for the mouse reference genome version mm9.
-	* Also download the reference genome at: http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
- * mm10.sonic: SONIC file for the mouse reference genome version mm10.
-	* Also download the reference genome at: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
-
-Make sure that the same reference was used to align the reads beforehand (BAM file) and to create the SONIC file. The SONIC files and the reference FASTA files linked above are compatible.
-
-Building the SONIC file
-=======================
-
-Please refer to the SONIC development repository: https://github.com/calkan/sonic/
-
-The README.md file includes documentation on how to obtain the necessary files for different genomes from the UCSC Genome Browser.
-
-
 Running TARDIS
 ===========================
 
@@ -135,10 +109,43 @@ There are three different ways of passing multiple input files to TARDIS:
 	--out multiplesamples
 
 
+SONIC file (annotations container)
+==================================
+
+SONIC files for some human and mouse genome reference versions are available at external repo: https://github.com/BilkentCompGen/sonic-prebuilt
+
+ * human_g1k_v37.sonic: SONIC file for Human Reference Genome GRCh37 (1000 Genomes Project version)
+	* Also download the reference genome at: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz.
+ * ucsc_hg19.sonic: SONIC file for the human reference genome, UCSC version build hg19.
+	* Also download the reference genome at: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
+ * ucsc_hg38.sonic: SONIC file for the human reference genome build 38.
+	* Also download the reference genome at: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
+ * mm9.sonic: SONIC file for the mouse reference genome version mm9.
+	* Also download the reference genome at: http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
+ * mm10.sonic: SONIC file for the mouse reference genome version mm10.
+	* Also download the reference genome at: http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz. Deflate the tar archive and concatenate all chromosomes into a single FASTA file.
+
+Make sure that the same reference was used to align the reads beforehand (BAM file) and to create the SONIC file. The SONIC files and the reference FASTA files linked above are compatible.
+
+
+Building the SONIC file
+=======================
+
+Please refer to the SONIC development repository: https://github.com/calkan/sonic/
+
+The README.md file includes documentation on how to obtain the necessary files for different genomes from the UCSC Genome Browser.
+
+*Please note that you can also generate the SONIC file while running TARDIS:*
+
+	tardis -i myinput.bam --ref GRCh38.fa --gaps hg38.gap.bed --reps hg38.repeats.out --dups hg38.dups.bed \
+	--make-sonic my_sonic.sonic --out myoutput
+
+
 All parameters
 ==============
 
-	Basic Parameters:
+	Basic Parameters:
+
 	--bamlist [bamlist file] : A text file that lists input BAM files one file per line.
 	--input/-i [BAM files] : Input files in sorted and indexed BAM format. You can pass multiple BAMs using multiple --input parameters.
 	--out [output prefix] : Prefix for the output file names.
@@ -146,22 +153,23 @@ All parameters
 	--sonic [sonic file] : SONIC file that contains assembly annotations.
 	--hist-only : Generate fragment size histograms only, then quit.
 
-	Advanced Parameters:
+	Advanced Parameters:
+
+	--interdup : Run interspersed duplication clustering.
 	--read-cluster [int] : # of clusters that a specific read can be involved in (Default is 20).
 	--rp [int] : Minimum number of supporting read pairs in initial clustering (Default is 5).
-	--mei ["Alu:L1:SVA"] : List of mobile element names.
+	--mei [string ] : List of mobile element names separated by colon (Default is ["Alu:L1:SVA"])
 	--no-soft-clip : Skip soft clip remapping.
 	--no-interdup : Skip interspersed duplication clustering.
 	--no-mei : Skip mobile element insertion (MEI) clustering.
 	--resolved : Output sequence resolved vcf calls.
-	--xa : Look for the alternative mapping locations in BWA.
-	--first-chr [chr_index] : Start running from a specific chromosome [0-based index in reference file]
-	--last-chr [chr_index] : Run up to a specific chromosome [0-based index in reference file]
+	--first-chr [int] : Start running from a specific chromosome [0-based index in reference file]
+	--last-chr [int] : Run up to a specific chromosome [0-based index in reference file]
 
 	Additional parameters to build SONIC file within TARDIS:
 
 	--make-sonic [sonic file] : SONIC file that will contain the assembly annotations.
-	--sonic-info [\"string\"] : SONIC information string to be used as the reference genome name.
+	--sonic-info [string] : SONIC information string to be used as the reference genome name, e.g., hg19.
 	--gaps [gaps file] : Assembly gap coordinates in BED3 format.
 	--dups [dups file] : Segmental duplication coordinates in BED3 format.
 	--reps [reps file] : RepeatMasker annotation coordinates in RepeatMasker format. See manual for details.