patch
RagTag Version: v2.0.0
RagTag 'patch' uses one genome assembly to "patch" another genome assembly. We define two types of patches, Fills and Joins:
- Fills are patches that fill assembly gaps. This process is like traditional gap-filling, though it uses an assembly instead of WGS sequencing reads.
- Joins are patches that join distinct contigs. This is essentially scaffolding and gap-filling in a single step.
usage: ragtag.py patch <target.fa> <query.fa>
Homology-based continuous assembly scaffolding and gap-filling: Make continuous joins and fill gaps in 'target.fa' using sequences from 'query.fa'
positional arguments:
<target.fa> target fasta file (uncompressed or bgzipped)
<query.fa> query fasta file (uncompressed or bgzipped)
optional arguments:
-h, --help show this help message and exit
patching:
-e <exclude.txt> list of target sequences to ignore [null]
-j <skip.txt> list of query sequences to ignore [null]
-f INT minimum unique alignment length [1000]
--remove-small remove unique alignments shorter than '-f'
-q INT minimum mapq (NA for Nucmer alignments) [10]
-d INT maximum alignment merge distance [100000]
-s INT minimum merged alignment length [50000]
-i FLOAT maximum merged alignment distance from sequence terminus. fraction of the sequence length if < 1 [0.05]
--fill-only only fill existing target gaps. do not join target sequences
--join-only only join and patch target sequences. do not fill existing gaps
input/output options:
-o PATH output directory [./ragtag_output]
-w overwrite intermediate files
-u add suffix to unplaced sequence headers
mapping options:
-t INT number of minimap2/unimap threads [1]
--aligner PATH aligner executable ('nucmer' (recommended), 'unimap' or 'minimap2') [nucmer]
--mm2-params STR space delimited minimap2 parameters ['-x asm5']
--unimap-params STR space delimited unimap parameters ['-x asm5']
--nucmer-params STR space delimited nucmer parameters ['--maxmatch -l 100 -c 500']
RagTag 'patch' makes patches in `<target.fa>` using sequences from `<query.fa>`. These files can be uncompressed or bgzipped. Use `-e` to provide a single-column file listing any `<target.fa>` headers that should be ignored during patching (e.g. chr0/chrUn or alt contigs). Similarly, use `-j` to provide a single-column file listing any `<query.fa>` headers that should not be used for patching. If an alignment is not entirely unique, at least `-f` bp of the alignment must be unique for it to be considered for scaffolding. By default, entirely unique alignments are considered regardless of their length, but this can be disabled with `--remove-small`. Doing so ensures that only alignments at least `-f` bp in length are considered for scaffolding. `-q` sets the minimum Minimap2/Unimap mapq score for alignments; it does not apply to Nucmer alignments. For each query sequence, syntenic alignments within `-d` bp of each other are merged into longer alignments. After merging, alignments shorter than `-s` bp are removed. Alignments must be within `-i` bp of a target sequence terminus or gap to be considered for patching; a value of `-i` below 1 is interpreted as a fraction of the sequence length. With `--fill-only` invoked, RagTag will only fill gaps, and with `--join-only` invoked, RagTag will only make joins.
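The roles of `-d`, `-s`, and `-i` described above can be illustrated with a short Python sketch. This is not RagTag's implementation: the function names are hypothetical, alignments are reduced to plain `(start, end)` intervals on one sequence, synteny and orientation are ignored, only sequence termini (not gaps) are checked, and whether the thresholds are inclusive is an assumption.

```python
from typing import List, Tuple

# (start, end) in bp on a single sequence; a simplification for illustration.
Interval = Tuple[int, int]


def merge_nearby(intervals: List[Interval], max_gap: int = 100_000) -> List[Interval]:
    """Merge intervals separated by at most max_gap bp (the role '-d' plays)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def drop_short(intervals: List[Interval], min_len: int = 50_000) -> List[Interval]:
    """Remove merged intervals shorter than min_len bp (the role '-s' plays)."""
    return [(s, e) for s, e in intervals if e - s >= min_len]


def terminus_window(seq_len: int, i: float = 0.05) -> int:
    """Interpret '-i': a fraction of the sequence length if < 1, otherwise bp."""
    return int(seq_len * i) if i < 1 else int(i)


def near_terminus(interval: Interval, seq_len: int, i: float = 0.05) -> bool:
    """True if the interval starts or ends within the '-i' window of a terminus."""
    window = terminus_window(seq_len, i)
    start, end = interval
    return start <= window or seq_len - end <= window


if __name__ == "__main__":
    # Made-up alignment coordinates on a 1 Mbp sequence.
    alignments = [(60_000, 120_000), (0, 30_000), (400_000, 410_000)]
    kept = drop_short(merge_nearby(alignments))
    for aln in kept:
        print(aln, near_terminus(aln, seq_len=1_000_000))
```

With the defaults, the first two made-up intervals merge because they are 30 kbp apart, while the third stays separate and is then dropped for being shorter than 50 kbp.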
By default, RagTag places all of the output and intermediate files in a directory named `ragtag_output`, but this can be changed with `-o`. RagTag will not overwrite intermediate files that already exist in the output directory. This is to save time producing expensive alignment files. Users can set `-w` to overwrite any preexisting files.
Use the `-u` option to add the "_RagTag" suffix to each sequence in the scaffold output, even unplaced query sequences that have not changed. This ensures AGP compatibility with some external programs/databases. If one wants unplaced query sequences to retain their original header, do not use `-u`.
Use `-t` to set the number of threads Minimap2 or Unimap uses for mapping (overridden by `--mm2-params` and `--unimap-params`). This option does not apply to Nucmer alignments. Use the `--aligner` option to specify the PATH of the appropriate aligner executable (Nucmer is default and recommended). The `--mm2-params`, `--unimap-params`, and `--nucmer-params` options allow one to specify custom alignment parameters for Minimap2, Unimap, and Nucmer, respectively.
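As a rough sketch of how the input/output and mapping options fit together, the following Python helper assembles a `ragtag.py patch` command line. The helper name `build_patch_command` and its arguments are hypothetical; the flags are taken from the usage text above. It only builds the argument list and assumes `ragtag.py` and the chosen aligner are on the `PATH` if the command is later executed.

```python
import os
import shlex
from typing import List


def build_patch_command(
    target: str,
    query: str,
    out_dir: str = "ragtag_output",
    aligner: str = "nucmer",
    threads: int = 1,
    overwrite: bool = False,
) -> List[str]:
    """Assemble an argument list for 'ragtag.py patch' (illustrative only)."""
    cmd = ["ragtag.py", "patch", "-o", out_dir, "--aligner", aligner]
    # '-t' only affects Minimap2/Unimap, so it is omitted for Nucmer.
    if os.path.basename(aligner) in ("minimap2", "unimap"):
        cmd += ["-t", str(threads)]
    if overwrite:
        # '-w' overwrites intermediate files already in the output directory.
        cmd.append("-w")
    cmd += [target, query]
    return cmd


if __name__ == "__main__":
    cmd = build_patch_command("target.fa", "query.fa", aligner="minimap2", threads=8)
    # Print the command; pass 'cmd' to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting problems if the list is handed to `subprocess.run`.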
| File | Description |
| --- | --- |
| `ragtag.patch.agp` | The final AGP file defining how `ragtag.patch.fasta` is built |
| `ragtag.patch.asm.*` | Assembly alignment files |
| `ragtag.patch.comps.fasta` | The split target assembly and the renamed query assembly combined into one FASTA file. This file contains all components in `ragtag.patch.agp` |
| `ragtag.patch.ctg.agp` | An AGP file defining how the target assembly was split at gaps |
| `ragtag.patch.ctg.fasta` | The target assembly split at gaps |
| `ragtag.patch.err` | Standard error logging for all external RagTag commands |
| `ragtag.patch.fasta` | The final FASTA file containing the patched assembly |
| `ragtag.patch.rename.agp` | An AGP file defining the new names for query sequences |
| `ragtag.patch.rename.fasta` | A FASTA file with the original query sequence, but with new names |
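
One way to inspect an AGP file such as `ragtag.patch.agp` is to count the sequence components and gaps per object. The sketch below assumes the standard tab-delimited AGP layout, where column 1 is the object name and column 5 is the component type (`N` or `U` for gaps). The function name `summarize_agp` is hypothetical, and the sample lines are made up for illustration rather than taken from real RagTag output.

```python
from typing import Dict, Iterable


def summarize_agp(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count sequence components and gaps for each object in an AGP file."""
    summary: Dict[str, Dict[str, int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        obj, comp_type = fields[0], fields[4]
        counts = summary.setdefault(obj, {"components": 0, "gaps": 0})
        if comp_type in ("N", "U"):
            counts["gaps"] += 1
        else:
            counts["components"] += 1
    return summary


if __name__ == "__main__":
    # Made-up AGP lines; real files would be read with open(path).
    sample = [
        "## agp-version 2.1",
        "scf1\t1\t1000\t1\tW\tseqA\t1\t1000\t+",
        "scf1\t1001\t1100\t2\tN\t100\tscaffold\tyes\talign_genus",
        "scf1\t1101\t2100\t3\tW\tseqB\t1\t1000\t-",
    ]
    print(summarize_agp(sample))
```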
Are these docs confusing or incomplete? Please open an issue and let me know.