mrFAST is a read mapper that is designed to map short reads to reference genome with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to user defined error threshold, including indels up to 4+4 bp. This manual, describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads, however it can return one "best" map location if the relevant parameter is invoked.
NOTE: mrFAST is developed for Illumina, thus requires all reads to be at the same length. For paired-end reads, lengths of mates may be different from each other, but each "side" should have a uniform length.
Please refer to the wiki for details on how to build and use mrFAST.
mrFAST : Micro-Read Fast Alignment Search Tool. Enhanced with FastHASH.
Pretty simple. To fetch:
git clone https://github.com/BilkentCompGen/mrfast.git
To build:
cd mrfast
make
mrfast [options]
-v|--version Current Version.
-h Shows the help file.
--index [file] Generate an index from the specified fasta file.
--ws [int] Set window size for indexing (default:12 max:14).
--search [file] Search in the specified genome. Provide the path to the fasta file. Index file should be in the same directory.
--pe Search will be done in Paired-End mode.
--seq [file] Input sequences in fasta/fastq format [file]. If paired end reads are interleaved, use this option.
--seq1 [file] Input sequences in fasta/fastq format [file] (First file). Use this option to indicate the first file of paired end reads.
--seq2 [file] Input sequences in fasta/fastq format [file] (Second file). Use this option to indicate the second file of paired end reads.
-o [file] Output of the mapped sequences. The default is "output".
-u [file] Save unmapped sequences in fasta/fastq format.
--best Only the best mapping from all the possible mapping is returned.
--seqcomp Indicates that the input sequences are compressed (gz).
--outcomp Indicates that output file should be compressed (gz).
-e [int] Maximum allowed edit distance (default 4% of the read length).
--min [int] Min distance allowed between a pair of end sequences.
--max [int] Max distance allowed between a pair of end sequences.
--maxoea [int] Max number of One End Anchored (OEA) returned for each read pair. We recommend 100 or above for NovelSeq use. Default = 100.
--maxdis [int] Max number of discordant map locations returned for each read pair. We recommend 300 or above for VariationHunter use. Default = 300.
--crop [int] Trim the reads to the given length.
--sample [string] Sample name to be added to the SAM header (optional).
--rg [string] Read group ID to be added to the SAM header (optional).
--lib [string] Library name to be added to the SAM header (optional).
To build a Docker image:
cd docker
docker build . -t mrfast:latest
Your image named "mrfast" should be ready. You can run tardis using this image by
docker run --user=$UID -v /path/to/inputs:/input -v /path/to/outputdir:/output mrfast [args]
[args]
are usual arguments you would pass to tardis executable. Be careful about mapping. You need to specify folders respective to container directory structure.- You need to map host machine input and output directory to responding volume directories inside the container. These options are specified by '-v' argument.
- Docker works with root user by default. "--user" option saves your outputs.
Sample Docker-based command line assuming the working directory is /home/mrfast/samplerun:
docker run --user=$UID -v /home/mrfast/samplerun/:/input -v /home/mrfast/samplerun:/output mrfast --search /input/human_g1k_v37.fasta --seq1 /input/f1.fastq --seq2 /input/f2.fastq --pe --min 0 --max 1000 -o /output/test.sam