-
Notifications
You must be signed in to change notification settings - Fork 25
Emulating PrimerSearch using Bowtie2
The PrimerSearch
step in the diagnostic_primers
pipeline is very lengthy. To resolve this, we will be try to emulate the PrimerSearch
steps using Bowtie2
.
PrimerSearch
reads in primer pairs from an input file and searches them against DNA sequence(s) specified by the user.
Bowtie2
aligns sequencing reads to long reference sequences.
PrimerSearch
will only report "matches" if both primers in the pair have a match in opposite orientations.
-- we will need to address this in Bowtie2
to try and get similar/the same results.
Each primer pair in PrimerSearch
has a name, and 2 sequences (PrimerA and PrimerB). If PrimerA is aligned to the forward strand, PrimerB is then aligned to the reverse strand. The process is also reversed (PrimerB with the forward strand followed by PrimerA with the forward strand) to get all possible amplified regions.
#This is my primer file (name, PrimerA, PrimerB)
D1S243 cacacaggctcacatgcc gctccagcgtcatggact
D1S468 aattaaccgttttggtcct gcgacacacacttccc
D1S2845 ccaaagggtgcttctc gtggcattccaacctc
D1S1608 gatggcttttggggactatt cactgagccaagtgacacag
D1S2893 aaaacatcaactctcccctg ctcaaaccccaataagcctt
D1S2660 cacacatgcacatgcac agtgacaccagcaggg
primersearch -seqall (reference_file) -infile (primer_pairs_file) -mismatchpercent (integer, usually 10) -outfile (*.primersearch)
-
Aligns short reads to genomes (50 up to 100 or 1000bp).
-
Bowtie2
uses an FM index to index the genome which keeps the memory usage small (not as computer intensive asPrimerSearch
) -
Gives output as
SAM
format -
Bowtie2
uses local alignment--local
-- some nucleotides from one or both ends of the alignment can be trimmed/cut off to maximise the alignment score -
Bowtie 2
can use a paired-end alignment --Bowtie 2
attempts to find unpaired alignments for each sequence if the pairs do not align in a paired fashion. -
Bowtie 2
reports a spectrum of mapping qualities
#index the reference
bowtie2-build ref_genome.fa IndxRefGen
#Align unpaired reads
bowtie2 -x IndxRefGen -U reads_1.fq -S unpaired.sam
>
#Align paired end reads (file1 and file2)
bowtie2 -x IndxRefGen -1 reads_1.fq -2 reads_2.fq -S paired_end.sam
PrimerSearch
requires forward and reverse primers (PrimerA and PrimerB) in a file (as above) with a reference genome file
Bowtie2
requires a fasta file (usually read data) and a reference genome file
-
One way to address this is to align a sequence (forward primer) using
Bowtie2
to the ref genome, count 50-100bp upstream (whatever the desired length) then take note of the desired reverse sequence to construct "PrimerB" resulting in a primer set. -
Another way is to map all short sequences to the genome, then look in detail at which sequences are in ideal positions as "primer pairs" -- one forward, one reverse (looking at coverage of the ref genome)