Skip to content

Emulating PrimerSearch using Bowtie2

Peter Cock edited this page Sep 30, 2021 · 7 revisions

Why emulate PrimerSearch?

The PrimerSearch step in the diagnostic_primers pipeline is very lengthy. To resolve this, we will be try to emulate the PrimerSearch steps using Bowtie2.


PrimerSearch and Bowtie2 comparisons

PrimerSearch reads in primer pairs from an input file and searches them against DNA sequence(s) specified by the user.

Bowtie2 aligns sequencing reads to long reference sequences.


PrimerSearch notes

PrimerSearch will only report "matches" if both primers in the pair have a match in opposite orientations. -- we will need to address this in Bowtie2 to try and get similar/the same results.

Each primer pair in PrimerSearch has a name, and 2 sequences (PrimerA and PrimerB). If PrimerA is aligned to the forward strand, PrimerB is then aligned to the reverse strand. The process is also reversed (PrimerB with the forward strand followed by PrimerA with the forward strand) to get all possible amplified regions.

#This is my primer file (name, PrimerA, PrimerB)
D1S243  cacacaggctcacatgcc      gctccagcgtcatggact
D1S468 aattaaccgttttggtcct     gcgacacacacttccc 
D1S2845 ccaaagggtgcttctc        gtggcattccaacctc
D1S1608 gatggcttttggggactatt    cactgagccaagtgacacag
D1S2893 aaaacatcaactctcccctg    ctcaaaccccaataagcctt
D1S2660 cacacatgcacatgcac       agtgacaccagcaggg

Example run

primersearch -seqall (reference_file) -infile (primer_pairs_file) -mismatchpercent (integer, usually 10) -outfile (*.primersearch)


Bowtie2 notes

  • Aligns short reads to genomes (50 up to 100 or 1000bp).

  • Bowtie2 uses an FM index to index the genome which keeps the memory usage small (not as computer intensive as PrimerSearch)

  • Gives output as SAM format

  • Bowtie2 uses local alignment --local -- some nucleotides from one or both ends of the alignment can be trimmed/cut off to maximise the alignment score

  • Bowtie 2 can use a paired-end alignment -- Bowtie 2 attempts to find unpaired alignments for each sequence if the pairs do not align in a paired fashion.

  • Bowtie 2 reports a spectrum of mapping qualities

Example run

#index the reference bowtie2-build ref_genome.fa IndxRefGen

#Align unpaired reads bowtie2 -x IndxRefGen -U reads_1.fq -S unpaired.sam>

#Align paired end reads (file1 and file2) bowtie2 -x IndxRefGen -1 reads_1.fq -2 reads_2.fq -S paired_end.sam


Adjustments to emulate PrimerSearch

PrimerSearch requires forward and reverse primers (PrimerA and PrimerB) in a file (as above) with a reference genome file

Bowtie2 requires a fasta file (usually read data) and a reference genome file

  • One way to address this is to align a sequence (forward primer) using Bowtie2 to the ref genome, count 50-100bp upstream (whatever the desired length) then take note of the desired reverse sequence to construct "PrimerB" resulting in a primer set.

  • Another way is to map all short sequences to the genome, then look in detail at which sequences are in ideal positions as "primer pairs" -- one forward, one reverse (looking at coverage of the ref genome)