Long read compatibility by tweaking the pipeline #332

varun8476 · 2023-02-14T19:29:45Z

Hi,
Can we make ARIBA compatible with long reads by changing the mapping and assembly approach?
I am planning to do this as my master's thesis project. I am a bioinformatics student, and my first idea is to use minimap2 to map the reads to the clusters and a long-read assembler such as Flye or Miniasm to assemble them.
Any leads on whether this approach is feasible, or pointers to related research, would be helpful.
Thanks in advance.

martinghunt · 2023-02-16T10:58:50Z

This hasn't been tried. At the time ARIBA was made, long read assemblies were too low quality (in particular, indel errors), which would have led to too many errors. ARIBA is made to be quite conservative, which is fine for Illumina but not for data with a higher error rate.

But I'm not sure it's worth trying, because these days long reads and their assemblies are significantly better (although I'd still be wary of indel errors). If it was me, I would assemble all the reads (using Flye/Unicycler/whatever works) and then use abritamr for the AMR predictions: https://github.com/MDU-PHL/abritamr
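
As a rough illustration of that assemble-then-predict route, here is a minimal Python sketch that only builds the two command lines and prints them. The file names, the `build_commands` helper and the choice of Flye's `--nano-hq` read mode are assumptions for the example, and the abritamr options shown (`run`, `--contigs`, `--prefix`) should be checked against `abritamr run --help` for the installed version.

```python
import shlex
from pathlib import Path


def build_commands(reads, outdir, prefix, threads=4):
    """Return the assembly and AMR-prediction commands as argument lists.

    Nothing is executed here; the caller decides how to run them.
    All paths and the sample prefix are placeholders for illustration.
    """
    flye_dir = Path(outdir) / "flye"

    # Flye writes its final contigs to <out-dir>/assembly.fasta.
    # --nano-hq is an assumption; pick the read-type option matching your data.
    flye_cmd = [
        "flye", "--nano-hq", str(reads),
        "--out-dir", str(flye_dir),
        "--threads", str(threads),
    ]

    # Assumed abritamr invocation; verify with `abritamr run --help`.
    abritamr_cmd = [
        "abritamr", "run",
        "--contigs", str(flye_dir / "assembly.fasta"),
        "--prefix", prefix,
    ]
    return [flye_cmd, abritamr_cmd]


if __name__ == "__main__":
    for cmd in build_commands("reads.fastq.gz", "out", "sample1"):
        print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` in order, so that the AMR step only starts if the assembly succeeded.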

Sorry if that sounds too negative, but realistically I expect that would be the best method. Happy to be proven wrong! That said, if you really want to do it then this is what I can think of that will need changing, and there's probably more that I haven't thought of. Basically, there's a bunch of places where read pairs are assumed, and it'll be a fair bit of work to deal with going from paired to unpaired:

- Change the initial mapping so it does not assume read pairs, and probably change the k-mer, step and minimizer sizes. If you want to update minimap -> minimap2 then fine, but either way there's faffing with the C++ code so that the mapping works on both paired and unpaired reads.
- All the reads allocated to clusters are stored in a single tabix-indexed file. As each cluster is run, it retrieves its reads from that file. All this code will need editing to handle unpaired reads.
- Getting an assembly method (a good combination of assembler and command-line options) that works reliably. This could turn out to be a massive pain. Expect the unexpected: one assembler may work perfectly on one sample and not on another.
- After assembly, it uses read pairs to make a scaffold graph and checks for nodes with >1 edge. This would need to be either skipped completely or reimplemented by looking for long reads that join contigs.
- Also after assembly, it maps the reads back to get pileup info, so that needs changing for unpaired reads as well.
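
For the scaffold-graph point, one possible way to look for long reads joining contigs is sketched below in Python. This is not ARIBA's code: it assumes you already have (read name, contig name) pairs from mapping the long reads back to the assembled contigs, and the function names and the `min_reads` threshold are hypothetical. It also ignores alignment positions and orientation, which a real implementation would probably want to check.

```python
from collections import defaultdict


def contig_link_graph(alignments, min_reads=2):
    """Build an undirected contig graph from (read_name, contig_name) pairs.

    Two contigs are linked when at least `min_reads` distinct reads
    align to both of them.
    """
    contigs_per_read = defaultdict(set)
    for read, contig in alignments:
        contigs_per_read[read].add(contig)

    # (contig_a, contig_b) -> names of reads that hit both contigs
    support = defaultdict(set)
    for read, contigs in contigs_per_read.items():
        ordered = sorted(contigs)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                support[(a, b)].add(read)

    graph = defaultdict(set)
    for (a, b), reads in support.items():
        if len(reads) >= min_reads:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def branching_contigs(graph):
    """Return contigs with more than one neighbour (nodes with >1 edge)."""
    return sorted(c for c, neighbours in graph.items() if len(neighbours) > 1)


if __name__ == "__main__":
    example = [
        ("r1", "c1"), ("r1", "c2"),
        ("r2", "c1"), ("r2", "c2"),
        ("r3", "c1"), ("r3", "c3"),
        ("r4", "c1"), ("r4", "c3"),
        ("r5", "c2"), ("r5", "c3"),  # only one read supports c2-c3
    ]
    print(branching_contigs(contig_link_graph(example)))
```

In the toy input, c1 is linked to both c2 and c3 by two reads each, while the c2-c3 link has a single supporting read and is dropped, so only c1 is reported as branching.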
