Adding UMI-tools #49

pfeiferl · 2020-07-05T12:15:42Z

Hi all!
First, I would like to thank you for this awesome work. Second, I have a request:
Would it be possible to add UMI-tools to the environment (Docker) so that we can work with UMIs?
Thank you for your answer.

ewels · 2020-11-07T19:50:40Z

I think this would be a great addition 👍🏻 If you have any details (kits used for example), that would be helpful.

ewels · 2020-11-09T11:33:58Z

QIAGEN kit details: https://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/current/index.php?manual=Create_UMI_Reads_miRNA.html

Regarding our slightly weird sequencing setup at the NGI:

The Qiagen smRNA-seq kit places the UMI at the end of a 75 bp read. We are sequencing 2x50 bp, which does capture the UMI in read 2 but doesn't allow us to use the analysis tool from Qiagen.

pfeiferl · 2020-11-09T12:42:16Z

Hi, I am using the QIAGEN kit you found (by the way, it may be handy for some people to know that the common sequence is the QIAGEN adapter sequence).
But using umi-tools in the smrnaseq pipeline I found a tricky thing: running umi_tools extract and dedup and then running mirtop on the deduplicated reads loses a lot of information because of mirtop's own deduplication process (in some cases ALL reads are lost, because mirtop somehow interprets them as duplicates).
So I recommend running umi_tools dedup on the alignment against the host reference genome and then calling featureCounts (unfortunately this is not possible on BAM files obtained by alignment against mature and hairpin sequences, because the position field is missing in those BAM files).

pcantalupo · 2020-11-09T13:50:12Z

@pfeiferl to be clear for future readers of the thread, that is the 3' adapter sequence as described in the Qiagen miRNAseq manual on page 53 (07/2020 version) and specified on these lines in the pipeline. Please correct me if I'm wrong.

I'm currently working with a client who did 75bp sequencing so that the UMIs are in the sequence reads. I was going to try using UMI tools but haven't looked into it in depth. I'm a bit confused on how you specify the regex for extracting. Are you using the regular expression mode? Can you share your code for how you did the extract and dedup? Thank you

pfeiferl · 2020-11-09T14:15:51Z

@pcantalupo you are correct

On umi extract: yes, I am using regex. First, the raw FASTQ files must NOT have had the adapter trimmed, because you are searching for the UMI that comes after it.

umi_tools extract --stdin=in.fastq.gz --stdout=out.fastq.gz --extract-method=regex --bc-pattern='.+(?:AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)'

.+ means keep everything before this sequence (AACTGTAGGCACCATCAAT) as the read
{s<=2} means you are allowing up to 2 substitutions in the adapter sequence (the adapter is wrapped in a group so that the constraint applies to the whole adapter, not only to its last base)
(?P<umi_1>.{12}) is the UMI (it will be moved to the read header)
(?P<discard_2>.*) means you must discard the end of the read (umi_tools is normally used with single-cell data, where the adapter comes first, then the UMI, and then the read, so it keeps whatever follows the UMI unless you tell it otherwise)
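
To make the named groups concrete, here is a minimal Python sketch of the same idea, run on a made-up read. It is not what umi_tools does internally. It uses the standard-library `re` module, which has no fuzzy matching, so the `{s<=2}` constraint is dropped and the adapter must match exactly. The function name `extract_umi` and the toy read are assumptions for illustration only.

```python
import re

# Simplified version of the --bc-pattern above: exact adapter match only,
# because the standard-library `re` module does not support {s<=2}.
# The group names umi_1 and discard_2 are taken from the pattern above.
ADAPTER = "AACTGTAGGCACCATCAAT"
PATTERN = re.compile(r"(.+)" + ADAPTER + r"(?P<umi_1>.{12})(?P<discard_2>.*)")


def extract_umi(read_seq):
    """Return (kept_sequence, umi), or None if the read does not match.

    The kept sequence is everything outside the named groups, i.e. the
    insert plus the adapter; the adapter is left for a later trimming step.
    """
    match = PATTERN.match(read_seq)
    if match is None:
        return None
    kept = match.group(1) + ADAPTER
    return kept, match.group("umi_1")


# Toy read (hypothetical): 22 nt insert, adapter, 12 nt UMI, trailing bases.
insert = "TGAGGTAGTAGGTTGTATAGTT"
umi = "ACGTACGTACGT"
read = insert + ADAPTER + umi + "GGGG"
print(extract_umi(read))
```

A read without the adapter returns `None` here, which is why the raw FASTQ must still contain the adapter when the extraction runs.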

Dedup is simply:

umi_tools dedup --method=unique -I in.bam -S out.bam
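
As a rough illustration of what `--method=unique` amounts to: reads count as duplicates only when they share the same mapping position and exactly the same UMI. The Python sketch below works on made-up tuples rather than a BAM file. The record layout and the function name `dedup_unique` are assumptions, and keeping the first read seen per group is a simplification of how a real tool picks the representative read.

```python
def dedup_unique(reads):
    """Keep one read name per (contig, position, UMI) combination.

    `reads` is an iterable of (name, contig, position, umi) tuples. This toy
    layout is an assumption for illustration, not the BAM data model.
    """
    seen = set()
    kept = []
    for name, contig, position, umi in reads:
        key = (contig, position, umi)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


reads = [
    ("r1", "chr1", 100, "ACGTACGTACGT"),
    ("r2", "chr1", 100, "ACGTACGTACGT"),  # same position and UMI as r1: duplicate
    ("r3", "chr1", 100, "TTTTACGTACGT"),  # same position, different UMI: kept
    ("r4", "chr1", 250, "ACGTACGTACGT"),  # same UMI, different position: kept
]
print(dedup_unique(reads))  # ['r1', 'r3', 'r4']
```

The key depends on the mapping position, which is why this kind of dedup needs an alignment that carries usable position information.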

ewels · 2020-11-10T13:19:05Z

Note that the main nf-core/rnaseq pipeline already supports UMI tools: https://github.com/nf-core/rnaseq/pull/435/files#diff-6401496ba455b9488ffa902a6e4d7732b2c60ff2d77c5c3ef96b28a7ac7d3b28R1023

The rnaseq pipeline has just moved to DSL2, meaning that this functionality has been ported to nf-core/modules: https://github.com/nf-core/modules/tree/master/software/umitools

Once the DSL2 stuff settles down we will want to start migrating all pipelines to DSL2. When we come to do this pipeline that'll mean that we can reuse the same modules to also run UMI tools in this pipeline.

If we don't want to wait that long, we could always copy over code from the above ☝🏻 (but if we're not in a rush then I think it'd be better to wait).

lpantano · 2022-01-18T16:44:20Z

I have looked into that. While extraction seems compatible with the pipeline and deduplicated BAM files can be used by some tools, some parts will need a custom dedup before aligning. If somebody has time for this, I can help with guidance on how to implement it. I don't see a lot of time for doing it myself in the next month, but who knows!


CKComputomics · 2022-05-13T09:06:58Z

@lpantano I'm currently looking into this. The UMI extract part is pretty straightforward to implement, but which steps do you mean when you talk about custom deduplication before aligning? Is a dedup step for the mapped hairpin and mature BAM files (+ possibly for the genome alignment) not sufficient?

lpantano · 2022-05-18T20:45:59Z

Hi, dedup on BAM files is not going to help some tools. If the dedup happens in the FASTQ files, then it is fine. The tools aimed at better miRNA quantification do a 'collapsing' step at the FASTQ level, where each distinct sequence is reported once in the output file (normally a FASTA file), with the number of times that sequence appeared recorded in the read name.
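
A minimal Python sketch of such a collapsing step, assuming the reads are already available as plain sequence strings. The `seq_<index>_x<count>` header format and the function name `collapse_reads` are illustrative assumptions; each tool has its own naming convention.

```python
from collections import Counter


def collapse_reads(sequences):
    """Collapse identical sequences into FASTA text, one record per sequence.

    The read count is stored in the header as `_x<count>`; this header
    format is a hypothetical example, not a specific tool's convention.
    """
    counts = Counter(sequences)
    lines = []
    # most_common() sorts by decreasing count; ties keep first-seen order.
    for index, (seq, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">seq_{index}_x{count}")
        lines.append(seq)
    return "\n".join(lines)


toy_reads = ["TGAGGTAGTAGGTTGTATAGTT", "TAGCTTATCAGACTGATGTTGA", "TGAGGTAGTAGGTTGTATAGTT"]
print(collapse_reads(toy_reads))
```

Identical sequences end up as a single record whatever UMI they carried, so UMI information has to be dealt with before this step.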

mirdeep, mirtop and mirtrace will do this. None of them works at the BAM file level the way rnaseq does. So, bottom line: if this can be done at the FASTQ level then it is fine, but if not, I don't think it will be that useful. The quantification from the BAM file shouldn't be used for anything other than statistics about how many reads map to mature, hairpin or genome.

Happy to set up a call to talk more. It is a little confusing, just because of the history of smrnaseq analysis. Thanks!

CKComputomics · 2022-05-19T06:13:32Z

So adding an additional step in genome mode that maps all reads, dedups the bam and converts it back to fastq would be an option?
Setting up a call would be great.

ewels · 2022-05-19T07:25:05Z

Assuming that UMI sequences are carried into aligned BAM files in the read headers, it should be fine to do alignment+UMI based deduplication, no? I don't really see why it has to be raw sequence based only?

apeltzer · 2022-06-21T11:31:15Z

I think #164 adds exactly that now - so everyone could take a look at the feature and test it for inclusion 👍🏻

apeltzer · 2024-01-12T10:09:39Z

#303 adds UMI handling, please test this thoroughly!

ewels added the enhancement label (New feature or request) Nov 7, 2020

idot mentioned this issue Nov 4, 2021

[FEATURE] UMI Handling (extract, trimming, merging trimming reports) #114

Closed

CKComputomics pushed a commit to CKComputomics/smrnaseq that referenced this issue Apr 26, 2022

ADD UMI TOOLS

3134ba6

Integrate the umi tools module already existing in nf-core into the smrnaseq pipeline. See Issue nf-core#49

apeltzer mentioned this issue Jun 21, 2022

[Feat] Add UMI Handling to the pipeline #164

Merged

10 tasks

apeltzer closed this as completed Jan 12, 2024

genesandbones mentioned this issue Mar 6, 2024

umi qiaseq #324

Closed
