Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with 3 and 5' clip using nextflex workflow #362

Closed
matbou85 opened this issue Jul 12, 2024 · 2 comments
Closed

Problem with 3 and 5' clip using nextflex workflow #362

matbou85 opened this issue Jul 12, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@matbou85
Copy link

matbou85 commented Jul 12, 2024

Description of the bug

I am running smRNAseqon an HPC. My samples were prepared using the Nextflex library. The run was successful, please look at the code below, but the 4 bases from the 3 and 5' seem not to have been clipped.

Command used and terminal output

nextflow run:

nextflow run /path/to/pipelines/smrnaseq-2.3.1/main.nf \
-profile docker \
--input /path/to/samplesheet.csv \
--outdir /path/to/output \
--protocol 'nextflex' \
--fasta /path/to/genomes/homo_sapiens/Gencode/41/GRCh38.primary_assembly.genome.fa \
--mirna_gtf /path/to/genomes/homo_sapiens/miRBase/hsa.gff3 \
--mature /path/to/genomes/homo_sapiens/miRBase/mature.fa \
--hairpin /path/to/genomes/homo_sapiens/miRBase/hairpin.fa \
--mirtrace_species hsa

Please see the first few lines of my original fastq file:

zcat ###_L2_S1_L001_R1_001.fastq.gz | head -100
@A00556:142:HG57WDRXY:1:2101:6126:1016 1:N:0:AAGACATA
CNACGACCTGTACTGACGCTCATGCGTGCTGGAATTCTCGGGTGCCAAGGA
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:11153:1016 1:N:0:AAGACATA
ANTATTCTTGACGACCATAGAGCATTGGAACCAGCTCTGGAATTCTCGGGT
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:14660:1016 1:N:0:AAGACATA
GNAAAGCAGGCACCGTGATTTATCTCACTGGAATTCTCGGGTGCCAAGGAA
+
F#F,:FFFFFF,,FFFFFFFF::FFFFF:FF,FFFFF:FFFFF:FF,FFFF
@A00556:142:HG57WDRXY:1:2101:14769:1016 1:N:0:AAGACATA
ANTGCGCGGGGTGGAGCAGCCTGTGCTGGAATTCTCGGGTGCCAAGGAACT
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
@A00556:142:HG57WDRXY:1:2101:16052:1016 1:N:0:AAGACATA
CNCTTTCAAGTAATCCAGGATAGGCTCACTTGGAATTCTCGGGTGCCAAGG
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

I have compared the trim results from the file found in miRDeep2 folder :

head -20 ###_L2_S1_L001.fastp.fastq
@A00556:142:HG57WDRXY:1:2101:6126:1016 1:N:0:AAGACATA
CNACGACCTGTACTGACGCTCATGCGTGC
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:11153:1016 1:N:0:AAGACATA
ANTATTCTTGACGACCATAGAGCATTGGAACCAGCTC
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:14660:1016 1:N:0:AAGACATA
GNAAAGCAGGCACCGTGATTTATCTCAC
+
F#F,:FFFFFF,,FFFFFFFF::FFFFF
@A00556:142:HG57WDRXY:1:2101:14769:1016 1:N:0:AAGACATA
ANTGCGCGGGGTGGAGCAGCCTGTGC
+
F#FFFFFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:16052:1016 1:N:0:AAGACATA
CNCTTTCAAGTAATCCAGGATAGGCTCACT
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFF

And the one found in output/mirna_quant/seqcluster/final:

zcat ###_L2_S1_L001.fastp_trimmed.fastq.gz | head -40
@seq_2_x1
CNACGACCTGTACTGACGCTCATGCGTGC
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFF
@seq_3_x1
ANTATTCTTGACGACCATAGAGCATTGGAACCAGCTC
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@seq_4_x1
GNAAAGCAGGCACCGTGATTTATCTCAC
+
F#F,:FFFFFF,,FFFFFFFF::FFFFF
@seq_5_x1
ANTGCGCGGGGTGGAGCAGCCTGTGC
+
F#FFFFFFFFFFFFFFFFFFFFFFFF
@seq_6_x1
CNCTTTCAAGTAATCCAGGATAGGCTCACT
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFF

Here are the results if I cut adapters and the 4 nucleotides in 3' and 5' using :

trim_galore --three_prime_clip_R1 4 --clip_R1 4 ###_L2_S1_L001_R1_001.fastq.gz
Cutadapt version: 4.9
trim_galore version: 0.6.7

zcat ###_L2_S1_L001_R1_001_trimmed.fq.gz | head -20
@A00556:142:HG57WDRXY:1:2101:6126:1016 1:N:0:AAGACATA
GACCTGTACTGACGCTCATGC
+
FFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:11153:1016 1:N:0:AAGACATA
TTCTTGACGACCATAGAGCATTGGAACCA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:14660:1016 1:N:0:AAGACATA
AGCAGGCACCGTGATTTATC
+
:FFFFFF,,FFFFFFFF::F
@A00556:142:HG57WDRXY:1:2101:14769:1016 1:N:0:AAGACATA
CGCGGGGTGGAGCAGCCT
+
FFFFFFFFFFFFFFFFFF
@A00556:142:HG57WDRXY:1:2101:16052:1016 1:N:0:AAGACATA
TTCAAGTAATCCAGGATAGGCT
+
FFFFFFFFFFFFFFFFFFFFFF

The log file from fastp seems to run this command :

fastp --in1 ###-E12.fastq.gz --out1 ###-E12.fastp.fastq.gz --thread 6 --json ###-E12.fastp.json --html ###-E12.fastp.html --adapter_fasta known_adapters.fa -l 17 --max_len1 40

I mainly wonder if four nucleotides are cut from the 3' and 5' end of the RNA as these should be removed according to the Nextflex pipeline.
I have tested with an old version 1.1.0 and the 4 nucleotides from 3' and 5' were cut using the parameter --protocol 'nextflex'.



### Relevant files

_No response_

### System information

Nextflow version: 24.04.3.5916
Hardware: Desktop from Docker
OS: Linux Ubuntu
@matbou85 matbou85 added the bug Something isn't working label Jul 12, 2024
@lpantano
Copy link
Contributor

Follow up on this. Protocol doesn't work, you need to set up each parameter specifically.

But still, the clipping at 3p happens before adapter removal, so not useful. You could trim before using the pipeline and then skip trimming if you want to use correctly.

@apeltzer
Copy link
Member

apeltzer commented Jul 31, 2024

Should be fine now and addressed in #364 - please give dev a try 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants