INDEX_GENOME: Bowtie build error #340

Closed
AhmedMohamed1993 opened this issue Apr 8, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@AhmedMohamed1993

Description of the bug

The genome index step fails with both the 2.3.0 and dev versions when using the command below.
All miRNA reference files are from miRBase, and the genome FASTA is from Ensembl.
The test run worked properly.
Any suggestions on what is causing the issue?

Command used and terminal output

$ nextflow run nf-core/smrnaseq -r dev --input 'SampleSheet.csv' --outdir '/results' \
--mirtrace_species hsa --fasta 'Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz' \
--hairpin 'mature.fa' \
--mature 'hairpin.fa' \
--mirna_gtf 'hsa.gff3' \
--skip_mirdeep --protocol 'qiaseq' -profile singularity

Output:
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:INDEX_GENOME (Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz)'

Caused by:
  Process `NFCORE_SMRNASEQ:INDEX_GENOME (Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz)` terminated with an error exit status (1)

Command executed:

  # Remove any special base characters from reference genome FASTA file
  sed '/^[^>]/s/[^ATGCatgc]/N/g' Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > genome.edited.fa
  sed -i 's/ .*//' genome.edited.fa
  
  # Build bowtie index
  bowtie-build genome.edited.fa genome --threads 6
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:INDEX_GENOME":
      bowtie: $(echo $(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  Settings:
    Output files: "genome.*.ebwt"
    Line rate: 6 (line is 64 bytes)
    Lines per side: 1 (side is 64 bytes)
    Offset rate: 5 (one in 32)
    FTable chars: 10
    Strings: unpacked
    Max bucket size: default
    Max bucket size, sqrt multiplier: default
    Max bucket size, len divisor: 24
    Difference-cover sample period: 1024
    Endianness: little
    Actual local endianness: little
    Sanity checking: disabled
    Assertions: disabled
    Random seed: 0
    Sizeofs: void*:8, int:4, long:8, size_t:8
  Input files DNA, FASTA:
    genome.edited.fa
  Reading reference sizes
    Time reading reference sizes: 00:00:10
  Calculating joined length
  Writing header
  Reserving space for joined string
  Joining reference sequences
    Time to join reference sequences: 00:00:00
  Total time for call to driver() for forward index: 00:00:10

Command error:
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
  Warning: Encountered empty reference sequence
  Warning: Encountered reference sequence with only gaps
    Time reading reference sizes: 00:00:10
  Calculating joined length
  Writing header
  Reserving space for joined string
  Joining reference sequences
  Reference file does not seem to be a FASTA file
    Time to join reference sequences: 00:00:00
  Total time for call to driver() for forward index: 00:00:10
  Command: bowtie-build --wrapper basic-0 --threads 6 genome.edited.fa genome

Relevant files

No response

System information

No response

@AhmedMohamed1993 AhmedMohamed1993 added the bug Something isn't working label Apr 8, 2024
@christopher-mohr
Contributor

Hi @AhmedMohamed1993, did you try with the extracted (not .gz) fasta file?
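
The executed command in the log above runs `sed` directly on the `.fa.gz` file, so it appears to operate on compressed bytes rather than FASTA text. That would be consistent with the "Reference file does not seem to be a FASTA file" message. A minimal sketch of decompressing the reference beforehand and checking that it starts with a header line follows. The function name `prepare_fasta` is hypothetical and is not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/smrnaseq): write an uncompressed
# copy of a FASTA file and confirm that it starts with a '>' header line.
prepare_fasta() {
    local input="$1" output="$2"
    if gzip -t "$input" 2>/dev/null; then
        # Input is gzip-compressed: decompress it to the output path.
        gzip -dc "$input" > "$output"
    else
        # Input is not gzip data: copy it unchanged.
        cp "$input" "$output"
    fi
    # A FASTA file should begin with a header line starting with '>'.
    if [ "$(head -c 1 "$output")" != ">" ]; then
        echo "ERROR: $output does not look like FASTA" >&2
        return 1
    fi
}

# Example usage with the file name from the report (output name is an assumption):
# prepare_fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz genome.fa
```

The uncompressed file can then be passed to `--fasta` in place of the `.gz` file.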

@AhmedMohamed1993
Author

Extracting the file helped, but the run now stops at a different point.

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT'

Caused by:
Process NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT terminated with an error exit status (1)

Command executed:

#Cleanup the GTF if mirbase html form is broken
GTF="hsa.gff3"
sed 's/&gt;/>/g' $GTF | sed 's#<br>#\n#g' | sed 's#</p>##g' | sed 's#<p>##g' | sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' > ${GTF}_html_cleaned.gtf
mirtop gff --hairpin mature.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps hsa ./bams/*
mirtop counts --hairpin mature.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps hsa --add-extra --gff mirtop/mirtop.gff
mirtop export --format isomir --hairpin mature.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf --sps hsa -o mirtop mirtop/mirtop.gff
mirtop stats mirtop/mirtop.gff --out mirtop/stats
mv mirtop/stats/mirtop_stats.log mirtop/stats/full_mirtop_stats.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT":
mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
END_VERSIONS

Command exit status:
1

Command output:
['gff', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', './bams/27_post_seqcluster.bam', './bams/28_post_seqcluster.bam', './bams/29_post_seqcluster.bam']
['counts', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', '--add-extra', '--gff', 'mirtop/mirtop.gff']
['export', '--format', 'isomir', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '--sps', 'hsa', '-o', 'mirtop', 'mirtop/mirtop.gff']
['stats', 'mirtop/mirtop.gff', '--out', 'mirtop/stats']

Command error:
04/13/2024 04:04:15 INFO Filtered by being duplicated: 0
04/13/2024 04:04:15 INFO Filtered by being outside miRNA positions: 18784
04/13/2024 04:04:15 INFO Filtered by being low score: 0
04/13/2024 04:04:17 INFO It took 0.426 minutes
['gff', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', './bams/27_post_seqcluster.bam', './bams/28_post_seqcluster.bam', './bams/29_post_seqcluster.bam']
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
04/13/2024 04:04:20 INFO Run convert of GFF to TSV containing expression
04/13/2024 04:04:20 INFO INFO Reading GFF file mirtop/mirtop.gff
04/13/2024 04:04:20 INFO INFO Writing TSV file to directory mirtop
04/13/2024 04:04:20 INFO Missing Parents in hairpin file: 0
04/13/2024 04:04:20 INFO Missing MiRNAs in GFF file: 0
04/13/2024 04:04:20 INFO Non valid UID: 0
04/13/2024 04:04:20 INFO Output file is at mirtop/mirtop.tsv
04/13/2024 04:04:20 INFO It took 0.001 minutes
['counts', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', '--add-extra', '--gff', 'mirtop/mirtop.gff']
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
04/13/2024 04:04:22 INFO Run export of GFF into other format.
04/13/2024 04:04:22 INFO INFO Writing TSV file to directory mirtop
04/13/2024 04:04:22 INFO INFO Reading GFF file mirtop/mirtop.gff
04/13/2024 04:04:22 INFO Missing Parents in hairpin file: 0
04/13/2024 04:04:22 INFO Missing MiRNAs in GFF file: 0
04/13/2024 04:04:22 INFO Non valid UID: 0
04/13/2024 04:04:22 INFO Output file is at mirtop/mirtop_rawData.tsv
04/13/2024 04:04:22 INFO It took 0.001 minutes
['export', '--format', 'isomir', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '--sps', 'hsa', '-o', 'mirtop', 'mirtop/mirtop.gff']
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
04/13/2024 04:04:24 INFO Run stats.
04/13/2024 04:04:24 INFO Reading: mirtop/mirtop.gff
['stats', 'mirtop/mirtop.gff', '--out', 'mirtop/stats']
Traceback (most recent call last):
File "/usr/local/bin/mirtop", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/mirtop/command_line.py", line 34, in main
stats(kwargs["args"])
File "/usr/local/lib/python3.9/site-packages/mirtop/gff/stats.py", line 38, in stats
out.append(_calc_stats(fn))
File "/usr/local/lib/python3.9/site-packages/mirtop/gff/stats.py", line 82, in _calc_stats
df = _summary(lines)
File "/usr/local/lib/python3.9/site-packages/mirtop/gff/stats.py", line 130, in _summary
df_sum = _add_missing(df_sum)
File "/usr/local/lib/python3.9/site-packages/mirtop/gff/stats.py", line 110, in _add_missing
df2 = {'category': category, 'sample': df['sample'].iat[0], 'counts': 0}
File "/usr/local/lib/python3.9/site-packages/pandas/core/indexing.py", line 2221, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
File "/usr/local/lib/python3.9/site-packages/pandas/core/series.py", line 1066, in _get_value
return self._values[label]
IndexError: index 0 is out of bounds for axis 0 with size 0
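
The traceback ends at `df['sample'].iat[0]` on a Series of size 0, which suggests that `mirtop stats` had no rows to summarise. The log also reports 18784 entries "Filtered by being outside miRNA positions". One quick way to check this, as a sketch only, is to count the data records in `mirtop/mirtop.gff`. The function name `count_gff_records` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data records in a GFF file, i.e. lines that
# are neither empty nor comments/directives starting with '#'.
count_gff_records() {
    awk 'NF > 0 && $0 !~ /^#/ { n++ } END { print n + 0 }' "$1"
}

# Example usage inside the failed task's work directory:
# count_gff_records mirtop/mirtop.gff
```

If the count is 0, it may be worth re-checking the inputs. For instance, the posted command passes `mature.fa` to `--hairpin` and `hairpin.fa` to `--mature`, which may or may not be intentional.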

@christopher-mohr
Contributor

Does it work if you do not specify --mirna_gtf hsa.gff3?

@christopher-mohr christopher-mohr added this to the 2.3.2 milestone Apr 18, 2024
@lpantano
Contributor

I am happy to help with this. Sorry I am late; I am starting to work on this pipeline more now.

If you still have access to the working directory where this error happens, I am happy to look at the files and see what is going on. Thanks!

@apeltzer apeltzer modified the milestones: 2.3.2, 2.4.0 Aug 8, 2024
@apeltzer
Member

Please open a new issue if this still persists with dev. It should just work for -r dev if you pull the pipeline again. If that's not the case, let us know and open a new issue.
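
Pulling the pipeline again could look like the sketch below. `nextflow pull` refreshes the locally cached copy of the pipeline, and `-r dev` selects the dev revision. The `run` wrapper and the `DRY_RUN` variable are hypothetical additions so that the command is only printed unless execution is explicitly requested.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Refresh the cached pipeline so that -r dev points at the latest dev commit.
run nextflow pull nf-core/smrnaseq -r dev

# Afterwards, re-run the original "nextflow run nf-core/smrnaseq -r dev" command
# with the same parameters as before.
```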

@github-project-automation github-project-automation bot moved this from Todo - Medium Priority to Done in smrnaseq Sep 26, 2024