Fastool does not properly parse SRA files #5

Open
tsackton opened this issue May 5, 2015 · 2 comments

Comments

tsackton commented May 5, 2015

When processing SRA RNA-seq FASTQ files with Fastool as part of the Trinity package, Fastool appends /H to the end of each sequence ID, which then causes errors downstream in Trinity.

Here are the first few lines of an SRA file: https://gist.github.com/tsackton/8c5508a4b60a1e33f6f2

When I run:

fastool --to-fasta --illumina-trinity sra_test.fq > sra_test.1.fa

the output headers look like this:

SRR488565.1/H
SRR488565.2/H
SRR488565.3/H
SRR488565.4/H
SRR488565.5/H
SRR488565.6/H

If I remove everything after the first space in each header of the SRA example (with seqtk seq -C), the output is normal:

SRR488565.1
SRR488565.2
SRR488565.3
SRR488565.4
SRR488565.5
SRR488565.6

The /H files do not work with Trinity, while the normal files after seqtk seq -C processing do.
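
For anyone without seqtk installed, the same header clean-up can be roughly approximated with awk. This is only a minimal sketch, not part of fastool or Trinity: the function name strip_fastq_comments is hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines). It keeps the first whitespace-delimited token of each header and reduces the separator line to a bare +.

```shell
# Hypothetical helper (not part of fastool, seqtk, or Trinity).
# Assumes every FASTQ record is exactly four lines long.
strip_fastq_comments() {
    awk '
        NR % 4 == 1 { print $1; next }   # header: keep only the ID before the first space
        NR % 4 == 3 { print "+"; next }  # separator: drop the repeated ID and comment
        { print }                        # sequence and quality lines pass through unchanged
    ' "$@"
}
```

It could then be used as strip_fastq_comments sra_test.fq > sra_test.clean.fq before running fastool on the cleaned file.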

This was tested with the latest version of fastool, compiled on CentOS 6 with gcc 4.8.2.

Owner

fstrozzi commented May 6, 2015

Hi,
this is due to the SRA file header. The --illumina-trinity option called by Trinity was meant to be used with Illumina FastQ files, which have their own typical header format. In this case, a quick workaround would be to first run fastool on its own on the R1 and R2 datasets with these options:

fastool --append /1 --to-fasta SRA_1.fastq > SRA_1_fixed.fastq
fastool --append /2 --to-fasta SRA_2.fastq > SRA_2_fixed.fastq

Then start Trinity with the "fixed" files; this should work.

@nickxzshi

Many thanks for solving the problem I was having.
