CONSISTENT ERROR -FastaIndex: Record has inconsistent line lengths or line endings #8
Comments
Hi @desmodus1984,
It seems that the sequence identifiers (headers) do not start with '>', which is probably a must for SeqAn to parse and index the FASTA file properly. Can you try adding '>' to the beginning of the sequence identifiers and rerunning Apollo? I would also check the encoding of your text file and look for unexpected hidden characters in your line endings, which may be messing up your FASTA file.
You can potentially use seqtk seq to convert your FASTA file in a way that Apollo requires. It would hopefully resolve the issues that you may experience regarding formatting and line endings.
Best,
Can Firtina
Hi Firtina,
I checked my files again and they seem to be fine:
sequence identifiers (headers) start with '>'
head reads1.fasta
>V300066187L4C001R0010000000/1
AATGTAAATACATTTTTGTATCCTACTGTTTATTGTACTCTTATTACAGGCATTTTCCACTTTGTTCTGCAGTCTGTATTTTAAAAAATGCTATATTATC
>V300066187L4C001R0010000014/1
TGAGAAAGGTTGTTTCCCCAGGTAGGAATTTTCCCCTGAAGTTAGGGAGGGGATAAAGCCCCTTAACTAAGTGCCAGGTGGGTAGTTAATCACTTTAACT
>V300066187L4C001R0010000017/1
CCTAGCCCCACACCAGACCCCCAGCCCAGAGTCCAGAGCTGGGAAAATAAGTTACTGTAACTTCTGGCTATAAAAACCAGCGGGAACTGTGGCTGACTGA
>V300066187L4C001R0010000029/1
AGGGAGCTTCAGGACAACATGAAACGAAGTAACATACGCATAATAGGGCTGCAAGAAGGACAAGAAGAACAGCAAGGATTAGAAAATCTATTTGAAGAAA
>V300066187L4C001R0010000038/1
CACAGTATTTAACATGAGAATTTTTCACGTGTCAGGATAGAAAAGTTTAAATCAGCTCAAGGTTGATGACGATATAGAGAAACAAGCACTATTCTTTTTA

head reads2.fasta
>V300066187L4C001R0010000000/2
GTGTCAGATGTGTTATATAGCTTGATTTTAACCATTTAACCAATACATACATGAAGATATATACCCCAAATATATGCCATTTGTGTCAAGTATACCTGAA
>V300066187L4C001R0010000014/2
ATCTGTATTTATACCAATTGATTTTAATCCTGTCAATTTCTATCGCAAAGGTTAGGGCGTTTCTTATCTCCATTCCAGGGAGTAAAGATTATGTAGCTTA
>V300066187L4C001R0010000017/2
AAAGCTGCGCCCAAAACTCCCACCCGGCTAGACAGTTCAGTTCCTCTCCATATGTCACTGGATTTCCCCAAAGCCACTACCTGGTGCTGGAGCTCACCGG
>V300066187L4C001R0010000029/2
GTTTCTGTTGAGAAATCGTTTGATAATCTGATGGGGGATCCTTTGTAGGTAACTCTCTGTTTCTCTCTTGCTGCCTTTAAGATTCTCTCTTTGTCTTGAA
>V300066187L4C001R0010000038/2
TCTCACACTGATATTTTTTTCTCTCTCTCCCCTTCTCTCTCTCTCTAAAATCAATAAACATACCTTTGGGTGAGGATAAACAGAATAGTGCTTGTTTCTC
I converted my FASTQ files to FASTA using bioawk, and I think the encoding is fine:
reads1.fasta: text/plain; charset=us-ascii
reads_1.fq: text/plain; charset=us-ascii
reads2.fasta: text/plain; charset=us-ascii
reads_2.fq: text/plain; charset=us-ascii
Is there any way to check for those "hidden characters"? I have no idea how to do that, and I would not expect bioawk to add any.
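One way to look for hidden characters is to read the file as raw bytes and report anything that is not printable ASCII or a plain newline. Note that a us-ascii charset does not rule out carriage returns or tabs, since both are ASCII. The following is a minimal Python sketch, not part of Apollo or SeqAn; the names `hidden_bytes` and `scan_file` are made up for illustration, and it assumes an uncompressed FASTA file.

```python
from collections import Counter

# Bytes expected in a plain FASTA file: printable ASCII plus the newline.
EXPECTED = bytes(range(0x20, 0x7F)) + b"\n"


def hidden_bytes(lines):
    """Return a Counter of unexpected byte values in an iterable of byte lines."""
    found = Counter()
    for line in lines:
        # translate(None, EXPECTED) deletes every expected byte, so only
        # carriage returns, tabs, control characters and non-ASCII bytes remain.
        found.update(line.translate(None, EXPECTED))
    return found


def scan_file(path):
    """Scan a file in binary mode so that no newline translation hides a '\\r'."""
    with open(path, "rb") as handle:
        return hidden_bytes(handle)
```

Calling `scan_file("reads1.fasta")` returns an empty counter for a clean file. A key of 13 (0x0d) would point to Windows-style line endings, and keys 239, 187, 191 at the start of a file would point to a UTF-8 byte-order mark.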
Best regards;
Juan Pablo Aguilar Cabezas
Ecology and Evolutionary Biology Ph.D. Candidate
Department of Biological Sciences
Ohio University, Athens OH
On Monday, January 3, 2022 at 10:11 AM, Can Firtina wrote:
Hi @desmodus1984,
It seems that the sequence identifiers (headers) do not start with '>', which is probably a must for SeqAn to parse and index the FASTA file properly. Can you try adding '>' to the beginning of the sequence identifiers and rerunning Apollo? I would also check the encoding of your text file and look for unexpected hidden characters in your line endings, which may be messing up your FASTA file.
You can potentially use seqtk seq to convert your FASTA file in a way that Apollo requires. It would hopefully resolve the issues that you may experience regarding formatting and line endings.
Best,
Can Firtina
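seqtk seq, as suggested above, is the direct route. For anyone who wants to see what that kind of normalization amounts to, here is a rough Python sketch that rewrites every record as one header line plus one sequence line with plain '\n' endings. It is not a reimplementation of seqtk; the function names are hypothetical, and it assumes an uncompressed ASCII file.

```python
def normalize_fasta(lines):
    """Yield records as '>header\\n' plus one sequence line, with LF endings only."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.rstrip()  # drops '\n', '\r' and trailing spaces or tabs
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header + "\n" + "".join(chunks) + "\n"
            header, chunks = line, []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header + "\n" + "".join(chunks) + "\n"


def normalize_file(src, dst):
    """Stream src to dst without letting Python translate line endings."""
    # encoding="ascii" raises UnicodeDecodeError on non-ASCII bytes,
    # which is itself a useful signal that the input is not plain text.
    with open(src, encoding="ascii", newline="") as fin, \
            open(dst, "w", encoding="ascii", newline="\n") as fout:
        fout.writelines(normalize_fasta(fin))
```

Because each output record has exactly one sequence line, line lengths within a record can no longer disagree, and every line ends the same way.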
Hi.
I built an assembly and I am trying to polish it with Apollo.
I installed it as instructed and followed all the steps.
I converted the FASTQ files into one-line FASTA files:
head reads2.fasta
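The exact conversion command is not shown in this thread. As a point of comparison, a FASTQ to one-line FASTA conversion can be sketched in a few lines of Python. This is only an illustration under the assumption of strict four-line FASTQ records (no wrapped sequences); `fastq_to_fasta` is a hypothetical name, and unlike some tools it keeps the whole header line, including any text after the first space.

```python
from itertools import islice


def fastq_to_fasta(lines):
    """Yield (header_line, sequence_line) pairs from four-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        # Strip '\n' and any stray '\r' so the output always ends in a plain '\n'.
        name, seq, plus, _qual = (field.rstrip("\r\n") for field in record)
        if not name.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {name!r}")
        yield ">" + name[1:] + "\n", seq + "\n"
```

Writing the pairs out with something like `fout.writelines(line for pair in fastq_to_fasta(fin) for line in pair)` produces one header line and one sequence line per read.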
I converted the SAM to BAM, then sorted and indexed it:
/users/PHS0338/jpac1984/appz/bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem -t 48 Hapo -R '@RG\tID:PA113-1\tSM:bar\tPL:DNBSEQ' \
  /fs/scratch/PHS0338/BGI-reads/reads_1.fq > PA113-1.sam
/fs/scratch/PHS0338/appz/samtools-1.14/samtools view -hb -@ 48 PA113-1.sam > PA113-1.bam
/fs/scratch/PHS0338/appz/samtools-1.14/samtools view -h -@ 48 -F4 PA113-1.bam | /fs/scratch/PHS0338/appz/samtools-1.14/samtools sort -@ 48 -m 3G -O bam -o PA113-1.sorted.bam
/fs/scratch/PHS0338/appz/samtools-1.14/samtools index -@ 12 PA113-1.sorted.bam
I keep getting the same SeqAn error, and I do not know how to fix it.
The log:
Assembly: /users/PHS0338/jpac1984/data/myse-hapog.fasta
Pair of a set of reads and their alignments:
/fs/scratch/PHS0338/BGI-reads/reads1.fasta, /fs/scratch/PHS0338/appz/sam-bams/PA113-1.sorted.bam
/fs/scratch/PHS0338/BGI-reads/reads2.fasta, /fs/scratch/PHS0338/appz/sam-bams/PA113-2.sorted.bam
Output file: myse-polished.fasta
Maximum consecutive insertions: 3
Maximum consecutive deletions: 10
Transition probability to match states: 0.85
Transition probability to insertion states: 0.1
Overall deletion transition probabilities from a state: 0.05
Deletion transition factor: 2.5
Emission probability of a matching character: 0.97
Emission probability of a substitution (i.e., mismatch) character: 0.01
Emission probability of an inserted character: 0.333333
Filter size: 100
Viterbi filter size: 5
Viterbi batch size: 5000
Read chunking size (0 for original length): 1000
Max thread: 48
terminate called after throwing an instance of 'seqan::ParseError'
what(): FastaIndex: Record has inconsistent line lengths or line endings
/var/spool/slurmd/job8594593/slurm_script: line 10: 45058 Aborted (core dumped) bin/apollo -a /users/PHS0338/jpac1984/data/myse-hapog.fasta -r /fs/scratch/PHS0338/BGI-reads/reads1.fasta -r /fs/scratch/PHS0338/BGI-reads/reads2.fasta -m /fs/scratch/PHS0338/appz/sam-bams/PA113-1.sorted.bam -m /fs/scratch/PHS0338/appz/sam-bams/PA113-2.sorted.bam -t 48 -o myse-polished.fasta
Any idea why it keeps failing?
I have all the required input files, and it still fails.
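The error text in the log suggests a per-record check similar to the one used by the samtools faidx index format: within one record, every sequence line except the last must have the same length and the same line terminator. Whether SeqAn applies exactly this rule is an assumption here. The sketch below (hypothetical function name, Python, uncompressed input) reports records that would break that rule. It can be run on every FASTA file passed to Apollo, including the assembly given with -a, since a file with one-line records cannot fail this particular check.

```python
def inconsistent_records(lines):
    """Return {record_name: problem} for records with uneven sequence lines.

    `lines` yields byte strings that still carry their line terminators,
    for example a file object opened with mode "rb".
    """
    problems = {}
    name = None
    first = None    # (bases, terminator) of the record's first sequence line
    closed = False  # True once a differing line was seen; it must be the last

    for raw in lines:
        body = raw.rstrip(b"\r\n")
        if body.startswith(b">"):
            name = body[1:].decode("ascii", "replace")
            first, closed = None, False
            continue
        if name is None:
            name = "<no header>"
            problems[name] = "sequence data before the first '>' header"
        bases, ending = len(body), raw[len(body):]
        if first is None:
            first = (bases, ending)
        elif closed:
            problems.setdefault(name, "sequence continues after a shorter line")
        elif ending not in (first[1], b""):
            problems.setdefault(name, "mixed line endings")
        elif bases > first[0]:
            problems.setdefault(name, "line longer than the record's first line")
        if (bases, ending) != first:
            closed = True
    return problems
```

With `open(path, "rb")` as the input, an empty result means no record violates the assumed rule; otherwise the keys name the records to inspect first.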