Error when sequence ID is too long #53

jcmckerral · 2021-06-28T02:36:32Z

There is a small issue: one of the Biopython functions has a character-length limit on sequence IDs, and a more informative error message would be useful. A FASTA ID such as

>SEQID_TOO_LONG_BIOPY_HAS_CHAR_LIMIT

results in a GenBank file that makes PhiSpy fail with this traceback:

[USERID]$ PhiSpy.py testgenome.gb -o phispyTest
Traceback (most recent call last):
  File "$PATH/anaconda3/bin/PhiSpy.py", line 125, in <module>
    main(sys.argv)
  File "$PATH/anaconda3/bin/PhiSpy.py", line 48, in main
    args_parser.record = PhiSpyModules.SeqioFilter(filter(lambda x: len(x.seq) > args_parser.min_contig_size, SeqIO.parse(handle, "genbank")))
  File "$PATH/anaconda3/lib/python3.8/site-packages/PhiSpyModules/seqio_filter.py", line 33, in __init__
    for n, item in enumerate(content):
  File "$PATH/anaconda3/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 73, in __next__
    return next(self.records)
  File "$PATH/anaconda3/lib/python3.8/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "$PATH/anaconda3/lib/python3.8/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "$PATH/anaconda3/lib/python3.8/site-packages/Bio/GenBank/Scanner.py", line 465, in feed
    self._feed_first_line(consumer, self.line)
  File "$PATH/anaconda3/lib/python3.8/site-packages/Bio/GenBank/Scanner.py", line 1572, in _feed_first_line
    raise ValueError("Did not recognise the LOCUS line layout:\n" + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS       SEQID_TOO_LONG_BIOPY_HAS_CHAR_LIMIT bp   DNA linear

Changing the ID to

>SEQID_SHORT

resolves the problem.
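Since the failure only shows up deep inside the GenBank parser, it can help to flag long IDs in the FASTA input before annotation. The sketch below does this with the standard library only. It assumes the ID is the header text up to the first whitespace. The default `max_len=16` is an assumption based on the width of the name field (columns 13-28) in the classic GenBank LOCUS layout. This thread does not state the exact limit, so adjust it for your toolchain.

```python
from typing import List


def find_long_ids(fasta_text: str, max_len: int = 16) -> List[str]:
    """Return FASTA sequence IDs longer than max_len characters.

    The ID is the header text after '>' up to the first whitespace.
    max_len=16 is an assumed limit (classic LOCUS name width), not a
    value confirmed by Biopython or PhiSpy.
    """
    long_ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if len(seq_id) > max_len:
                long_ids.append(seq_id)
    return long_ids


example = ">SEQID_TOO_LONG_BIOPY_HAS_CHAR_LIMIT\nACGT\n>SEQID_SHORT\nACGT\n"
print(find_long_ids(example))  # ['SEQID_TOO_LONG_BIOPY_HAS_CHAR_LIMIT']
```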


liaochenlanruo · 2021-10-08T10:59:40Z

Traceback (most recent call last):
  File "/home/liu/miniconda3/envs/component/bin/PhiSpy.py", line 10, in <module>
    sys.exit(run())
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/PhiSpyModules/main.py", line 122, in run
    main(sys.argv)
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/PhiSpyModules/main.py", line 44, in main
    args_parser.record = PhiSpyModules.SeqioFilter(filter(lambda x: len(x.seq) > args_parser.min_contig_size, SeqIO.parse(handle, "genbank")))
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/PhiSpyModules/seqio_filter.py", line 33, in __init__
    for n, item in enumerate(content):
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 465, in feed
    self._feed_first_line(consumer, self.line)
  File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 1571, in _feed_first_line
    raise ValueError("Did not recognise the LOCUS line layout:\n" + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS NODE_52_length_15591_cov_14.37480715591 bp DNA linear
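In this second report, the contig name runs straight into the sequence length (`...cov_14.374807` followed by `15591`), so no separate length field remains. The first report's LOCUS line is also missing its length. The sketch below is a simplified heuristic, not Biopython's actual parsing logic. It flags LOCUS lines where the name, an integer length and a `bp`/`aa` unit do not appear as separate fields. It also scans a folder of GenBank files. The `*.gbk` pattern and the function names are illustrative assumptions.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Simplified heuristic (NOT Biopython's own parser): a usable LOCUS line is
# assumed to have the name, an integer length and a "bp"/"aa" unit as
# separate whitespace-delimited fields.
LOCUS_PATTERN = re.compile(r"LOCUS\s+\S+\s+\d+\s+(bp|aa)\b")


def locus_line_ok(line: str) -> bool:
    """Return True if the LOCUS name and length appear as separate fields."""
    return LOCUS_PATTERN.match(line) is not None


def scan_genbank_dir(folder: str, pattern: str = "*.gbk") -> List[Tuple[str, str]]:
    """Return (file name, LOCUS line) pairs that fail locus_line_ok."""
    bad = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path) as handle:
            for line in handle:
                if line.startswith("LOCUS") and not locus_line_ok(line):
                    bad.append((path.name, line.rstrip("\n")))
    return bad


print(locus_line_ok("LOCUS NODE_52_length_15591_cov_14.37480715591 bp DNA linear"))  # False
print(locus_line_ok("LOCUS       SEQID_SHORT              4 bp    DNA     linear"))  # True
```

Running `scan_genbank_dir` over a directory lists every file that would probably need its IDs shortened before re-annotation.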

qianxin-kxy · 2023-05-10T06:23:15Z

I have also encountered this issue, but I have hundreds of gbk files to process. Is there any way to batch-shorten the IDs in the files?
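One way to batch-shorten IDs is to rename them before annotation. The GenBank files are generated from the FASTA input, so this has to happen at the FASTA stage. The sketch below replaces every FASTA header with a short sequential ID. It also records a mapping so the original names can be restored afterwards. The `contig` prefix, the `.short.fasta`/`.ids.tsv` output names and the function names are hypothetical choices, not part of PhiSpy or any annotation tool.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def shorten_fasta_ids(fasta_text: str, prefix: str = "contig") -> Tuple[str, Dict[str, str]]:
    """Replace each header with '<prefix>_<n>'.

    Returns the rewritten FASTA text and a {new_id: original_header} mapping.
    """
    mapping: Dict[str, str] = {}
    out_lines: List[str] = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            new_id = f"{prefix}_{len(mapping) + 1}"
            mapping[new_id] = line[1:]
            out_lines.append(">" + new_id)
        else:
            out_lines.append(line)  # sequence lines are copied unchanged
    return ("\n".join(out_lines) + "\n") if out_lines else "", mapping


def shorten_directory(folder: str, pattern: str = "*.fasta") -> None:
    """Write <stem>.short.fasta and <stem>.ids.tsv next to each input file."""
    for path in sorted(Path(folder).glob(pattern)):
        if path.name.endswith(".short.fasta"):
            continue  # skip outputs from a previous run
        new_text, mapping = shorten_fasta_ids(path.read_text())
        path.with_name(path.stem + ".short.fasta").write_text(new_text)
        with open(path.with_name(path.stem + ".ids.tsv"), "w") as tsv:
            for new_id, original in mapping.items():
                tsv.write(f"{new_id}\t{original}\n")
```

Sequential IDs are used here rather than truncation because they are guaranteed to be short and unique. The TSV file keeps the link back to the original contig names.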

ShanlinKe · 2023-05-13T22:28:09Z

@qianxin-kxy wrote: "I have also encountered this issue, but I have hundreds of gbk files to process. Is there any way to batch-shorten the IDs in the files?"

I ran into the same issue. Any clues?

linsalrob · 2023-05-14T10:23:46Z

Can you point me to a file where this issue occurs so that I can fix it?

TSZUoE · 2023-06-27T09:14:56Z

Hi, I also had this issue. I initially tried adding the whitespace manually, but that didn't work. My GenBank files were annotated with Prokka. Re-annotating with Prokka's --compliant flag fixed the issue for me, as it writes the LOCUS line differently.

ghost · 2024-08-26T21:41:27Z

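@linsalrob @qianxin-kxy @jcmckerral Thank you. An easy workaround is to shorten the FASTA headers before annotation:

# replace every space with a pipe (sequence lines contain no spaces, so only headers change)
for i in *.fasta; do sed -i -e "s/ /|/g" "${i}"; done
# keep only the text before the first pipe; cut prints to stdout, so write a new file
for i in *.fasta; do cut -f 1 -d "|" "${i}" > "${i%.fasta}.short.fasta"; done
# all headers in the *.short.fasta files are now shortened
# (IDs with no spaces, e.g. NODE_52_length_15591_cov_14.374807, are left unchanged)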
Thank you 
Gaurav
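This shell workaround cuts each header at the first space, which does not help when the ID itself is long and contains no spaces. The Python sketch below takes the same truncation idea further. It cuts at the first space or `|` and then caps the ID at `max_len` characters. It raises an error when two sequences end up with the same ID, because truncation can silently merge distinct contig names. As above, `max_len=16` is an assumed limit and `truncate_headers` is a hypothetical helper name.

```python
import re
from typing import List, Set


def truncate_headers(fasta_text: str, max_len: int = 16) -> str:
    """Keep header text before the first whitespace or '|', capped at max_len.

    Raises ValueError if truncation makes two IDs identical.
    max_len=16 is an assumption; adjust it for your toolchain.
    """
    seen: Set[str] = set()
    out: List[str] = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            new_id = re.split(r"[\s|]", line[1:], maxsplit=1)[0][:max_len]
            if new_id in seen:
                raise ValueError(f"duplicate ID after truncation: {new_id}")
            seen.add(new_id)
            out.append(">" + new_id)
        else:
            out.append(line)  # sequence lines are copied unchanged
    return ("\n".join(out) + "\n") if out else ""


print(truncate_headers(">NODE_52_length_15591_cov_14.374807\nACGT\n"), end="")
```

With SPAdes-style names like `NODE_52_length_15591_cov_14.374807`, truncation quickly collides, e.g. `NODE_1_length_100` and `NODE_1_length_200` at `max_len=6`. When that happens, sequential renaming with a mapping file is the safer choice.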

ghost · 2024-08-26T21:45:41Z

@ShanlinKe @TSZUoE See my response above in this thread.

If you have the equivalent C++ code (e.g. a pointer-declaration snippet), paste it here and I will do the conversion for it.

Thank you
Gaurav

linsalrob added the "help wanted" and "fix" labels on May 14, 2023
