You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I use CrossMap as my favourite liftover tool because of its versatility and lately I've tried to use it to do liftover on some bam files with a large reference genome. Largest chromosome being 2446554542 in length.
I got this error:
Traceback (most recent call last):
File "/home/WUR/nieuw133/miniconda3/envs/Crossmap/bin/CrossMap.py", line 281, in <module>
crossmap_bam_file(mapping = mapTree, chainfile = chain_file, infile = in_file, outfile_prefix = out_file, chrom_size = targetChromSizes, IS_size = args.insert_size, IS_std = args.insert_size_stdev, fold = args.insert_size_fold, addtag = args.add_tags, cstyle = args.cstyle)
File "/home/WUR/nieuw133/miniconda3/envs/Crossmap/lib/python3.10/site-packages/cmmodule/mapbam.py", line 406, in crossmap_bam_file
new_alignment.next_reference_start = read2_maps[1][1]
File "pysam/libcalignedsegment.pyx", line 1346, in pysam.libcalignedsegment.AlignedSegment.next_reference_start.__set__
OverflowError: value too large to convert to int32_t
Now searching pysam history it seems reading of .csi indexed bam files was implemented at some point, but writing was not?
In general, I am confused with the different limits present. The default .bai index has a max reference SQ length of 2^29, which can be omitted by using a .csi index for which I could not find a specified max. However, the largest chromosome I have will also overflow int32_t max value of 2,147,483,647. This also seems to go over the sam/bam limit specified. Many tools will be stuck with this limit because it would break backwards compatibility it seems? Is there a specific reason why not unsigned int32 was used?
What is in general the way to deal with such large genomes? Is there a standardized way to deal with it? Are there ongoing discussions about how to deal with this? Any references are much appreciated.
The text was updated successfully, but these errors were encountered:
Hi,
I use CrossMap as my favourite liftover tool because of its versatility and lately I've tried to use it to do liftover on some bam files with a large reference genome. Largest chromosome being 2446554542 in length.
I got this error:
Now searching pysam history it seems reading of .csi indexed bam files was implemented at some point, but writing was not?
In general, I am confused with the different limits present. The default .bai index has a max reference SQ length of 2^29, which can be omitted by using a .csi index for which I could not find a specified max. However, the largest chromosome I have will also overflow int32_t max value of 2,147,483,647. This also seems to go over the sam/bam limit specified. Many tools will be stuck with this limit because it would break backwards compatibility it seems? Is there a specific reason why not unsigned int32 was used?
What is in general the way to deal with such large genomes? Is there a standardized way to deal with it? Are there ongoing discussions about how to deal with this? Any references are much appreciated.
The text was updated successfully, but these errors were encountered: