Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"AssertionError: String length does not match CIGAR" when running methratio.py #44

Open
grasscarp opened this issue Oct 10, 2022 · 0 comments

Comments

@grasscarp
Copy link

Hi, I got this error when I run methratio.py to parse bam file:

[methratio] @mon Oct 10 11:52:28 2022 Using 90% of available memory (866 MB) as limit
[methratio] @mon Oct 10 11:52:28 2022 Presorting inputs
[methratio] @mon Oct 10 11:52:29 2022 Calling samtools sort on SRR800059.uniq.bam and using 36 MB of memory
[bam_sort_core] merging from 792 files and 24 in-memory blocks...
[methratio] @mon Oct 10 12:04:41 2022 Processing 1 chromosomes at a time
[methratio] @mon Oct 10 12:04:43 2022 Reading CM002885.2 from SRR800059.uniq.tmpSrt.bam with samtools
Traceback (most recent call last):
File "methratio.py", line 588, in
main()
File "methratio.py", line 143, in main
ret = map(chromWorker, argList)
File "methratio.py", line 293, in chromWorker
options.rm_dup, options.trim_fillin, coverage, chromSize)
File "methratio.py", line 467, in get_sam_alignment
seq = parseCigar(seq, cigar)
File "methratio.py", line 457, in parseCigar
assert originalLen == index, "String length does not match CIGAR"
AssertionError: String length does not match CIGAR


So I checked the script and I find this function (line 432 - 458):

def parseCigar(seq, cigar):
'''
Delete insertions and dash "-" deletions
>>> parseCigar('ACTAGAATGGCT','3M1I3M1D5M')
'ACTGAA-TGGCT'
>>> parseCigar('ACTG','2M')
Traceback (most recent call last):
...
AssertionError: String length does not match CIGAR
'''
index = 0 # index in seq
originalLen = len(seq)
cigarMatch = cigarRE.findall(cigar)
for align in cigarMatch:
length = int(align[:-1])
op = align[-1]
if op == 'M':
index += length
elif op == 'I':
seq = seq[:index]+seq[index+length:]
elif op == 'D':
seq = seq[:index]+'-'*length+seq[index:]
index += length
else:
raise ValueError("%c not a valid CIGAR character"%(op))
assert originalLen == index, "String length does not match CIGAR"
return seq

Where I felt a little strange about "assert originalLen == index". In my understanding, "index" == "M" number + "D" number, rather than "M" number + "I" number (i.e. "originalLen"). So, "originalLen" won't equal to "index" unless the "I" number equal to the "D" number.
Is this a BUG, or have I misunderstood?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant