Hi @marcelm (CC @Itolstoganov)
Currently we shuffle on chunk_ID, which makes read mappings differ depending on the number of threads, or on which chunk a read ends up in.
IIRC, BWA-MEM gets the pseudo random placement from the read name. Is it possible to do this instead of on chunk ID, without noticeable computational overhead? I don't think it's worth implementing if the code becomes complex or if it increases runtime.
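To make the idea concrete, here is a minimal sketch of tie-breaking seeded by the query name instead of the chunk ID. It is not the actual code: `pick_among_ties` and the `(position, score)` tuples are hypothetical names for illustration, and CRC32 is just one cheap stable hash that could be used. The point is that the choice depends only on the read name and the set of tied candidates, so it does not change with thread count or chunking.

```python
import random
import zlib


def pick_among_ties(query_name, candidates):
    """Pick one of the best-scoring candidates, deterministically per read name.

    candidates: list of (position, score) tuples (hypothetical layout).
    """
    best_score = max(score for _, score in candidates)
    # Sort the tied candidates so the result does not depend on input order.
    tied = sorted(c for c in candidates if c[1] == best_score)
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() on str.
    seed = zlib.crc32(query_name.encode("utf-8"))
    rng = random.Random(seed)
    return tied[rng.randrange(len(tied))]
```

Hashing the name once per read should be cheap compared to alignment, but that is an expectation, not something I have measured.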
I noticed this when running an experiment comparing symmetric and asymmetric seeds, with reads simulated from either chrX or chrY and mapped to only chrX and chrY from CHM13.
When using asymmetric seeds (`2*hash_s1 - hash_s2`), the read below aligns to position 29094803 on chrY when aligned as the only read, but to position 29091249 on chrY when aligned as part of a file of 100k reads (using `-t 2`). In both cases it has CIGAR `114=1X161=1X3=1X42=1X8=1X64=1X30=1X24=1X16=1X8=1X3=1X1=1X14=` and alignment score 900. The full simulated file is too large to attach here; I can provide it elsewhere if needed.
Btw, for symmetric seeds (as currently used), the read aligns with alignment score 1000 and CIGAR `223=1X38=1X237=` to position 44832808 on chrY.
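For anyone not familiar with the two seed variants, a small sketch of what "asymmetric" means here. Only the formula `2*hash_s1 - hash_s2` is from the experiment above; the plain sum used for the symmetric variant, the function names, and the 64-bit masking are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound (assumed hash width)


def symmetric_seed(hash_s1, hash_s2):
    # Assumed symmetric combination: swapping the two strobes gives the same value.
    return (hash_s1 + hash_s2) & MASK64


def asymmetric_seed(hash_s1, hash_s2):
    # Combination from the experiment above: strobe order changes the value.
    return (2 * hash_s1 - hash_s2) & MASK64
```

With the symmetric form, the strobe pair (s1, s2) and the pair (s2, s1) hash to the same seed; with the asymmetric form they do not whenever the two strobe hashes differ.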
ksahlin changed the title from "Shuffle pseudo-random on query name instead?" to "Shuffle identical alignments pseudo-randomly on query name instead?" on Mar 30, 2024