trimLRPatterns with indels not trimming as intended in case of net insertions or deletions between subject and pattern? #88

Open
jan-glx opened this issue Nov 18, 2022 · 2 comments

jan-glx commented Nov 18, 2022

In the following examples I'd expect trimLRPatterns to always leave ATCG:

library(Biostrings)
subject <- DNAStringSet(c("AAATCTCTCATCG"))
trimLRPatterns(subject = subject, Lpattern = "AAATCTCTC", Rpattern = "", with.Lindels = TRUE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AAAATCTCTC", Rpattern = "", with.Lindels = TRUE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     3 TCG
trimLRPatterns(subject = subject, Lpattern = "AATCTCTC", Rpattern = "", with.Lindels = TRUE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     5 CATCG

Created on 2022-11-18 by the reprex package (v2.0.1)

Contributor

acvill commented Jun 9, 2023

I think the issue is that Biostrings does not consider terminal gaps in alignments to be indels. See Bioconductor/pwalign#2.

ATCG is retained if you specify with.Lindels = FALSE.

library(Biostrings)
subject <- DNAStringSet(c("AAATCTCTCATCG"))
trimLRPatterns(subject = subject, Lpattern = "AAATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AAAATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     6 TCATCG
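
To compare the two settings side by side, here is a minimal sketch that reuses only the calls shown in this thread plus `width()`. The helper `n_trimmed` and the `comparison` data frame are names made up for illustration and are not part of Biostrings. It tabulates how many bases are removed from the left end for each `Lpattern`. Going by the outputs posted above, `with.Lindels = TRUE` removed 9, 10 and 8 bases (the same as the width of each pattern), while `with.Lindels = FALSE` removed 9, 9 and 7; results may differ with other Biostrings versions.

```r
library(Biostrings)

subject <- DNAStringSet("AAATCTCTCATCG")
Lpatterns <- c("AAATCTCTC", "AAAATCTCTC", "AATCTCTC")

# Hypothetical helper: number of bases removed from the left end of `subject`
# for one Lpattern and one with.Lindels setting.
n_trimmed <- function(Lpattern, with.Lindels) {
  trimmed <- trimLRPatterns(subject = subject, Lpattern = Lpattern, Rpattern = "",
                            with.Lindels = with.Lindels, max.Lmismatch = 0.3)
  width(subject) - width(trimmed)
}

# One row per pattern: pattern width next to the trimmed width for each setting.
comparison <- data.frame(
  Lpattern = Lpatterns,
  pattern_width = nchar(Lpatterns),
  trimmed_with_indels = vapply(Lpatterns, n_trimmed, integer(1), with.Lindels = TRUE),
  trimmed_without_indels = vapply(Lpatterns, n_trimmed, integer(1), with.Lindels = FALSE),
  row.names = NULL
)
print(comparison)
```

Putting the trimmed width next to the pattern width makes it easier to see whether a net insertion or deletion in the pattern changes how much of the subject is cut.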

Author

jan-glx commented Jun 14, 2023

Possibly. But while that issue might be considered odd behavior, this one is a bug, right?
Though right now I am even more confused by your last with.Lindels = FALSE example.
