Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid paths in a GFA don't have end-to-end alignments #103

Open
rlorigro opened this issue Sep 13, 2023 · 1 comment
Open

Valid paths in a GFA don't have end-to-end alignments #103

rlorigro opened this issue Sep 13, 2023 · 1 comment

Comments

@rlorigro
Copy link

rlorigro commented Sep 13, 2023

Hi,

I am cross-posting this issue which I have submitted to the PanAligner repository, since it has been found by the authors to be a result of the base-level alignment step taken from minigraph/gwfa.

In summary, if I take a simple 3 node graph in which the middle node could be viewed as an insertion, and I concatenate the three nodes as a query, minigraph fails to find an end-to-end alignment. The shortest node is 38bp.

Thanks

PS: to clarify I've run the exact same test with minigraph and found the same results

@rlorigro
Copy link
Author

Just reposting the details here for better visibility. This seems like a pretty serious bug to me.

Here is the GFA:

S	1	tcatggtacttgtcaccttctgtcctatttgagtcatgatgcatttgtctatcatctatctccctgcaatgaacatccctggggtggggccagggtctgttttgcttggtgctgcattcccattgccaagaagacctcacaggcatatcaagcacacagcaggtgcataacacttgctgagtttgctgaatgaatCCCTCCTCTGTATCCTACACATCCAACTGGTCACCACGTCCTGTGCGGTAGATGTCCTTTTTAGCAGGTCCCCTGCACTTCACTGTGTTTGTTAAGCCTTGGCTTGCTCTGtcatttcattttgcttaggTGGCGGCAGCAGCCCCTGACAGCCCCTGGCCTCACAGCCCCTGGCCTCAACCCATATCCTCTCCACTCTGCACACAGCTGCTGGGAGACTTTTCCTTAAAACTtggtttctctcctttcccttctgaAAACTGTCAGTGGCTTCCTGTGGCTCAGAGGAATCAATCGAAACATGGTGCTGCTCACAAGGAAACACAACGTTTCTAGGGCCAGTGCTGTCTTTGTCCTGCCTTCCTCCAAATTCAGCCACGTTAAGCCTCAAGCAGATCCTCCCACACAGCTGGGCTTCTTCTGCCTTTCTGCCAGGACTTGCTGCTCCCAGGAGCCTCCTCCCCCAGTCCCTGCAGCCCCAGGTCAGGTGTCTCCCCACCCAGCCTGGCTCAGCTGTCCTTTCCCGTGCAACTACAGGTTCATACAATTTGCTATTGCTCCATCTACTCTATGTGACTCTTGGTTTCTTGAAGGCAGGTATGggactctttctttccttctttgctttcttttctctttctctttctttctttttctttctttctttctttctttc
S	2	tttctttctttctttctttctttctttctttctttctt
S	3	tctctctctctctctctcttttcttttctttctttctctctttctgtttttagagactgggtcttgctgtcacccaggttggagtgcagtggtgcaatcatggctcaccgaactcctgggctcaagtgagcctcttgcctcagcctcccaattagttgggactacaggcatgtgccactacacctggctaattgttattattattattattattattattatttttgtaaagacagggtcttgctctgtttcctaggctggtcttgaacccctggcctcaaatgatcctcctgcctcagcctcccaaagtgctgggattccaggagtaagataccatgtctggccTGAGAAATTTTCTAAAAGGCATGTTTTTGCACTGtccaatatgtttttctttttatcccagcctctagcacaagtgcctgactcgaacgaagcagggactctgaaatatcttgaatgaacaagtAAATGGCACTTGAATAGGTGGCCTCTAATGtgcagaggaaggaaaagggagctgacgctttccgagtgttcagtaccttctaggcaccatgctgggccttttatgtagttctcgatgaatcctcatggcatcccGCTTTGCAGAATGATTCTAGTTACCCAGAGAGCAAGTAGCCCTTAGGGTCAGGACATGCATCTGAGTTGGTCTAGTCCCAAAGCTCCTGGTCCTCTCCTGACATCActtaaacagagaaacagaactccCCTTGGGCTTCTAGGGGCGCTGGGTTCAGGAGGCACAGCCACTCCCTTTGTTCTTCCTGGCAGCTGCCCCACCAGCAGTGAGCCCATCCCacctctgggttttcttttt
L	1	+	2	+	0M
L	1	+	3	+	0M
L	2	+	3	+	0M

When I concatenate 1,2,3 or 1,3 and align them to the GFA, I never get an end-to-end alignment, and node 2 is always missing:

1       871     1       871     +       >1      871     1       871     870     870     60      tp:A:P  NM:i:0  cm:i:51 s1:i:714        s2:i:0  dv:f:0.0756     cg:Z:870=
3       841     5       836     +       >3      841     5       836     831     831     60      tp:A:P  NM:i:0  cm:i:48 s1:i:672        s2:i:0  dv:f:0.0790     cg:Z:831=
1_2_3   1750    1       857     +       >1      871     1       857     856     856     60      tp:A:P  NM:i:0  cm:i:50 s1:i:714        s2:i:0  dv:f:0.0751     cg:Z:856=
1_2_3   1750    930     1745    +       >3      841     21      836     815     815     60      tp:A:P  NM:i:0  cm:i:47 s1:i:671        s2:i:0  dv:f:0.0795     cg:Z:815=
1_3     1712    1       857     +       >1      871     1       857     856     856     60      tp:A:P  NM:i:0  cm:i:50 s1:i:700        s2:i:0  dv:f:0.0751     cg:Z:856=
1_3     1712    892     1707    +       >3      841     21      836     815     815     60      tp:A:P  NM:i:0  cm:i:47 s1:i:672        s2:i:0  dv:f:0.0795     cg:Z:815=

Here is a visualization showing how neither the graph nor the linear query are fully covered:
image

I have tried multiple different combinations of parameters to attempt to find seeds/minimizers in node 2:

Defaults:

PanAligner -c -x lr -o /tmp/tmpn2oq6iyx/reads_vs_graph.gaf bad_tandem_graph.gfa bad_tandem_nodes.fasta
/tmp/tmpn2oq6iyx/reads_vs_graph.gaf

Permissive:

PanAligner -c -k 14 -n 1,1 -m 1,1 -x lr -o /tmp/tmpyid87tic/reads_vs_graph.gaf bad_tandem_graph.gfa bad_tandem_nodes.fasta
/tmp/tmpyid87tic/reads_vs_graph.gaf

I've also tried increasing the -g and -r parameters to allow the extension step to find more alignments, but that does not seem to have any effect. Alternatively, extreme 0/1 values of -f also have no effect.

Could you help me with this case? I am having difficulty understanding why the base level alignment step doesn't bridge the chains here. Ideally, I would use base level alignment for a majority of the work, and only rely on minimizers for anchoring the flanks of the tandem repeat.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant