Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modify checks arg lik #2232

Merged
merged 1 commit into from
Dec 7, 2023
Merged

Conversation

GertjanBisschop
Copy link
Member

Slight modification to the polytomy check when computing likelihoods.
There are a couple of test cases in the current test suite that include examples with multiple roots (within a single local tree). Thus far these did not fail as they were always limited to two roots at most.
These checks could probably be a bit more sophisticated as there are many cases where a decapitated arg could easily be associated with a well-defined likelihood value. @JereKoskela, input welcome.

Copy link

codecov bot commented Dec 7, 2023

Codecov Report

Merging #2232 (cfe1d95) into main (71cdb16) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2232   +/-   ##
=======================================
  Coverage   91.52%   91.52%           
=======================================
  Files          20       20           
  Lines       11334    11337    +3     
  Branches     2302     2304    +2     
=======================================
+ Hits        10373    10376    +3     
  Misses        523      523           
  Partials      438      438           
Flag Coverage Δ
C 91.52% <100.00%> (+<0.01%) ⬆️
python 98.70% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
msprime/likelihood.py 100.00% <100.00%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71cdb16...cfe1d95. Read the comment docs.

@JereKoskela
Copy link
Member

JereKoskela commented Dec 7, 2023

It has been two years since I've really looked at this, but I don't think multiple roots should cause any problems in the likelihood function implementation. Was the point of your update in #2107 that the existing test was incorrectly picking up multi-root trees as error cases @GertjanBisschop? If so, then I think the right fix is to just tweak the test to not do that, without catching ARGs with multiple roots with a new error message. Or am I missing something?

@GertjanBisschop
Copy link
Member Author

I was mainly unsure about what happens when we add in nodes by decapitating for example. Then we have a bunch of floating lineages essentially not ending with a common ancestor event. What is the likelihood of such a lineage given that the likelihood computation, from my understanding, checks whether given a coalescence and recombination rate what the next out of those two events is, and what the likelihood is associated with that.

@JereKoskela
Copy link
Member

JereKoskela commented Dec 7, 2023

You're right: floating lineages ending in a degree 1 root would cause problems. Roots with degree 2 caused by common ancestor events are already handled correctly. Is there a good way to detect that distinction?

Edit: It would be pretty easy to add support for the situation where the ARG has just been deterministically stopped before all roots have been reached, causing all remaining extant lineages to end in degree 1 roots at exactly the same time. We'd just use the usual likelihood function until the last recombination or common ancestor node, and then include one more factor for the probability that nothing else happens until the end time. The only thing I'm not sure of is how easy it would be for the function to detect this case.

If there can be floating lineages ending at a variety of times for no apparent reason, then I'm not sure what the interpretation would be and we'd probably want to just return an error.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I think this solves the immediate problem and we could punt the tricky questions about what the algorithm should return in incomplete ARGs to an issue?

@JereKoskela
Copy link
Member

If I'm reading the new test right, then I think there is an issue we need to solve before merging. It looks like this likelihood will return a multiple roots error for any ARG with more than one root. That will include nearly all msprime realisations, even with full_arg = True.

Before merging, we should get this to a stage where multiple common ancestor roots don't throw an error. Else we're rendering the likelihood function effectively unusable.

@GertjanBisschop
Copy link
Member Author

Yep was about to say the same. I agree with @JereKoskela. Modifying the computation to say that we stopped before another event happened is the way to go. I'll create an issue.

@GertjanBisschop
Copy link
Member Author

This will return an error in the presence of any local tree with more than one root. This root does not need to be the same across local trees, so the graph does not need to have a single root. Or am I misunderstanding your comment?

@JereKoskela
Copy link
Member

Ok, that sounds better. I was misunderstanding. In that case let's merge and continue in the issue you just opened.

@mergify mergify bot merged commit 2c5fe55 into tskit-dev:main Dec 7, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants