Statistical testing with keep_unary #2134
Comments
Should also do this with some stats on the mutations so that we know sim_mutations is doing the right thing on these topologies (see also #2137) |
My expectation is that the recording of unary nodes should not affect the pseudorandom generation, so we should be able to see that the statistics are exactly equal before and after recording. Is that what you had in mind? |
I was looking at the branch-based diversity statistic between simulations with and without unary nodes in the Hudson model, and found that they are sometimes but not always equal. After investigating further, it seems that most tree sequences have the same structure with and without unary nodes, but some are very different. |
So I guess the question is whether we expect the recording of unary nodes to change the simulated ts under a given seed. If it is purely a recording process, it should probably not affect the trees. On the other hand, it is possible that the recording of unary nodes can change something (e.g., the ordering of lineages) which ultimately can affect the tree sequence (but not its distribution). It would be nice (but not critical) for the node recording to not affect the simulation outcome. |
Thanks @General-Solution for looking into this! I agree @sgravel, ideally, recording nodes should not affect the simulation outcome. This is also what I would have expected the behaviour to be. |
Sorry about the delay in getting back, was at a conference and had an admin mountain to climb...
I don't think we can make any guarantees about equivalence under a given seed. I agree it would be nice if we could, but even the tiniest change in the state can alter the simulation trajectory, and past experience has taught me that it's extremely difficult to keep the internal states identical when doing slightly different things. I don't immediately see why the simulations would diverge here, but it would have been surprising to me had the outputs turned out to be identical in all cases (not the other way around!). |
Regarding statistical tests, I think we can probably just redo a few of the existing tests (with subclassing?) with this |
Regarding the seed: Ok with me, it's probably not worth refactoring the code to maintain the seed behaviour. Just wanted to make sure. |
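Since per-seed equality is not guaranteed, one way to phrase the check is as a comparison of distributions over many replicates. Below is a minimal sketch using only the Python standard library. It is not the project's test suite: `ks_statistic` and `ks_critical` are hypothetical helper names, and the exponential draws are placeholders standing in for a per-replicate statistic (e.g. branch diversity) from runs with and without unary nodes.

```python
import random
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    # The supremum of |F_x - F_y| is attained at one of the observed values.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_critical(n, m, c_alpha=1.358):
    """Approximate large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * ((n + m) / (n * m)) ** 0.5


# Placeholder data (assumption): in a real test these would be one summary
# statistic per replicate, from simulations with and without unary nodes.
rng_a, rng_b = random.Random(1), random.Random(2)
with_unary = [rng_a.expovariate(1.0) for _ in range(500)]
without_unary = [rng_b.expovariate(1.0) for _ in range(500)]

d = ks_statistic(with_unary, without_unary)
threshold = ks_critical(len(with_unary), len(without_unary))
print(f"KS statistic = {d:.4f}, approximate 5% critical value = {threshold:.4f}")
```

Different seeds are used for the two sets on purpose, so the comparison does not rely on the two code paths consuming random numbers in the same order.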
By mutation based tests, is that in the sense of counting diversity as a site statistic instead of a branch statistic? Because that seems to also be changed by including store-unary. |
@General-Solution |
Yes, exactly. |
Solved for |
Solved for |
We should run some statistical tests on the trees output by keep_unary, just to make sure that nothing weird has happened. It's probably sufficient to run some simple downstream stats that shouldn't be affected by these extra nodes.
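To illustrate why a branch-based statistic should be unaffected by the extra nodes, here is a toy sketch in plain Python. It does not use msprime or tskit: the dict-based tree and the helpers `mrca_time` and `branch_diversity` are hypothetical, made up for this example. Adding a unary node on a branch splits that branch in two without changing any path length between samples.

```python
from itertools import combinations


def mrca_time(parent, time, a, b):
    """Time of the most recent common ancestor of nodes a and b."""
    ancestors = set()
    u = a
    while u is not None:
        ancestors.add(u)
        u = parent.get(u)  # the root has no entry, so this ends the walk
    u = b
    while u not in ancestors:
        u = parent[u]
    return time[u]


def branch_diversity(parent, time, samples):
    """Mean total branch length separating each pair of samples."""
    pairs = list(combinations(samples, 2))
    total = sum(
        2 * mrca_time(parent, time, a, b) - time[a] - time[b] for a, b in pairs
    )
    return total / len(pairs)


samples = [0, 1, 2]

# Simplified tree: node 3 joins samples 0 and 1; node 4 is the root.
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 3.0}
parent = {0: 3, 1: 3, 2: 4, 3: 4}

# Same tree with a unary node 5 on the branch from sample 2 to the root.
time_unary = {**time, 5: 2.0}
parent_unary = {**parent, 2: 5, 5: 4}

plain = branch_diversity(parent, time, samples)
unary = branch_diversity(parent_unary, time_unary, samples)
assert plain == unary
```

This only shows the invariance for a single fixed tree; it says nothing about whether the simulated trees themselves are drawn from the same distribution, which is what the statistical tests are for.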