-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
do simplify() in steps #24
Comments
How much simpler would it be if IDs didn't have to be 0..n - 1? I can probably remove this requirement fairly quickly if you're happy with possible bugs that may pop up. Simplify will probably still map the samples back to 0... n -1 though --- changing this wouldn't be impossible, just a bit more work as we'd have to change the interface a bit. |
It's a small but nontrivial annoyance to always be keeping space at the
start of ID space for the samples. It's easy to work around, so noo big
deal, but also easy to forget - was yhe source of a bug by @ashander just
te other day.
…On Mon, Apr 24, 2017 at 7:53 AM, Jerome Kelleher ***@***.***> wrote:
How much simpler would it be if IDs didn't have to be 0..n - 1? I can
probably remove this requirement fairly quickly if you're happy with
possible bugs that may pop up. Simplify will probably still map the samples
back to 0... n -1 though --- changing this wouldn't be impossible, just a
bit more work as we'd have to change the interface a bit.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#24 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA_26QPS2XuLz31E9xqo47FfjurrkhM-ks5rzLd1gaJpZM4NGQVd>
.
|
OK, I'll have a look this evening to see if it can be done reasonably easily. I'll let you know either way. |
This change could improve performance a lot, hopefully. Results in #25 suggest a most time is spent merging records, which scales with the number of nodes as ___ (TBD) and number of edgesets as (TBD). Also, as @petrelharp and I were discussing today, this change is equivalent to:
we're thinking that we can define a clear way to "add" two tree sequences |
Hm, I'm not sure:
This change would reduce memory usage (which is being a problem), but that's not what's leading to the slowdown we see over there. Or am I missing something? |
😬 yes the results in #25 aren't a clear case for this. (thought it's still probably good to do.) my thought was that as dict size increases the operations on the dict slow down, possibly to a crawl. but need to do more profiling across a range of pop sizes to see this. I'll do that comment more over in #25 |
Accessing dicts should scale as log(n), I would think. Hm, but we are
inserting new things.
…On Wed, Apr 26, 2017 at 9:15 AM, ashander ***@***.***> wrote:
😬 yes the results in #25 <#25>
aren't a clear case for this. (thought it's still probably good to do.) my
thought was that as dict size increases the operations on the dict slow
down, possibly to a crawl. but need to do more profiling across a range of
pop sizes to see this. I'll do that comment more over in #25
<#25>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA_26aYrkim82wni60LDo8NaCyhkYbxKks5rz22dgaJpZM4NGQVd>
.
|
Just butting in here:
Is adding T1 and T2 together the right way to do this? I had imagined that the same tree sequence would be incrementally updated at each step. So, it would be something like:
This seems a bit simpler to me. I don't think there will be much overhead in re-simplifying the older bits of T repeatedly, and it avoids the complexity of figuring out how to add two tree sequences together. |
I agree with Jerome, which is also what I wrote initially. I hadn't caught that Jaime suggested something different. |
Peter's original suggestion and Jerome's elaboration (no worries @jeromekelleher your input is always more than welcome!) is a good way to go for the change suggested in this issue. My comment left some context out and could have been clearer. I should've mentioned that, for other reasons, it might be useful to define a way to "add" two tree sequences. For example if we wanted to run coalescent sims in reverse time, then run many replicate forward sims from that starting point. But implementing "add" is clearly not necessary to do simplify in steps. |
though it turned out to be useful :) |
Currently we record all of history and then
simplify()
it all at once. This ends up taking rather a lot of memory. As discussed with @jeromekelleher, we ought to be able to do this stepwise. To do this, we'll need to make sure there's room enough to sample the entire population, and then:add_samples(ids)
whereids
is the current generation, and remember thatids -> range(len(ids))
simplify()
argrecorder
and remember thatrange(len(ids)) -> initial_gen
To append more things to a tree sequence we need to:
NodeTable
initial_gen
to whatever they should beThe text was updated successfully, but these errors were encountered: