-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoc #89
base: master
Are you sure you want to change the base?
Autoc #89
Conversation
This wrings another ~8% performance boost from a new way to draw random numbers.
This commit creates functionality for the python API. Specifically, `gibbs` can now be allowed to auto-converge where it will continue sampling until it converges or reaches a fixed number of samples (defined by the user.
hmm, not sure how i got all these other commits in there. this is for initial review and then we can discuss parameters. |
@gregcaporaso Can you review this pull request? |
I can review this over the weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wdwvt1, some comments on this. Overall it looks reasonable.
@@ -379,11 +379,11 @@ def __init__(self, alpha1, alpha2, beta, source_data): | |||
self.joint_probability = np.zeros(self.V, dtype=np.float64) | |||
|
|||
def set_n(self, n): | |||
"""Set the sum of the sink.""" | |||
'''Set the sum of the sink.''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A setter like this isn't necessary, since n
is a public property and can be set directly. I recommend dropping the setter, but not a big deal.
|
||
If `autoc` is not `None`, the meaning of the `draws_per_restart` parameter | ||
changes. If `autoc` is `None`, `draws_per_restart` draws will be taken. If | ||
`True`, this function will check for convergence in the number of counts in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is a bit confusing. It reads like autoc
takes either None
or True
.
I would change this line to say: a function is provided, it will be applied to check...
@@ -478,22 +501,36 @@ def gibbs_sampler(sink, cp, restarts, draws_per_restart, burnin, delay): | |||
ordering (same ordering as `final_env_assignments`). The [i, j] entry | |||
is the environment that the taxon `final_env_assignments[i, j]` is | |||
determined to have come from in draw i (j is the environment). | |||
""" | |||
times : np.array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change represents a backward-incompatible interface change, so should probably be noted in the CHANGELOG.md
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're changing the interface anyway, it might make sense for this to just return a NamedTuple
object - that's nicer for handling many return values (it's a bit more self-documenting, but since it's still a tuple code that calls this function shouldn't need to be updated).
@@ -462,6 +479,12 @@ def gibbs_sampler(sink, cp, restarts, draws_per_restart, burnin, delay): | |||
additional samples will be drawn every `delay` number of passes. This | |||
is also known as 'thinning'. Thinning helps reduce the impact of | |||
correlation between adjacent states of the Markov chain. | |||
autoc : function or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does autoc
stand for? Can you not abbreviate it for clarity?
autoc : function or None | ||
Determines what mode the function will be run in. If not `None`, | ||
`autoc` will be used to determine if the chain has converged. `autoc` | ||
must accept two arguments; the `final_envcounts` array and an index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's passed three arguments below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add: autoc
should return a bool indicating if convergence has been achieved (True
) or not (False
).
# update counts for each environment and the unknown source | ||
# if necessary. | ||
new_e_idx = np.random.choice(source_indices, p=jp / jp.sum()) | ||
cs = jp.cumsum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change from jp.sum()
to jp.cumsum()
? (Just want to make sure that that was intentional.)
return (final_envcounts, final_env_assignments, | ||
final_taxon_assignments, times, conv_chain_fraction) | ||
|
||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really shouldn't have different collections of values in the return here, it makes code that uses this much harder to work with as the calling function has to anticipate what it's going to get back. It would be much better to return an empty np.array
in place of times
, and a float that makes sense (0.0?) in place of conv_chain_fraction
. If you don't want to do that, you should split this entire function into two different functions corresponding to the different behavior. For example, gibbs_sampler
and gibbs_sampler_autoc
.
|
||
Returns | ||
------- | ||
int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a bool, not an int.
convs = pd.DataFrame(np.array([i[3] for i in results]), | ||
index=sinks.index, columns=cols) | ||
convs['converging_fraction'] = [i[4] for i in results] | ||
return mpm, mps, fas, convs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above about not changing the number of values returned from the function. I highly recommend not going with this approach.
############################################################################### | ||
|
||
|
||
def method5(sources, sinks): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more descriptive name that could be used here?
@wdwvt1, it looks like this was built on a branch containing the commits in #83, which is why those commits are in here. This therefore seems to contain all of the changes in that PR as well. It's generally good practice to keep different changes in different PRs - if you want to go with that approach, you could |
you are tottaly right - i am not sure how i managed to do that :/. thanks
for all the comments! i will clean these up and submit a new PR.
…On Sat, Feb 18, 2017 at 12:36 PM, Greg Caporaso ***@***.***> wrote:
@wdwvt1 <https://github.com/wdwvt1>, it looks like this was built on a
branch containing the commits in #83
<#83>, which is why those
commits are in here. This therefore seems to contain all of the changes in
that PR as well. It's generally good practice to keep different changes in
different PRs - if you want to go with that approach, you could git
checkout master, git pull upstream master, and then copy you changes over
to a new branch and submit that in a new PR.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#89 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA__-bzJhZKcfRHLJdY3P4E0Il8zuexMks5rd1Y6gaJpZM4L9Z5f>
.
|
@wdwvt1 - any chance to address @gregcaporaso's comments to implement the autoconvergence? |
No description provided.