
Feature/issue 2814 warmup auto #2815

Open · wants to merge 29 commits into develop

Conversation

@bbbales2 (Member) commented Sep 11, 2019

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

The goals/reasons are outlined here: #2814

There'll be a CmdStan pull to go with this (and this probably shouldn't go in until that pull is good too).

This adds another metric for the samplers to use in their kinetic energy and gradient computations: src/stan/mcmc/hmc/hamiltonians/auto_e_metric.hpp and src/stan/mcmc/hmc/hamiltonians/auto_e_point.hpp.

And then an adaptation routine to actually compute that metric: src/stan/mcmc/auto_adaptation.hpp

Edit: To review this pull request you'll want to pull this version of CmdStan and at least try out the adaptation on a couple of models: stan-dev/cmdstan#729
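For orientation only: the core idea of the new point type is a position/momentum pair that carries a dense inverse-metric estimate plus a flag saying whether to use only its diagonal. The sketch below is a hypothetical illustration of that shape, not the actual layout of auto_e_point.hpp:

```cpp
#include <Eigen/Dense>

// Hypothetical sketch of the state an "auto" Euclidean point could carry.
// Names and members are illustrative, not copied from auto_e_point.hpp.
struct auto_e_point_sketch {
  Eigen::VectorXd q;           // position
  Eigen::VectorXd p;           // momentum
  Eigen::MatrixXd inv_metric;  // dense estimate of the inverse metric
  bool use_dense = false;      // set by adaptation: dense vs. diagonal behavior

  // Kinetic energy 0.5 * p' * M^{-1} * p, restricted to the diagonal of
  // M^{-1} when the adaptation decided a diagonal metric is good enough.
  double tau() const {
    if (use_dense)
      return 0.5 * p.dot(inv_metric * p);
    return 0.5 * p.dot(inv_metric.diagonal().cwiseProduct(p));
  }
};
```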

Intended Effect

At the end of each warmup window in which the metric is recomputed, the adaptation routine updates the metric and tells it whether to behave as a dense or a diagonal metric; a rough sketch of such a decision follows.
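Purely for intuition, and under the assumption that the decision is driven by how much correlation structure the window's draws show (the real criterion lives in auto_adaptation.hpp and may differ), an end-of-window update could look like:

```cpp
#include <Eigen/Dense>
#include <vector>

// Illustrative end-of-window update: estimate a covariance from the window's
// draws and pick dense vs. diagonal by the condition number of the implied
// correlation matrix. Hypothetical criterion and threshold, not the ones
// implemented in auto_adaptation.hpp.
void update_metric_sketch(const std::vector<Eigen::VectorXd>& draws,
                          Eigen::MatrixXd& inv_metric, bool& use_dense) {
  const int n = draws.size();
  const int d = draws[0].size();

  // Sample mean and covariance of the warmup-window draws.
  Eigen::VectorXd mean = Eigen::VectorXd::Zero(d);
  for (const auto& q : draws) mean += q;
  mean /= n;
  Eigen::MatrixXd cov = Eigen::MatrixXd::Zero(d, d);
  for (const auto& q : draws)
    cov += (q - mean) * (q - mean).transpose() / (n - 1.0);

  // Condition number of the correlation matrix: near 1 means a diagonal
  // metric already captures the geometry, large means dense should help.
  Eigen::VectorXd inv_sd = cov.diagonal().cwiseSqrt().cwiseInverse();
  Eigen::MatrixXd corr = inv_sd.asDiagonal() * cov * inv_sd.asDiagonal();
  Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(corr);
  double kappa = es.eigenvalues().maxCoeff() / es.eigenvalues().minCoeff();

  use_dense = kappa > 10.0;  // arbitrary illustrative threshold
  inv_metric = cov;          // samplers read only the diagonal when !use_dense
}
```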

How to Verify

The new tests can be run with:

./runTests.py src/test/unit/mcmc/auto_adaptation_learn_covariance_pick_dense_test
./runTests.py src/test/unit/mcmc/auto_adaptation_learn_covariance_pick_diag_test
./runTests.py src/test/unit/mcmc/auto_adaptation_test

Side Effects

Hopefully none

Documentation

Yet to be written

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@betanalpha (Contributor)

Before consideration this is going to require a good bit more empirical validation than what's in the arXiv paper, especially with regard to varying dimensions and curvatures. To be open, I am hesitant about the robustness of automatically switching between diagonal and dense given how small the early windows are and how noisy those off-diagonal estimates are (not to mention the eigenvalue approximations). For something like this to go in it will have to be verified to work properly for diagonally-dominant problems, dense-dominated problems, and everything in between, without a significant increase in cost.

Keep in mind that we're trying to minimize the sampler variants in the code base and not have "experimental" versions on dev/master. At some point we'll clean up the other samplers in there.

@bbbales2 (Member, Author)

> Before consideration this is going to require a good bit more empirical validation than what's in the arXiv paper, especially with regard to varying dimensions and curvatures. To be open, I am hesitant about the robustness of automatically switching between diagonal and dense given how small the early windows are and how noisy those off-diagonal estimates are (not to mention the eigenvalue approximations). For something like this to go in it will have to be verified to work properly for diagonally-dominant problems, dense-dominated problems, and everything in between, without a significant increase in cost.

Yup, hopefully we can find some models that break and learn stuff!

@betanalpha (Contributor)

In general I recommend working through the validation before creating a pull request and using up testing resources. Instead, a branch can be discussed on Discourse.

Because of how this proposal modifies warmup, it will require studying, at the very least:

  • sensitivity to initial conditions
  • sensitivity to heavy tails
  • sensitivity to dimension
  • warmup time
  • models with spatially-varying covariances

@serban-nicusor-toptal added this to the 2.21.0 milestone Oct 18, 2019
@bbbales2 mentioned this pull request Feb 18, 2020
@stan-buildbot (Contributor)


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.3 | 3.07 | 1.07 | 6.93% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.74% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.03 | 3.17% faster |
| gp_regr/gp_regr.stan | 0.18 | 0.17 | 1.0 | 0.48% faster |
| irt_2pl/irt_2pl.stan | 5.68 | 5.72 | 0.99 | -0.79% slower |
| performance.compilation | 91.0 | 88.12 | 1.03 | 3.17% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.47 | 8.49 | 1.0 | -0.3% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.41 | 29.13 | 1.01 | 0.95% faster |
| sir/sir.stan | 131.67 | 125.59 | 1.05 | 4.62% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.02 | 2.2% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.95 | 1.0 | 0.32% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.4 | 0.97 | -3.47% slower |
| arK/arK.stan | 1.81 | 1.79 | 1.01 | 0.75% faster |
| arma/arma.stan | 0.62 | 0.75 | 0.83 | -20.86% slower |
| garch/garch.stan | 0.7 | 0.55 | 1.27 | 21.07% faster |

Mean result: 1.01729364403

Jenkins Console Log
Blue Ocean
Commit hash: 473a28c


Machine information: Mac OS X 10.11.6 (build 15G22010)

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot (Contributor)


| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.5 | 3.52 | 1.0 | -0.38% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -3.2% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.0 | -0.2% slower |
| gp_regr/gp_regr.stan | 0.17 | 0.17 | 1.01 | 1.34% faster |
| irt_2pl/irt_2pl.stan | 5.72 | 5.68 | 1.01 | 0.79% faster |
| performance.compilation | 87.01 | 85.55 | 1.02 | 1.68% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.44 | 8.56 | 0.99 | -1.44% slower |
| pkpd/one_comp_mm_elim_abs.stan | 30.44 | 29.2 | 1.04 | 4.09% faster |
| sir/sir.stan | 127.13 | 125.91 | 1.01 | 0.96% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | 0.09% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.97 | 2.93 | 1.01 | 1.27% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.39 | 1.0 | 0.23% faster |
| arK/arK.stan | 2.48 | 2.48 | 1.0 | -0.13% slower |
| arma/arma.stan | 0.61 | 0.61 | 1.0 | -0.38% slower |
| garch/garch.stan | 0.74 | 0.75 | 0.99 | -1.04% slower |

Mean result: 1.00270568617

Jenkins Console Log
Blue Ocean
Commit hash: da00166


Machine information: Mac OS X 10.11.6 (build 15G22010)

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
