
Speed up model fitting and reduce memory usage #2

Open
jr-leary7 opened this issue Nov 22, 2024 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@jr-leary7
Owner

Some options:

  • Check whether some model parameters can be omitted via the `save_pars` argument in `brms::brm()`
  • Optimize multithreading. Currently a single core is used for each chain, but brms also supports within-chain parallelism; be careful not to allocate more threads than the machine supports
  • Look into GPU speedups via OpenCL (though this may not work on all machines)
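A rough sketch of how all three options could be combined in a single `brms::brm()` call. The argument values are illustrative placeholders, and `sim_data` is a hypothetical data frame, not part of the package:

```r
library(brms)

n_chains  <- 4L
n_threads <- 2L  # within-chain threads per chain
stopifnot(n_chains * n_threads <= parallel::detectCores())

fit <- brm(
  y ~ 1 + (1 | gene),
  data      = sim_data,                # hypothetical input data
  backend   = "cmdstanr",
  chains    = n_chains,
  cores     = n_chains,                # one core per chain
  threads   = threading(n_threads),    # within-chain parallelism
  save_pars = save_pars(all = FALSE),  # omit parameters not needed downstream
  opencl    = opencl(c(0, 0))          # platform / device IDs for GPU offload
)
```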
@jr-leary7 jr-leary7 added the enhancement New feature or request label Nov 22, 2024
@jr-leary7 jr-leary7 self-assigned this Nov 22, 2024
@jr-leary7
Owner Author

Implemented OpenCL support via the argument opencl.params, which defaults to NULL to prevent errors. Note that I was unable to get OpenCL GPU or CPU acceleration to work on my 2020 MacBook Pro, as (I think) Apple's pre-installed OpenCL drivers are quite old and (according to the Internet) of rather poor quality. For example, my Intel GPU does not support double-precision floating point, which I believe is required to run cmdstanr models with GPU support.
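If I follow the description above, usage would look something like this sketch (the exact signature of `findVariableFeaturesBayes()` may differ):

```r
# Hypothetical call: pass the OpenCL platform and device IDs through
# opencl.params; leaving it NULL (the default) disables GPU acceleration.
fit_list <- findVariableFeaturesBayes(
  sc_object,                  # assumed input object
  opencl.params = c(0L, 0L)   # first platform, first device
)
```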

@jr-leary7
Owner Author

Also implemented within-chain parallelism, in addition to per-chain parallelism, via new arguments in findVariableFeaturesBayes(). Checks ensure that the total number of requested cores doesn't exceed the number available.
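The core-count guard described above might look like this minimal sketch (the variable names are assumptions, not the package's actual arguments):

```r
n_chains          <- 4L
threads_per_chain <- 2L
total_requested   <- n_chains * threads_per_chain
available         <- parallel::detectCores()
if (total_requested > available) {
  stop("Requested ", total_requested, " cores but only ",
       available, " are available.")
}
```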

jr-leary7 added a commit that referenced this issue Nov 23, 2024
@jr-leary7
Owner Author

Add Stan compiler optimization arguments via cmdstanr, since the generated code is unoptimized by default. See below for the correct `brms::brm()` parameter (source):

stan_model_args = list(stanc_options = list("O1"))

@jr-leary7
Owner Author

jr-leary7 commented Nov 26, 2024

Also, maybe try a QR decomposition of the covariates via the `decomp` argument in `brms::bf()`, as well as setting `normalize = FALSE` in `brms::brm()`.

See source 1 and source 2 for both ideas.
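A minimal sketch of both ideas together (the formula and data are placeholders):

```r
# decomp = "QR" applies a QR decomposition to the fixed-effects design matrix;
# normalize = FALSE drops normalizing constants from the log density.
model_formula <- brms::bf(y ~ x1 + x2 + (1 | gene), decomp = "QR")
fit <- brms::brm(
  model_formula,
  data      = sim_data,   # hypothetical data frame
  backend   = "cmdstanr",
  normalize = FALSE
)
```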

@jr-leary7
Owner Author

Update - the QR decomposition option applies only to the fixed-effects design matrix, and since our design matrix includes just a global intercept plus random effects, it won't help here.

Another idea is to not model the correlation between random effects; I'm unsure how this will affect performance but it's worth a shot.
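In brms (following lme4 syntax), replacing `|` with `||` in a random-effects term drops the correlation between those effects, e.g.:

```r
# Correlated random intercept and slope per gene:
bf_corr   <- brms::bf(y ~ 1 + (1 + x | gene))
# Same effects, but modeled as uncorrelated:
bf_uncorr <- brms::bf(y ~ 1 + (1 + x || gene))
```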

@jr-leary7
Owner Author

Final update - setting normalize = FALSE and removing the correlation between random effects (along with the previously mentioned changes) appears to have sped things up significantly; runtime dropped from ~35 min to ~20 min on my 2020 Intel MacBook Pro. Runtime on an HPC cluster will be much faster, especially once the number of within-chain threads is increased from the current default of 2. These changes make the method genuinely tractable and usable, which it was not before.

@jr-leary7
Owner Author

Confirming that OpenCL GPU acceleration on my M2 Mac Mini also doesn't work, again likely because the device lacks support for double-precision floating point (checked using clinfo); see here for details.
