
Speed up model fitting and reduce memory usage #2

Open
jr-leary7 opened this issue Nov 22, 2024 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@jr-leary7
Owner

Some options:

  • Check whether some model parameters can be omitted via the `save_pars` argument in `brms::brm()`
  • Optimize multithreading. Currently a single core is used for each chain, but brms also supports within-chain parallelism; be careful not to allocate more threads than the machine supports
  • Look into GPU speedups via OpenCL (though this may not work on all machines)
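A rough sketch of how all three options could be combined in a single `brms::brm()` call. The argument values are illustrative placeholders, and `sim_data` is a hypothetical data frame, not part of the package:

```r
library(brms)

n_chains  <- 4L
n_threads <- 2L  # within-chain threads per chain
stopifnot(n_chains * n_threads <= parallel::detectCores())

fit <- brm(
  y ~ 1 + (1 | gene),
  data      = sim_data,                # hypothetical input data
  backend   = "cmdstanr",
  chains    = n_chains,
  cores     = n_chains,                # one core per chain
  threads   = threading(n_threads),    # within-chain parallelism
  save_pars = save_pars(all = FALSE),  # omit parameters not needed downstream
  opencl    = opencl(c(0, 0))          # platform / device IDs for GPU offload
)
```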
@jr-leary7 jr-leary7 added the enhancement New feature or request label Nov 22, 2024
@jr-leary7 jr-leary7 self-assigned this Nov 22, 2024
@jr-leary7
Owner Author

Implemented OpenCL support via the argument opencl.params, which defaults to NULL to prevent errors. Note that I was unable to get OpenCL GPU or CPU acceleration to work on my 2020 MacBook Pro, as (I think) Apple's pre-installed OpenCL drivers are quite old and (according to the Internet) of rather poor quality. For example, my Intel GPU does not support double-precision floating point, which I believe is required to run cmdstanr models with GPU support.
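If I follow the description above, usage would look something like this sketch (the exact signature of `findVariableFeaturesBayes()` may differ):

```r
# Hypothetical call: pass the OpenCL platform and device IDs through
# opencl.params; leaving it NULL (the default) disables GPU acceleration.
fit_list <- findVariableFeaturesBayes(
  sc_object,                  # assumed input object
  opencl.params = c(0L, 0L)   # first platform, first device
)
```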

@jr-leary7
Owner Author

Also implemented within-chain parallelism, in addition to per-chain parallelism, via new arguments in findVariableFeaturesBayes(). Checks ensure that the total number of requested cores doesn't exceed the number available.
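The core-count guard described above might look like this minimal sketch (the variable names are assumptions, not the package's actual arguments):

```r
n_chains          <- 4L
threads_per_chain <- 2L
total_requested   <- n_chains * threads_per_chain
available         <- parallel::detectCores()
if (total_requested > available) {
  stop("Requested ", total_requested, " cores but only ",
       available, " are available.")
}
```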

jr-leary7 added a commit that referenced this issue Nov 23, 2024
@jr-leary7
Owner Author

Add Stan compiler optimization arguments via cmdstanr, since the generated code is unoptimized by default. See below for the correct `brms::brm()` parameter (source):

stan_model_args = list(stanc_options = list("O1"))

@jr-leary7
Owner Author

jr-leary7 commented Nov 26, 2024

Also, maybe try a QR decomposition of the covariates via the `decomp` argument in `brms::bf()`, as well as setting `normalize = FALSE` in `brms::brm()`.

See source 1 and source 2 for both ideas.
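A minimal sketch of both ideas together (the formula and data are placeholders):

```r
# decomp = "QR" applies a QR decomposition to the fixed-effects design matrix;
# normalize = FALSE drops normalizing constants from the log density.
model_formula <- brms::bf(y ~ x1 + x2 + (1 | gene), decomp = "QR")
fit <- brms::brm(
  model_formula,
  data      = sim_data,   # hypothetical data frame
  backend   = "cmdstanr",
  normalize = FALSE
)
```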

@jr-leary7
Owner Author

Update - the QR decomposition option applies only to the fixed-effects design matrix, and since our design matrix includes just a global intercept plus random effects, it won't help here.

Another idea is to not model the correlation between random effects; I'm unsure how this will affect performance but it's worth a shot.
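In brms (following lme4 syntax), replacing `|` with `||` in a random-effects term drops the correlation between those effects, e.g.:

```r
# Correlated random intercept and slope per gene:
bf_corr   <- brms::bf(y ~ 1 + (1 + x | gene))
# Same effects, but modeled as uncorrelated:
bf_uncorr <- brms::bf(y ~ 1 + (1 + x || gene))
```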

@jr-leary7
Owner Author

Final update - setting normalize = FALSE and removing the correlation between random effects (along with the previously mentioned changes) appears to have sped things up significantly; runtime dropped from ~35 min to ~20 min on my 2020 Intel MacBook Pro. Runtime on an HPC cluster will be much faster, especially once the number of within-chain threads is increased from the current default of 2. These changes make the method genuinely tractable and usable, which it was not before.

@jr-leary7
Owner Author

Confirming that OpenCL GPU acceleration on my M2 Mac Mini also doesn't work, again likely because the device lacks support for double-precision floating point (checked using clinfo); see here for details.
