Bayesian SINDy #440

mikkelbue · 2023-12-13T12:05:19Z

Hi all!

This is a (possible) implementation of Bayesian SINDy. It uses numpyro under the hood to drive NUTS sampling.

It is implemented as a pysindy.optimizers module, SBR for Sparse Bayesian Regression.
Uses the regularised horseshoe prior outlined in Hirsh, et al..
Parameters to the SBR optimizer include hyperparameters to the statistical model, i.e. tau_0, nu and s for the hyperprior and lamb for the exponential prior over the measurement noise.
Additional kwargs are fed forward to the numpyro.infer.MCMC object.
self.coef_ is set to the mean of the posterior samples.
Additional MCMC diagnostics can be invoked by inspecting the SINDy.optimizer.mcmc object using arviz.
New notebook 16_bayesian_sindy.ipynb demonstrates the method.

This PR is a work in progress in that test and docs are missing. But I thought it would be better to put it in early in the process, in case anyone has questions, ideas or comments with regards to the implementation. Please let me know if this is a bad idea.

Thanks for considering this PR.

TODO:

Documentation
Tests
Improve / polish notebook

Jacob-Stevens-Haas · 2023-12-14T11:01:01Z

Thank you so much for this - I've been interested in getting Hirsch et al's spike slab priors into pysindy for some time! And selfishly, I've wanted a reason to start interacting with jax. We implemented a bit of kernel smoothing from a paper of @jfelipeosorio in derivative, but we shied away from anything that required jax.

Giving this a quick look now (but not yet a full review), here's some more engineering, less mathematical considerations:
Import

Add an [project.optional-dependencies] section for the dependencies
You noticed the import guard that we did in SR3 for the optional cvxpy dependencies, including the PEP 690 comment. I just checked and it appears PEP 690 was rejected, so you can remove the comment. Someone (maybe me?) needs to do some reading on what the right way to guard import statements.

Example

There already is an example 16 at this point
See the guidance on creating jupyter notebook examples. TL;DR: Maintaining compatibility with existing notebooks is tough. Obviously unit tests make this easier, but if you want to ensure the notebook itself stays runnable, we need to test it with this code in order for CI to care about it. You'd also need to generate the ipynb with examples/publish_notebook.py. (This part is pretty easy - save the notebook as a .py file, then run python publish_notebook.py XX_Bayesian_SINDy)
I like the example a lot - it shows "how to use this" rather than "here's all the different ways you can use this on a variety of systems". We have some people in the lab working on a "Bayesian" SINDy that would definitely appreciate this.
Include some LaTeX for the equations
I assume the rows of the final plot are terms in the discovered equation - could you label them?

Code

Can you add type annotations and docstrings?
What is the difference between SBR and SparseBayesianRegression? I can tell the latter is used in the former, but the former being just the abbreviation of the latter doesn't give any indication of the role that these objects play. Maybe rename SBR -> SBROptimizer, but you're probably in the best place to suggest names.
Best practice is not necessarily settled, but my stance on code variables that shadow math variables: meaning_letter. e.g. if someone's doing L-1 regression, instead of naming a variable lambda, it would be reg_coef_lambda. It helps people who have read the paper but not recently (me in this example) to still be able to troubleshoot errors.

Ok I lied, some math

It's been a while since I read Hirsch, but is this mathematically compatible with EnsembleOptimizer, which subsamples the data and/or library, wraps any BaseOptimizer, and takes the mean/median of resulting models? My feeling is no, which indicates that out typing system needs to expand so as to prohibit this case.

mikkelbue · 2023-12-14T13:03:13Z

Hi @Jacob-Stevens-Haas

Thanks for your comments and suggestions. I'm happy if this is useful!

I have made a few changes:

Added numpyro as an optional dependency.
Removed reference to PEP 690.
Renamed tau_0 -> sparsity_coef_tau0, nu -> slab_shape_nu, s -> slab_shape_s, lamb -> noise_hyper_lambda.
Renumbered the notebook 16 -> 19
Moved the methods from SparseBayesianRegression into SBR. I set it up like that because I thought it was neat to have a general-purpose sparse Bayesian regressor, but it's simpler to just make everything part of the optimizer. I'm not precious about naming, just thought SBR would fit into the existing convention.

I'll have a look at your other points when I have more time.

mikkelbue · 2023-12-14T13:51:28Z

Also, just as a point of order:

This implementation uses the regularised horseshoe prior, not (yet) the spike and slab prior. The regularised horseshoe is computationally more convenient because it is differentiable, so you can just plug the whole shebang into a NUTS sampler.

The spike and slab prior AFAIK requires a compound step sampler or some other clever way of making categorical moves. It's something I can look into, but I am not currently sure what is the best way to achieve this in numpyro.

Jacob-Stevens-Haas · 2023-12-14T14:48:06Z

Oh, sorry, yes I remember now. I meant to say horseshoe prior. I didn't mean to point you in a different direction.

mikkelbue · 2023-12-21T11:10:42Z

@Jacob-Stevens-Haas quick update. 2 new commits:

Optimized the sampling of coefficients by vectorizing (seems obvious now, haha)
Added docstring with type hints.

Still looking to update the notebook with the stuff you requested.

mikkelbue · 2024-01-08T17:23:49Z

@Jacob-Stevens-Haas Okay, I think this addresses all of your points. Please let me know if anything can be improved.

I'm not sure what to add in terms of tests. The notebook has a testing mode now so that should cover some of it.

Jacob-Stevens-Haas · 2024-01-08T21:15:56Z

Thank you! I'll go ahead and review it later this week. Before then, I'll figure out how to get it so your CI runs don't need manual approval. Sorry to keep you waiting.

For tests,

look at test_optimizers for tests that are parametrized to all optimizers
Think about which ValueErrors (bad arguments) you want to check for.
Is there a simple very-low-dimensional (like a length 2, 3, or 4 vector) regression where it's easy to see how the math for horseshoe priors should work out? The three tests starting here are good examples.
Are there any helper functions that you use that have a simple test? Here's one where we pulled out an indexing subroutine, and therefore could test it independently.

Jacob-Stevens-Haas · 2024-01-09T15:17:41Z

Ok I added you as a contributor with "write" permission to the repo. You may need to click an invite link in an email, but that should enable CI to run automatically on your pushes.

mikkelbue · 2024-01-09T16:10:07Z

@Jacob-Stevens-Haas Thank you!

The tests are failing when building the docs and I can't seem to figure out why. This is the trace:

sphinx.errors.SphinxWarning: autodoc: failed to import module 'sbr' from module 'pysindy.optimizers'; the following exception was raised:
No module named 'jax'

Warning, treated as error:
autodoc: failed to import module 'sbr' from module 'pysindy.optimizers'; the following exception was raised:
No module named 'jax'

I have already included jax as an optional dependency in the numpyro group, so I don't understand how this is different from the way cvxpy is set up for SINDyPI. But I must have missed something.

Jacob-Stevens-Haas · 2024-01-09T18:48:52Z

The github workflow file creates a new environment and installs the current package, including all specified optional dependencies. You'd need to change this line to add your optional dependencies to the list.

codecov · 2024-01-11T15:59:15Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (9c73768) 94.40% compared to head (2060875) 94.40%.
Report is 1 commits behind head on master.

Files	Patch %	Lines
pysindy/__init__.py	50.00%	2 Missing ⚠️
pysindy/optimizers/__init__.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #440      +/-   ##
==========================================
- Coverage   94.40%   94.40%   -0.01%     
==========================================
  Files          37       38       +1     
  Lines        3989     4060      +71     
==========================================
+ Hits         3766     3833      +67     
- Misses        223      227       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mikkelbue · 2024-01-12T13:51:33Z

Ok I added you as a contributor with "write" permission to the repo. You may need to click an invite link in an email, but that should enable CI to run automatically on your pushes.

Ah, I just found the invitation. It was on the front page of this repo 🤔 .. thank you!

Jacob-Stevens-Haas · 2024-01-16T11:08:33Z

Hey sorry for slow responses on my end - I'm responsible for #451 and #459, which are both pretty complex. I'll finish #451 in a few days and then I'll give this a thorough review.

mikkelbue · 2024-02-06T11:33:34Z

test_pickle

This is now added by refactoring the existing test_pickle using @pytest.mark.parametrize.

Something that directly tests horsehoe prior (_sample_reg_horseshoe) in a very simple case

This is slightly problematic, since if called outside a model context, the numpyro.sample functions would need to take an additional argument key to produce a sample. I can adapt the function to optionally take this, but that feels a little roundabout just to get a unit test set up. What do you think?

Something that tests overall SBR in a very simple case

This is now done as test_sbr_fit. It just asserts whether SBR has an mcmc_ attribute after fitting and whether it contains samples of the correct variables.

Jacob-Stevens-Haas · 2024-02-08T15:38:39Z

I can adapt the function to optionally take this, but that feels a little roundabout just to get a unit test set up. What do you think?

No, that makes sense. I always wonder about what the right way to test things things is (and thereby, understand), but having read more about reg horseshe, I agree the roundabout work is not necessary here. I read a bit more about the reg-horseshoe, so I'll add some more info in the docstrings back in in a suggested edit.

This is now done as test_sbr_fit. It just asserts whether SBR has an mcmc_ attribute after fitting and whether it contains samples of the correct variables.

I meant even simpler setup, but more rigorous test, a la test_sparse_subset. Try this one below:

def test_sbr_fit():
    x = np.eye(2)
    y = np.array([[1], [1e-4]])
    expected = np.array([[1], [0]])
    result = SBR(num_warmup=10, num_samples=10).fit(x, y).coef_
    np.testing.assert_allclose(result, expected, atol=1e-1)

This is a good example of what it takes to get SBR to succeed in the simplest of cases, and I'm looking into why it's failing.

Jacob-Stevens-Haas

All suggested docstring changes based on looking into parameters to get the test case to fit.

pysindy/optimizers/sbr.py

Jacob-Stevens-Haas · 2024-02-08T16:59:12Z

~~Ok, I've got it but I have to run, will post it a bit later. x and y in my code need more samples - even dropping the noise prior near zero doesn't do it.~~
There were two problems with my test, and I had only solved one. The first problem was recovering a reasonable answer to a simple regression problem, and that required more samples. The second problem was trying to tune the parameters to demonstrate some shrinkage. But I can't figure out a way to tune this regression to eliminate a parameter (that's why I spent a lot of time on understanding the init parameters this past week). The best I can get is fitting coefficients with no shrinkage:

test_sbr_accurate()
    # It's really hard to tune SBR to get desired shrinkage
    # This just tests that SBR fits "close" to unregularized regression
    x = np.tile(np.eye(2), 4).reshape((-1, 2))
    y = np.tile([[1], [1e-1]], 4).reshape((-1, 1))
    opt = SBR(num_warmup=50,num_samples=50).fit(x, y)
    result = opt.coef_
    unregularized = np.array([[1, 1e-1]])
    np.testing.assert_allclose(result, unregularized, atol=1e-3)
    assert hasattr(opt, "mcmc_")

That's probably good enough for now. In an ideal world, we could have a test that kind of recreated canonical "horeshoe" in Figure 2 of Handling Sparsity via the Horseshoe or Figure 1 of Sparsity Information Regularization in the Horseshoe and other Shrinkage Priors.

Jacob-Stevens-Haas · 2024-02-11T18:24:19Z

So that's all for testing. For the notebooks,

Combine them (just one notebook file) as example.ipynb
Save as a python file
cd examples, then python publish_notebook.py 19_bayesian_sindy.
Run pytest --durations=10 --external_notebook 19_bayesian_sindy to see if it tests
(or you can run pytest --durations=10 -m "notebook" -k test_notebook and see if it gets picked up)
set constants that determine how long the test takes with if __name__ == "testing": <short value> else: <full value> in order to speed up the test. num_samples and num_warmup are good candidates, as are anything to control the length of the simulation.

Co-authored-by: Jacob Stevens-Haas <[email protected]>

mikkelbue · 2024-02-12T09:12:15Z

@Jacob-Stevens-Haas Thank you for keeping up with this.

The first problem was recovering a reasonable answer to a simple regression problem, and that required more samples.

This is why I avoided comparing to "theoretical" values for the test I originally wrote. MCMC will always require some samples to get a sensible answer, and I suspect that this test with 50 warmup and 50 samples is not yet a converged MCMC. But if this works, that is great.

But I can't figure out a way to tune this regression to eliminate a parameter (that's why I spent a lot of time on understanding the init parameters this past week).

Shrinkage does work as expected when running more elaborate examples, but parameters will hardly ever be exactly 0, since the ._coef are the means of the MCMC samples. We could set a clipping threshold as in e.g. STLSQ, if we wanted to zap the small coefficient completely.

I will have a look at converting the example to a script.

…ple notebook

mikkelbue · 2024-02-13T11:01:30Z

@Jacob-Stevens-Haas

So that's all for testing. For the notebooks, ...

All done. However pytest --durations=10 --external_notebook 19_bayesian_sindy fails with pytest: error: unrecognized arguments: --external_notebook and I can't find any documentation for that argument.

Jacob-Stevens-Haas · 2024-02-20T12:27:41Z

LGTM, and thank you for your dedication on this! I'll look into that pytest issue separately - pytest allows you create command line arguments in conftest.py, which is what I did for --external-notebook. It existed before the split to example.py/example.ipynb, so probably not necessary anymore.

Looks like example 19 runs in 8.5 seconds under test which is great :)

Add Bayesian SINDy

mikkelbue added 4 commits December 13, 2023 11:18

Added Sparse Bayesian Regression (SBR) optimizer

79a4fa0

Added Bayesian SINDy notebook

5c5ee4e

Blacked Sparse Bayesian Regression code

acab720

Renamed BayesianSparseRegression to SparseBayesianRegression

66112bb

mikkelbue added 5 commits December 14, 2023 12:37

Added numpyro as an optional dependency

cc89101

Removed reference to PEP 690 from SBR import

22fac95

Renamed SBR hyperparameters to something more interpretable

a681ae8

Renamed Bayesian SINDy notebook

8da636a

Moved methods from SparseBayesianRegression into SBR optimizer

f209b54

mikkelbue added 2 commits December 21, 2023 10:43

Vectorized sampling of SINDy coefficients in numpyro for optimization

5b60afe

Added docstring to Sparse Bayesian Regression optimizer

b827dab

Added some documentation and testing mode to the Bayesian SINDy notebook

f56e9e1

Added jax as optional dependency for Bayesian SINDy

5ab7932

mikkelbue added 4 commits January 10, 2024 09:20

Added numpyro dependecy group to github workflow

83c6a17

Removed misplaced space in github workflow

8c1a923

Fixed bug in SBR docstring.

66c0f15

Removed reference in docstring.

754722a

mikkelbue added 4 commits February 6, 2024 11:10

Added simple test of SBR fitting

8c0c019

Black test_optimizers.py

4fb02a0

Added pickle test for SBR optimizer

9ef48d7

Updated test_sbr_fit so Flake8 isn't tripped

d8b93b1

Jacob-Stevens-Haas self-requested a review February 8, 2024 16:26

Jacob-Stevens-Haas requested changes Feb 8, 2024

View reviewed changes

pysindy/optimizers/sbr.py Outdated Show resolved Hide resolved

pysindy/optimizers/sbr.py Outdated Show resolved Hide resolved

pysindy/optimizers/sbr.py Outdated Show resolved Hide resolved

pysindy/optimizers/sbr.py Outdated Show resolved Hide resolved

mikkelbue and others added 5 commits February 12, 2024 08:56

Rephrase docstring of SBR for clarity.

eced555

Co-authored-by: Jacob Stevens-Haas <[email protected]>

Rephrase description of "slab_shape_nu" in SBR for clarity

ad38749

Co-authored-by: Jacob Stevens-Haas <[email protected]>

Rephrase desciption of "slab_shape_s" in SBR for clarity

cd4ecee

Co-authored-by: Jacob Stevens-Haas <[email protected]>

Add docstring and type hints to "_sample_reg_horseshoe"

54422da

Co-authored-by: Jacob Stevens-Haas <[email protected]>

Remove superfluous function definition from "_sample_reg_horseshoe"

ed358cb

mikkelbue added 7 commits February 12, 2024 09:14

Fixed bug in type hints for "_sample_reg_horseshoe"

b2579bb

Changed SBR test to also test for coefficient values

a13ce82

Removed discrete example from Bayesian SINDy

6426f6c

Added comment about the limitations of the current SBR method to exam…

f20fd1e

…ple notebook

Added SBR example script

3c37bf0

Ran publish_notebook.py on SBR example

6bf64e3

Tidied up and reran SBR notebook example after publishing it.

9aeadea

mikkelbue added 2 commits February 13, 2024 15:44

Added arviz to SBR dependencies

b855006

Added high-level import of SBR

2060875

Jacob-Stevens-Haas approved these changes Feb 20, 2024

View reviewed changes

Jacob-Stevens-Haas merged commit a0e36dd into dynamicslab:master Feb 20, 2024
4 of 6 checks passed

jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this pull request Apr 30, 2024

Merge pull request dynamicslab#440 from mikkelbue/bayesian_sindy

05e75d4

Add Bayesian SINDy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bayesian SINDy #440

Bayesian SINDy #440

mikkelbue commented Dec 13, 2023 •

edited

Loading

Jacob-Stevens-Haas commented Dec 14, 2023 •

edited

Loading

mikkelbue commented Dec 14, 2023

mikkelbue commented Dec 14, 2023

Jacob-Stevens-Haas commented Dec 14, 2023

mikkelbue commented Dec 21, 2023

mikkelbue commented Jan 8, 2024

Jacob-Stevens-Haas commented Jan 8, 2024

Jacob-Stevens-Haas commented Jan 9, 2024

mikkelbue commented Jan 9, 2024

Jacob-Stevens-Haas commented Jan 9, 2024

codecov bot commented Jan 11, 2024 •

edited

Loading

mikkelbue commented Jan 12, 2024

Jacob-Stevens-Haas commented Jan 16, 2024

mikkelbue commented Feb 6, 2024

Jacob-Stevens-Haas commented Feb 8, 2024

Jacob-Stevens-Haas left a comment

Jacob-Stevens-Haas commented Feb 8, 2024 •

edited

Loading

Jacob-Stevens-Haas commented Feb 11, 2024 •

edited

Loading

mikkelbue commented Feb 12, 2024

mikkelbue commented Feb 13, 2024

Jacob-Stevens-Haas commented Feb 20, 2024 •

edited

Loading

Bayesian SINDy #440

Bayesian SINDy #440

Conversation

mikkelbue commented Dec 13, 2023 • edited Loading

Jacob-Stevens-Haas commented Dec 14, 2023 • edited Loading

mikkelbue commented Dec 14, 2023

mikkelbue commented Dec 14, 2023

Jacob-Stevens-Haas commented Dec 14, 2023

mikkelbue commented Dec 21, 2023

mikkelbue commented Jan 8, 2024

Jacob-Stevens-Haas commented Jan 8, 2024

Jacob-Stevens-Haas commented Jan 9, 2024

mikkelbue commented Jan 9, 2024

Jacob-Stevens-Haas commented Jan 9, 2024

codecov bot commented Jan 11, 2024 • edited Loading

Codecov Report

mikkelbue commented Jan 12, 2024

Jacob-Stevens-Haas commented Jan 16, 2024

mikkelbue commented Feb 6, 2024

Jacob-Stevens-Haas commented Feb 8, 2024

Jacob-Stevens-Haas left a comment

Choose a reason for hiding this comment

Jacob-Stevens-Haas commented Feb 8, 2024 • edited Loading

Jacob-Stevens-Haas commented Feb 11, 2024 • edited Loading

mikkelbue commented Feb 12, 2024

mikkelbue commented Feb 13, 2024

Jacob-Stevens-Haas commented Feb 20, 2024 • edited Loading

mikkelbue commented Dec 13, 2023 •

edited

Loading

Jacob-Stevens-Haas commented Dec 14, 2023 •

edited

Loading

codecov bot commented Jan 11, 2024 •

edited

Loading

Jacob-Stevens-Haas commented Feb 8, 2024 •

edited

Loading

Jacob-Stevens-Haas commented Feb 11, 2024 •

edited

Loading

Jacob-Stevens-Haas commented Feb 20, 2024 •

edited

Loading