(WIP?) vectorized log_likelihood function for NumPyro #2390

aporsch1 · 2024-10-03T19:21:12Z

Description

Checklist

Follows official PR format
New features are properly documented
Code style correct (follows pylint and black guidelines)

📚 Documentation preview 📚: https://arviz--2390.org.readthedocs.build/en/2390/

aporsch1 · 2024-10-03T19:30:53Z

Hey, I looked at the checks that failed, and they are failing because they can't even find test cases. I don't think that is related to the updated code at all? Let me know if I am missing something, though.

OriolAbril · 2024-10-07T20:49:41Z

@virajpandya could you try it out and see how timing compares to the ~80 mins from the latest release and setting log_likelihood=False?

You can install the arviz version of this PR with:

pip install "arviz @ git+https://github.com/aporsch1/arviz"

OriolAbril · 2024-10-07T20:53:46Z

Hey, I looked at the checks that failed, and they are failing because they can't even find test cases. I don't think that is related to the updated code at all? Let me know if I am missing something, though.

The pylint checks are failing. These are the specific errors:

************* Module arviz.data.io_numpyro
arviz/data/io_numpyro.py:195:64: C0303: Trailing whitespace (trailing-whitespace)
arviz/data/io_numpyro.py:196:0: C0301: Line too long (105/100) (line-too-long)
arviz/data/io_numpyro.py:195:34: E0602: Undefined variable 'jax' (undefined-variable)

For the jax import, note that it is not a dependency of ArviZ (nor should it be), so it needs to be imported at runtime from inside the method itself. This is already done in the __init__ method, for example: https://github.com/arviz-devs/arviz/blob/main/arviz/data/io_numpyro.py#L67
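As a rough illustration of the runtime-import pattern described above: the class and method names below are hypothetical and are not ArviZ's actual converter. In the PR itself a plain `import jax` inside the method body would be enough; the sketch uses `importlib` only so that it runs without jax installed.

```python
import importlib


class LazyBackendConverter:
    """Hypothetical converter that needs an optional backend (e.g. jax)."""

    def __init__(self, backend_name="jax"):
        # Only the module name is stored; nothing is imported at construction.
        self.backend_name = backend_name

    def _load_backend(self):
        # Import at call time so that importing this module never requires
        # the optional dependency to be installed.
        try:
            return importlib.import_module(self.backend_name)
        except ImportError as err:
            raise ImportError(
                f"{self.backend_name} must be installed to use this method"
            ) from err

    def backend_version(self):
        backend = self._load_backend()
        # Fall back to a placeholder for modules without a __version__.
        return getattr(backend, "__version__", "unknown")
```

With this layout, users who never call the backend-specific method never need the optional package, and pylint no longer sees an undefined name at module level.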

lucifer4073 · 2024-12-20T10:02:13Z

Hey, I looked at the checks that failed, and they are failing because they can't even find test cases. I don't think that is related to the updated code at all? Let me know if I am missing something, though.

The pylint checks are failing. These are the specific errors:
************* Module arviz.data.io_numpyro
arviz/data/io_numpyro.py:195:64: C0303: Trailing whitespace (trailing-whitespace)
arviz/data/io_numpyro.py:196:0: C0301: Line too long (105/100) (line-too-long)
arviz/data/io_numpyro.py:195:34: E0602: Undefined variable 'jax' (undefined-variable)
For the jax import, note that it is not a dependency of ArviZ (nor should it be), so it needs to be imported at runtime from inside the method itself. This is already done in the __init__ method, for example: https://github.com/arviz-devs/arviz/blob/main/arviz/data/io_numpyro.py#L67

You might try this in your terminal.

black arviz/ examples/ asv_benchmarks/

This will format the code according to the project's style. Once done, re-stage the changes (git add -u) and commit them.
Let me know.

OriolAbril

I tested this locally on a variation of the model in https://python.arviz.org/en/stable/getting_started/CreatingInferenceData.html#from-numpyro but with random y and sigma with 30k elements, and generating 2k posterior samples.

The version with vmap (after the fixes mentioned in the review) and the current version took basically the same time. The log_likelihood function in numpyro itself calls soft_vmap, so there might not be any difference between using vmap directly on our side and calling numpyro directly.

I did still crash my computer multiple times with both versions when I tried running things in a loop to get some average timings. This makes me suspect there are memory leaks somewhere in the process, which might even be the reason for the slowness.
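One way to check the leak suspicion above while collecting timings is sketched here. This is a minimal stdlib-only harness, not something that was run in this PR; `profile_repeated` is a hypothetical helper name. Note that `tracemalloc` only traces allocations made through Python's allocators, so it would likely not capture device or native buffers held by JAX.

```python
import gc
import time
import tracemalloc


def profile_repeated(fn, repeats=5):
    """Run fn several times; return per-run seconds and traced-memory growth in bytes."""
    durations, growth = [], []
    tracemalloc.start()
    try:
        for _ in range(repeats):
            gc.collect()  # drop collectable garbage before measuring
            before = tracemalloc.get_traced_memory()[0]
            start = time.perf_counter()
            fn()
            durations.append(time.perf_counter() - start)
            gc.collect()
            after = tracemalloc.get_traced_memory()[0]
            # Memory still held after the call and a collection pass.
            growth.append(after - before)
    finally:
        tracemalloc.stop()
    return durations, growth
```

If the growth values stay clearly positive on every run of the same conversion, something is retaining memory between runs, which would be consistent with the crashes described above.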

I am sorry, but I don't think it makes sense to merge this before we have reproducible models that take extremely long with the current version yet run fast with this vmap version.

OriolAbril · 2024-12-20T22:38:07Z

arviz/data/io_numpyro.py

@@ -181,20 +181,26 @@ def sample_stats_to_xarray(self):
    @requires("posterior")
    @requires("model")
    def log_likelihood_to_xarray(self):
-        """Extract log likelihood from NumPyro posterior."""
+        """Extract log likelihood from NumPyro posterior using vectorization."""
        if not self.log_likelihood:
            return None


Suggested change

-            return None
+            return None
+
+        import jax

OriolAbril · 2024-12-20T22:39:30Z

arviz/data/io_numpyro.py

+
+            # Vectorized log likelihood calculation using jax.vmap
+            log_likelihood_dict = jax.vmap(lambda single_sample: 
+                self.numpyro.infer.log_likelihood(self.model, single_sample, *self._args, **self._kwargs)


Suggested change

-                self.numpyro.infer.log_likelihood(self.model, single_sample, *self._args, **self._kwargs)
+                self.numpyro.infer.log_likelihood(self.model, single_sample, *self._args, batch_ndims=0, **self._kwargs)

It doesn't work without this: vmap already maps over the sample dimension, but this function expects a batch dimension too and fails when it is not there (or when it seemingly differs between variables).
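The batch-dimension mismatch described above can be illustrated with a pure-Python toy. Everything here is a stand-in: `toy_log_likelihood` and `toy_vmap` are hypothetical and only mimic the shape handling, not numpyro's or JAX's actual implementations (real `jax.vmap` traces the function rather than looping over draws).

```python
import math


def toy_log_likelihood(sample, y, batch_ndims=1):
    """Toy stand-in for a log-likelihood helper (NOT numpyro's implementation).

    With batch_ndims=1, sample["mu"] is a list of draws; with batch_ndims=0
    it is a single float.
    """
    if batch_ndims == 0:
        mu = sample["mu"]
        # Unit-variance normal log density of each observation.
        return [-0.5 * (yi - mu) ** 2 - 0.5 * math.log(2 * math.pi) for yi in y]
    if batch_ndims == 1:
        return [toy_log_likelihood({"mu": mu}, y, batch_ndims=0) for mu in sample["mu"]]
    raise ValueError("this toy only supports batch_ndims of 0 or 1")


def toy_vmap(fn, samples):
    """Toy stand-in for jax.vmap: calls fn once per draw, without the batch axis."""
    n_draws = len(samples["mu"])
    return [fn({name: values[i] for name, values in samples.items()}) for i in range(n_draws)]


samples = {"mu": [0.0, 1.0, 2.0]}
y = [0.5, 1.5]

batched = toy_log_likelihood(samples, y)  # the function handles the batch axis itself
mapped = toy_vmap(lambda s: toy_log_likelihood(s, y, batch_ndims=0), samples)
assert batched == mapped

# Without batch_ndims=0 the per-draw call still looks for a batch axis and fails.
try:
    toy_vmap(lambda s: toy_log_likelihood(s, y), samples)
except TypeError as err:
    print("fails without batch_ndims=0:", err)
```

In this toy the batched call and the mapped call do the same work, so it gives no reason to expect a speedup from adding the outer map.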

should have vectorized log_likelihood function for NumPyro, here

1955360

OriolAbril changed the title ~~(WIP?) vectorized log_likelihood function for NumPyro (https://github.com/arviz-devs/arviz/issues/2373)~~ (WIP?) vectorized log_likelihood function for NumPyro Oct 7, 2024

OriolAbril linked an issue Oct 7, 2024 that may be closed by this pull request

Log likelihood computation in numpyro can be extremely slow #2373

Open

OriolAbril requested changes Dec 20, 2024
