Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in use with dask-awkward #155

Closed
veprbl opened this issue Dec 17, 2024 · 10 comments · Fixed by #156
Closed

Regression in use with dask-awkward #155

veprbl opened this issue Dec 17, 2024 · 10 comments · Fixed by #156

Comments

@veprbl
Copy link
Contributor

veprbl commented Dec 17, 2024

import boost_histogram as bh
import dask_awkward as dak
import dask_histogram as dh

arr = dak.from_lists([list(range(10))] * 3)
axis = bh.axis.Regular(10, 0., 10.)
hist = dh.factory(
    arr,
    axes=(axis,),
).compute()
print(hist.values())

Works on dask 2024.11.2, distributed 2024.11.2 and dask-awkward 2024.9.0:

[3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]

Broken with dask 2024.12.0, distributed 2024.12.0 and dask-awkward 2024.12.1:

Traceback (most recent call last):
  File "/tmp/regr/regr.py", line 10, in <module>
    ).compute()
      ^^^^^^^^^
  File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/9x4i6b87rx9w6y26732msn50rnn6fbr8-python3.12-dask-histogram-2024.9.1/lib/python3.12/site-packages/dask_histogram/core.py", line 290, in _blocked_dak
    return thehist.fill(thedata, weight=theweights, sample=thesample)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/3h45q4z2cg38204ja42696kbik3j2wxb-python3.12-boost-histogram-1.5.0/lib/python3.12/site-packages/boost_histogram/_internal/hist.py", line 511, in fill
    self._hist.fill(*args_ars, weight=weight_ars, sample=sample_ars)  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Wrong number of args
@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 17, 2024 via email

@lgray
Copy link
Collaborator

lgray commented Dec 17, 2024

Unfortunately only interactions with delayed objects seemed to need any wrapping with TaskRef or the like... I'll take a further look into it tomorrow.

@lgray
Copy link
Collaborator

lgray commented Dec 17, 2024

@veprbl just released dask-awkward 2024.12.0, should be good to go now. Thanks @douglasdavis!

@veprbl
Copy link
Contributor Author

veprbl commented Dec 17, 2024

I can confirm that this solves the example above, but there are more issues. Consider:

import boost_histogram as bh
import dask_awkward as dak
import dask_histogram as dh

arr = dak.from_lists([list(range(10))] * 3)
axis = bh.axis.Regular(10, 0., 10.)
hist = dh.factory(
    arr,
    axes=(axis,),
    weights=arr,
).compute()
print(hist.values())

Should give:

[ 0.  3.  6.  9. 12. 15. 18. 21. 24. 27.]

but instead gives:

Traceback (most recent call last):
  File "this.py", line 11, in <module>
    ).compute()
      ^^^^^^^^^
  File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/qm9y3w8x8j4zfyh4jh5843bdnv0x901x-python3.12-dask-histogram-2024.12.0/lib/python3.12/site-packages/dask_histogram/core.py", line 290, in _blocked_dak
    return thehist.fill(thedata, weight=theweights, sample=thesample)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/3h45q4z2cg38204ja42696kbik3j2wxb-python3.12-boost-histogram-1.5.0/lib/python3.12/site-packages/boost_histogram/_internal/hist.py", line 504, in fill
    weight_ars = _fill_cast(weight)
                 ^^^^^^^^^^^^^^^^^^
  File "/nix/store/3h45q4z2cg38204ja42696kbik3j2wxb-python3.12-boost-histogram-1.5.0/lib/python3.12/site-packages/boost_histogram/_internal/hist.py", line 82, in _fill_cast
    return np.asarray(value)
           ^^^^^^^^^^^^^^^^^
  File "/nix/store/6r7ij4cd6fgzmpcs25iw4p7p18z3c2rs-python3.12-dask-awkward-2024.12.1/lib/python3.12/site-packages/dask_awkward/lib/core.py", line 1718, in __array__
    raise NotImplementedError
NotImplementedError

@lgray lgray reopened this Dec 17, 2024
@lgray
Copy link
Collaborator

lgray commented Dec 17, 2024

Yeah, the approach using a partial doesn't cover all cases, just Nones.

@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 17, 2024 via email

@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 18, 2024

So the real issue is that histref is not getting propagated through the graph. Probably something to do with the new Task spec. Still digging.

@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 18, 2024

And it's specifically happening during the optimization pass. (i.e. these compute(optimize_graph=False) works.)

@lgray
Copy link
Collaborator

lgray commented Dec 18, 2024

Yeah, blockwise optimization has gotten more liberal with culling (most things that don't have a TaskRef will get removed in optimization!!) and we have to be much more careful what's actually exposed in the graph. I think, however, it's a good thing in the long run.

culling especially touchy because of things like: https://github.com/dask/dask/blob/main/dask/blockwise.py#L675-L684

@douglasdavis
Copy link
Collaborator

Closing again with #157 in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants