-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in use with dask-awkward #155
Comments
Dmitry Kalinkin ***@***.***> writes:
```python
import boost_histogram as bh
import dask_awkward as dak
import dask_histogram as dh
arr = dak.from_lists([list(range(10))] * 3)
axis = bh.axis.Regular(10, 0., 10.)
hist = dh.factory(
arr,
axes=(axis,),
).compute()
print(hist.values())
```
Works on dask 2024.11.2, distributed 2024.11.2 and dask-awkward 2024.9.0:
```output
[3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
```
Broken with dask 2024.12.0, distributed 2024.12.0 and dask-awkward 2024.12.1:
```python
Traceback (most recent call last):
File "/tmp/regr/regr.py", line 10, in <module>
).compute()
^^^^^^^^^
File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 372, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/9x4i6b87rx9w6y26732msn50rnn6fbr8-python3.12-dask-histogram-2024.9.1/lib/python3.12/site-packages/dask_histogram/core.py", line 290, in _blocked_dak
return thehist.fill(thedata, weight=theweights, sample=thesample)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/3h45q4z2cg38204ja42696kbik3j2wxb-python3.12-boost-histogram-1.5.0/lib/python3.12/site-packages/boost_histogram/_internal/hist.py", line 511, in fill
self._hist.fill(*args_ars, weight=weight_ars, sample=sample_ars) # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Wrong number of args
```
Likely due to blockwise changes that just made it into dask-awkward
(which were made to be compatible with the new Task spec in upstream
Dask). Blockwise is responsible for unpacking arguments, and it looks
like `args_ars` in that call to a boost-histogram function is
incorrectly packed by the blockwise layer in the graph.
The `_partitionwise` function in core.py is probably the place that
needs patching (that's where dask.blockwise is used).
|
Unfortunately only interactions with delayed objects seemed to need any wrapping with TaskRef or the like... I'll take a further look into it tomorrow. |
@veprbl just released dask-awkward 2024.12.0, should be good to go now. Thanks @douglasdavis! |
I can confirm that this solves the example above, but there are more issues. Consider: import boost_histogram as bh
import dask_awkward as dak
import dask_histogram as dh
arr = dak.from_lists([list(range(10))] * 3)
axis = bh.axis.Regular(10, 0., 10.)
hist = dh.factory(
arr,
axes=(axis,),
weights=arr,
).compute()
print(hist.values()) Should give:
but instead gives: Traceback (most recent call last):
File "this.py", line 11, in <module>
).compute()
^^^^^^^^^
File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 372, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/nbqgvfbciwpxdazx5zn72i31wckkp5ad-python3.12-dask-2024.12.0/lib/python3.12/site-packages/dask/base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/qm9y3w8x8j4zfyh4jh5843bdnv0x901x-python3.12-dask-histogram-2024.12.0/lib/python3.12/site-packages/dask_histogram/core.py", line 290, in _blocked_dak
return thehist.fill(thedata, weight=theweights, sample=thesample)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/3h45q4z2cg38204ja42696kbik3j2wxb-python3.12-boost-histogram-1.5.0/lib/python3.12/site-packages/boost_histogram/_internal/hist.py", line 504, in fill
weight_ars = _fill_cast(weight)
^^^^^^^^^^^^^^^^^^
File "/nix/store/3h45q4z2cg38204ja42696kbik3j2wxb-python3.12-boost-histogram-1.5.0/lib/python3.12/site-packages/boost_histogram/_internal/hist.py", line 82, in _fill_cast
return np.asarray(value)
^^^^^^^^^^^^^^^^^
File "/nix/store/6r7ij4cd6fgzmpcs25iw4p7p18z3c2rs-python3.12-dask-awkward-2024.12.1/lib/python3.12/site-packages/dask_awkward/lib/core.py", line 1718, in __array__
raise NotImplementedError
NotImplementedError |
Yeah, the approach using a partial doesn't cover all cases, just |
Ah that’s right, the if statements need to be there because the collections need to still resolve to collections (the partial kills the collection discovery in dask-awkward), this is why I originally wrote all the specific function implementations to begin with, it’s an easy fix but yields a decent amount of code duplication. I’ll write up the fix later today.
…On Tue, Dec 17, 2024, at 13:18, Lindsey Gray wrote:
Yeah, the approach using a partial doesn't cover all cases, just `None`s.
—
Reply to this email directly, view it on GitHub
<#155 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAYNYKUEQQWAX3726OL6KZT2GB2GRAVCNFSM6AAAAABTXLE2L6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNBZGQYDEMRVGY>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
So the real issue is that |
And it's specifically happening during the optimization pass. (i.e. these |
Yeah, blockwise optimization has gotten more liberal with culling (most things that don't have a TaskRef will get removed in optimization!!) and we have to be much more careful what's actually exposed in the graph. I think, however, it's a good thing in the long run. culling especially touchy because of things like: https://github.com/dask/dask/blob/main/dask/blockwise.py#L675-L684 |
Closing again with #157 in |
Works on dask 2024.11.2, distributed 2024.11.2 and dask-awkward 2024.9.0:
Broken with dask 2024.12.0, distributed 2024.12.0 and dask-awkward 2024.12.1:
The text was updated successfully, but these errors were encountered: