divide by zero error #16

rabernat · 2019-12-16T18:19:24Z

from xhistogram.core import histogram

data = np.zeros(36048471)
bins = np.array([1, 2, 3])
histogram(data, bins=[bins])

gives

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-84-530a0f6cf98c> in <module>
      3 data = np.zeros(36048471)
      4 bins = np.array([1, 2, 3])
----> 5 histogram(data, bins=[bins])

/srv/conda/envs/notebook/lib/python3.7/site-packages/xhistogram/core.py in histogram(bins, axis, weights, density, block_size, *args)
    245     h = _histogram_2d_vectorized(*all_args_reshaped, bins=bins,
    246                                  weights=weights_reshaped,
--> 247                                  density=density, block_size=block_size)
    248 
    249     if h.shape[0] == 1:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xhistogram/core.py in _histogram_2d_vectorized(bins, weights, density, right, block_size, *args)
    124 
    125     bin_counts = _dispatch_bincount(bin_indices, weights, N, hist_shapes,
--> 126                                     block_size=block_size)
    127 
    128     # just throw out everything outside of the bins, as np.histogram does

/srv/conda/envs/notebook/lib/python3.7/site-packages/xhistogram/core.py in _dispatch_bincount(bin_indices, weights, N, hist_shapes, block_size)
     76     # block_chunks is like a dask chunk, a tuple that divides up the first
     77     # axis of bin_indices
---> 78     block_chunks = _determine_block_chunks(bin_indices, block_size)
     79     if len(block_chunks) == 1:
     80         # single global chunk, don't need a loop over chunks

/srv/conda/envs/notebook/lib/python3.7/site-packages/xhistogram/core.py in _determine_block_chunks(bin_indices, block_size)
     64             block_size = min(_MAX_CHUNK_SIZE // N, M)
     65     assert isinstance(block_size, int)
---> 66     num_chunks = M // block_size
     67     block_chunks = num_chunks * (block_size,)
     68     residual = M % block_size

ZeroDivisionError: integer division or modulo by zero

Smaller sized arrays work fine.

The text was updated successfully, but these errors were encountered:

gmacgilchrist · 2020-01-13T15:14:04Z

Hi @rabernat, I'm starting to get this error coming up often.

I'm working with model output with dimension lengths time: 31, zl: 75, yh: 576, xh: 720, selecting subsets in the time dimension prior to applying histogram (to discern speed of calculation before chugging it in for all times) and binning in xh, yh, and zl. If I specify block_size=len(time) it tends to work, but it seems that hard coding block_size comes with its own potential for errors and performance costs.

Have you had any luck getting to the bottom of this?

CptOrange16 · 2021-01-28T10:41:51Z

Hello @gmacgilchrist and @rabernat
I am also seeing this error when working with xarray DataArrays with the dimensions: <xarray.DataArray 'LST' (time: 1, x: 3201, y: 3201)>.
As @gmacgilchrist mentioned, If I set the block_size=data.time.size, it does not give any error.
I had used large arrays previously, with dimensions like xarray.DataArray 'LST' (time: 980, x: 3201, y: 3201)>. and everything worked with no errors.

rabernat · 2021-01-28T14:07:58Z

I think I found that the problem went away if I chunked the arrays with dask. So that is one temporary workaround.

Are you using dask arrays, or are you calling this on a "loaded" numpy array?

CptOrange16 · 2021-01-28T18:54:47Z

@rabernat Not a dask array explicitly, but DataArrays from xArray should be more like dask arrays no?
Basically i load a dataset from a netcdf file into a DataArray and then perform some simple operationons like a mean, standard deviation and such.

jbusecke · 2021-08-26T23:21:43Z

I just ran into this, and somehow the error is triggered based on the amount of dimensions given in dim (which probably somehow influence block_size I guess?)
I boiled it down to this example:

import xarray as xr
import numpy as np
import dask.array as dsa
from xhistogram.xarray import histogram
da = xr.DataArray(dsa.random.random((12, 35, 576, 720), chunks=(3, 35, 576, 720)), dims=['time','z', 'y','x'])
da

bins = np.arange(-0.5, 0.5, 0.1)
count = histogram(da, bins=bins, dim=['x','y', 'z'])
count

Loading the dataarray into memory triggers the error

count.load()

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/tmp/ipykernel_17059/2516122562.py in <module>
----> 1 count.load()

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
    927         dask.compute
    928         """
--> 929         ds = self._to_temp_dataset().load(**kwargs)
    930         new = self._from_temp_dataset(ds)
    931         self._variable = new._variable

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    863 
    864             # evaluate all the dask arrays simultaneously
--> 865             evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    866 
    867             for k, data in zip(lazy_data, evaluated_data):

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    566         postcomputes.append(x.__dask_postcompute__())
    567 
--> 568     results = schedule(dsk, keys, **kwargs)
    569     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    570 

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     77             pool = MultiprocessingPoolExecutor(pool)
     78 
---> 79     results = get_async(
     80         pool.submit,
     81         pool._max_workers,

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    512                             _execute_task(task, data)  # Re-execute locally
    513                         else:
--> 514                             raise_exception(exc, tb)
    515                     res, worker_id = loads(res_info)
    516                     state["cache"][key] = res

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
    323     if exc.__traceback__ is not tb:
    324         raise exc.with_traceback(tb)
--> 325     raise exc
    326 
    327 

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    221     try:
    222         task, data = loads(task_info)
--> 223         result = _execute_task(task, data)
    224         id = get_id()
    225         result = dumps((result, id))

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in <genexpr>(.0)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/optimization.py in __call__(self, *args)
    967         if not len(args) == len(self.inkeys):
    968             raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 969         return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
    970 
    971     def __reduce__(self):

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in get(dsk, out, cache)
    149     for key in toposort(dsk):
    150         task = dsk[key]
--> 151         result = _execute_task(task, cache)
    152         cache[key] = result
    153     result = _execute_task(out, cache)

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in <genexpr>(.0)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    113     """
    114     if isinstance(arg, list):
--> 115         return [_execute_task(a, cache) for a in arg]
    116     elif istask(arg):
    117         func, args = arg[0], arg[1:]

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in <listcomp>(.0)
    113     """
    114     if isinstance(arg, list):
--> 115         return [_execute_task(a, cache) for a in arg]
    116     elif istask(arg):
    117         func, args = arg[0], arg[1:]

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/dask/utils.py in apply(func, args, kwargs)
     33 def apply(func, args, kwargs=None):
     34     if kwargs:
---> 35         return func(*args, **kwargs)
     36     else:
     37         return func(*args)

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/xhistogram/core.py in _bincount(weights, axis, bins, density, block_size, *all_arrays)
    234         weights_array = None
    235 
--> 236     bin_counts = _bincount_2d_vectorized(
    237         *all_arrays_reshaped,
    238         bins=bins,

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/xhistogram/core.py in _bincount_2d_vectorized(bins, weights, density, right, block_size, *args)
    183     N = reduce(lambda x, y: x * y, hist_shapes)
    184 
--> 185     bin_counts = _dispatch_bincount(
    186         bin_indices, weights, N, hist_shapes, block_size=block_size
    187     )

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/xhistogram/core.py in _dispatch_bincount(bin_indices, weights, N, hist_shapes, block_size)
    127     # block_chunks is like a dask chunk, a tuple that divides up the first
    128     # axis of bin_indices
--> 129     block_chunks = _determine_block_chunks(bin_indices, block_size)
    130     if len(block_chunks) == 1:
    131         # single global chunk, don't need a loop over chunks

/scratch/gpfs2/jbusecke/conda_tigressdata/envs/cmip6_omz/lib/python3.9/site-packages/xhistogram/core.py in _determine_block_chunks(bin_indices, block_size)
    115             block_size = min(_MAX_CHUNK_SIZE // N, M)
    116     assert isinstance(block_size, int)
--> 117     num_chunks = M // block_size
    118     block_chunks = num_chunks * (block_size,)
    119     residual = M % block_size

ZeroDivisionError: integer division or modulo by zero

Now if I just count over two instead of 3 dimensions it works!

# Now without counting over z
histogram(da, bins=bins, dim=['x','y']).load()

As a user that completely confuses me. Note that this did NOT go away by chunking the array as some earlier posts suggested? Do you think this is the same problem?

Happy to debug further, but ultimately just thought this might be another test example that could be useful as a test case for future refactors (xarray-contrib/xskillscore#334 (comment)).

gmacgilchrist · 2021-08-27T21:01:32Z

This is indeed the same as the issue that I was having, which I found to be sensitive to which dimensions I was performing the histogram over and, critically, how many I left behind. If I remember correctly, this divide by zero error was occurring whenever I was attempting to histogram over all dimensions except 1, and would go away if I counted over more or fewer dimensions, which is consistent with what you're seeing. I didn't yet test the impacts of chunks.

dougiesquire · 2021-08-30T00:07:33Z

This is really helpful info @jbusecke and @gmacgilchrist! I think this issue should be quite easy to fix once we decide what we want to do with the block_size argument.

@TomNicholas, did some testing of the effect of changing this argument here: #63 and there have been suggestions to just hard code it as in numpy. I'd be happy to dive into this further, or @TomNicholas is this something you've already resolved as part of the move of xhistogram into xarray?

iuryt · 2022-02-24T18:48:56Z

I had the same problem with a large xarray.Dataset and It worked after chunking the data.

iuryt · 2022-02-25T16:44:52Z

Actually, it worked for me even if I have only one chunk.

zxdawn · 2022-06-27T12:08:56Z

PS: histogram(data1, data2, bins=[bin1, bin2], dim=['longitude', 'latitude']).sum(dim='time') works for me.

jbusecke · 2022-11-02T00:02:44Z

Just ran again into this

This is really helpful info @jbusecke and @gmacgilchrist! I think this issue should be quite easy to fix once we decide what we want to do with the block_size argument.

@TomNicholas, did some testing of the effect of changing this argument here: #63 and there have been suggestions to just hard code it as in numpy. I'd be happy to dive into this further, or @TomNicholas is this something you've already resolved as part of the move of xhistogram into xarray?

Was there any progress or a suggested fix here? Just ran into this again with a large dataset.

Here is the shape/chunking:

And it again fails when I want to compute the histogram over all dimensions. I can easily get around this by only taking the hist along the unchunked dims and then summing in time, but a fix for this would still be very desirable IMO.

AxelHenningsson · 2023-02-26T23:06:15Z

+1 to fixing this.

iuryt · 2023-03-02T03:06:08Z

As @rabernat mentioned in the original post.
It depends on the dataset size.

S = slice(None,None)
bins = [np.arange(-10,10,0.1), np.arange(0,2000,2)]
Ha = histogram(anomaly.isel(casts=S), anomaly.isel(casts=S).z, bins=bins)

anomaly has 3567 casts
The code above gives divide by zero error if S=slice(None,None) and S=slice(None,3000).
It doesn't error for S=slice(None,2000) or S=slice(2000,3000).
So, this doesn't seem to be an error with some of the casts.

If I chunk and load, it doesn't error.

bins = [np.arange(-10,10,0.1), np.arange(0,2000,2)]
Ha = histogram(anomaly.chunk(casts=1), anomaly.chunk(casts=1).z, bins=bins).load()

I just noticed that I am not adding anything new to the problem, but will keep this post here anyway. haha

rabernat · 2023-03-02T13:31:25Z

Hi everyone! We would absolutely welcome a PR to fix this long-standing bug.

iuryt · 2023-03-02T23:12:31Z

Is anyone working on that right now or have an idea on how to fix that?
I can try to help, but don't know where to start.

TomNicholas · 2023-06-09T14:48:50Z

Is anyone working on that right now or have an idea on how to fix that?
I can try to help, but don't know where to start.

@iuryt the problem is in the _determine_block_chunks function. For some reason block_size is being set to 0, causing a divide by zero error in the line num_chunks = M // block_size. It doesn't make sense to have a block size of 0, so one way forward would be to add a print statement/raise an error there then try different inputs and work out what exactly causes it to be set to 0.

Alternatively one could just remove the whole block_size optimisation completely from _dispatch_bincount, so the buggy code is never called. @rabernat do you have any recollection of how important that bit of code actually is for performance?

In the meantime one can just turn off the block-level optimization by passing block_size=None. This skips the buggy logic entirely, and should work for any array, but comes with some (unknown) performance hit.

- See related comments in hdrake/sectionate@39c3009 - Temporary bug-fix for xhistogram call (see xgcm/xhistogram#16) - Re-ran all notebooks

lluritu · 2023-12-27T12:59:12Z

Hi, I came across this same problem and have figured out the origin, at least in my case.
When block_size="auto" in _determine_block_chunks()
if N is larger than _MAX_CHUNK_SIZE the integer division in line 115 yields zero, leading to divide by zero later in 117.
Setting block_size=None or rechunking the data bypasses the issue.
I guess the "heuristic without much basis" needs to be changed to something better.

pedroaugustosmribeiro · 2024-02-15T21:11:46Z

I am also having problems with this hard-coded heuristic when using with xskillscore package

dougiesquire mentioned this issue Jun 28, 2021

ZeroDivisionError for large arrays in rank_histogram because block_size is zero xarray-contrib/xskillscore#334

Closed

rabernat mentioned this issue Oct 12, 2022

Division by zero error with SOSE data #83

Closed

hdrake added a commit to hdrake/xwmt that referenced this issue Sep 2, 2023

Update xgcm.Grid API

89f4619

- See related comments in hdrake/sectionate@39c3009 - Temporary bug-fix for xhistogram call (see xgcm/xhistogram#16) - Re-ran all notebooks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

divide by zero error #16

divide by zero error #16

rabernat commented Dec 16, 2019

gmacgilchrist commented Jan 13, 2020

CptOrange16 commented Jan 28, 2021

rabernat commented Jan 28, 2021

CptOrange16 commented Jan 28, 2021

jbusecke commented Aug 26, 2021

gmacgilchrist commented Aug 27, 2021

dougiesquire commented Aug 30, 2021

iuryt commented Feb 24, 2022

iuryt commented Feb 25, 2022

zxdawn commented Jun 27, 2022

jbusecke commented Nov 2, 2022 •

edited

Loading

AxelHenningsson commented Feb 26, 2023

iuryt commented Mar 2, 2023 •

edited

Loading

rabernat commented Mar 2, 2023

iuryt commented Mar 2, 2023 •

edited

Loading

TomNicholas commented Jun 9, 2023

lluritu commented Dec 27, 2023

pedroaugustosmribeiro commented Feb 15, 2024

divide by zero error #16

divide by zero error #16

Comments

rabernat commented Dec 16, 2019

gmacgilchrist commented Jan 13, 2020

CptOrange16 commented Jan 28, 2021

rabernat commented Jan 28, 2021

CptOrange16 commented Jan 28, 2021

jbusecke commented Aug 26, 2021

gmacgilchrist commented Aug 27, 2021

dougiesquire commented Aug 30, 2021

iuryt commented Feb 24, 2022

iuryt commented Feb 25, 2022

zxdawn commented Jun 27, 2022

jbusecke commented Nov 2, 2022 • edited Loading

AxelHenningsson commented Feb 26, 2023

iuryt commented Mar 2, 2023 • edited Loading

rabernat commented Mar 2, 2023

iuryt commented Mar 2, 2023 • edited Loading

TomNicholas commented Jun 9, 2023

lluritu commented Dec 27, 2023

pedroaugustosmribeiro commented Feb 15, 2024

jbusecke commented Nov 2, 2022 •

edited

Loading

iuryt commented Mar 2, 2023 •

edited

Loading

iuryt commented Mar 2, 2023 •

edited

Loading