I’m trying to figure out why the rechunk() method, with balance=True, does not lead to balanced chunks when it’s pretty obvious how to balance them.
import numpy as np
import dask.array as da

# 100 million random integers, reshaped into a 10000 x 10000 array
numbers = np.random.randint(
    low=2, high=200, size=int(1e8), dtype=np.int16
).reshape((10000, 10000))
# Wrap in a Dask array, then ask rechunk() to balance the chunk sizes
d_numbers = da.from_array(numbers).rechunk(balance=True)
>>> d_numbers.chunks
((8192, 1808), (8192, 1808))
In some cases I even get a UserWarning: "chunk size balancing not possible with given chunks".
This is despite the fact that the chunks can very clearly be balanced with:
>>> d_numbers.rechunk((5000, 5000)).chunks
((5000, 5000), (5000, 5000))
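To be clear about what I mean by "balanced", here is a minimal sketch in plain Python of the split I would expect along one axis. The helper name even_chunks is hypothetical and is my own illustration; it is not part of Dask, and I am not claiming it is how balance=True works internally.

```python
def even_chunks(length, n_chunks):
    """Split `length` into `n_chunks` sizes that differ by at most 1.

    Hypothetical teaching helper, not a Dask function.
    """
    base, extra = divmod(length, n_chunks)
    # The first `extra` chunks each take one additional element,
    # so the sizes always sum back to `length`.
    return tuple(base + 1 if i < extra else base for i in range(n_chunks))


print(even_chunks(10000, 2))  # (5000, 5000)
print(even_chunks(10000, 3))  # (3334, 3333, 3333)
```

Passing one such tuple per axis, for example d_numbers.rechunk((even_chunks(10000, 2), even_chunks(10000, 2))), gives the same (5000, 5000) layout as the explicit rechunk above.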
I’m teaching Dask in a Python course, so I’m looking not just for a solution to get balanced chunks (I have that, above) but a way to explain why balance=True is not working here. Thanks!