I wonder whether there is an efficient way to build a new Dask array by repeating a base array multiple times, where each row of the new array is a chunk of its own. Say I want to form a 2-d array by stacking a 1-d NumPy array. I am currently doing it in a NumPy-style way:
```python
import numpy as np
import dask.array as da

np_a = np.ones(10_000)  # dummy example
da_a = da.from_array(np_a, chunks=(10_000,))
stacked_a = da.repeat(da_a[None, :], repeats=200, axis=0)

# up to this point
>>> stacked_a.numblocks
(1, 1)

# I need to do a further rechunk to get the configuration I want
stacked_a = stacked_a.rechunk((1, 10_000))
```
I actually thought `stacked_a` would naturally have chunks of size (1, 10_000), but that turns out not to be the case. I have also looked at the task graph of `stacked_a`, and it is horribly wide. I suspect this might be the bottleneck of my project: my scheduler's memory always spikes after running for a while. Could this huge task graph be a possible reason for that?
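For comparison, here is a small sketch of the chunk layout I am after. I am assuming, based on how `da.stack` is documented, that stacking a list of 1-d arrays places each input in its own chunk along the new axis, which would give the (1, 10_000) row-chunks directly without a separate rechunk step:

```python
import numpy as np
import dask.array as da

np_a = np.ones(10_000)
da_a = da.from_array(np_a, chunks=(10_000,))

# Stack 200 references to the same 1-d array; each input becomes
# one chunk along the new axis 0, so every row is its own block.
stacked = da.stack([da_a] * 200, axis=0)

print(stacked.numblocks)  # expecting (200, 1)
print(stacked.chunks)     # expecting ((1, 1, ..., 1), (10000,))
```

That said, this still creates one task per row, so I am not sure it helps with the graph-width problem.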
Thanks in advance!