Hi everyone,
please consider the following code snippet, where I create a masked array and use it to extract the indices of the items matching my query. These indices are then permuted into a random order:
import numpy as np
import dask.array as da
raw = np.arange(4, dtype=np.int32).repeat(4)
arr = da.from_array(raw)
masked_array = da.ma.masked_equal(arr, 1)
permutation = da.random.permutation(masked_array.nonzero()[0])
Running this code results in the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
undefined in <module>
2 arr = da.from_array(raw)
3 masked_array = da.ma.masked_equal(arr, 1)
----> 4 permutation = da.random.permutation(masked_array.nonzero()[0])
~\Miniconda3\envs\venv\lib\site-packages\dask\array\random.py in permutation(self, x)
363 x = arange(x, chunks="auto")
364
--> 365 index = np.arange(len(x))
366 self._numpy_state.shuffle(index)
367 return shuffle_slice(x, index)
TypeError: 'float' object cannot be interpreted as an integer
I get no error if I do not call .nonzero()[0] on my masked_array, but then the entire array is permuted.
The issue is that in my original code the array has about 3 million entries, of which only about 300-500 are valid. Permuting only that small subset would therefore be much more efficient.
Also, I do not really understand the resulting error. My initial guess is that dask cannot perform the operation because the size of the selection is unknown until it is computed.
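To check this guess, here is a minimal sketch of what seems to be going on and two possible workarounds. It is not a confirmed fix. It assumes that a plain boolean condition (`arr == 1`, chosen only for illustration) can stand in for the masked-array query, and the seed value 42 is arbitrary. The selection has a shape of `(nan,)` until its chunk sizes are computed, which would be consistent with `len(x)` failing inside `permutation`.

```python
import numpy as np
import dask.array as da

raw = np.arange(4, dtype=np.int32).repeat(4)
arr = da.from_array(raw)

# Assumption: a plain boolean condition stands in for the masked-array query.
idx = da.nonzero(arr == 1)[0]

# The number of matches is not known until computed, so the shape is (nan,).
unknown_shape = idx.shape

# Option 1: let Dask compute the chunk sizes so that len(idx) is defined,
# then permute lazily with Dask.
idx = idx.compute_chunk_sizes()
permuted = da.random.RandomState(42).permutation(idx).compute()

# Option 2: the selection is small, so materialize it and shuffle in NumPy.
permuted_np = np.random.default_rng(42).permutation(idx.compute())
```

Both options have to evaluate the selection once to learn its size. For a few hundred valid entries, the second option is probably the simpler one, since the permuted indices fit comfortably in memory.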
Any help is appreciated, thanks in advance.
Best,
Jan