This strange glitch doesn’t occur frequently, but it always happens in the same spot. In case it was a problem with overflow, I checked the same program with uint64, and the same glitch occurred in the same spot.
I was wondering if there was anything that could trigger a change in dtype like this, and how to prevent it?
Actually, my astype call was written incorrectly; it should just be arr.astype(np.uint8)! I think the dtype was originally being changed in pandas: one of the columns was int64 and the index was uint8, so I think pandas sometimes decides to make everything float!
x = df.merge(df2, how='left', left_index=True, right_index=True).to_dask_array(lengths=True)
x = da.rechunk(x, chunks='auto')
x.compute()
The code wasn’t working because some rows had no match in the other frame, so the merge filled them with NaN, which is a float. You can see that the merge sometimes returns int and sometimes float, depending on chance and on how large you set the high value when generating the random indexes.
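Here is a minimal sketch of that effect in plain pandas, in case it helps anyone reproduce it. The frames and the column names "a" and "b" are made up for illustration and are not the data from my program; I'm assuming the same behaviour carries over to the dask version of the merge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" and "b" are illustrative column names.
left = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.uint8)}, index=[0, 1, 2])

# Every index in `left` also exists here, so nothing needs filling.
right_full = pd.DataFrame({"b": np.array([10, 20, 30], dtype=np.int64)}, index=[0, 1, 2])

# Index 2 is missing here, so a left join has to fill that row with NaN.
right_partial = pd.DataFrame({"b": np.array([10, 20], dtype=np.int64)}, index=[0, 1])

full = left.merge(right_full, how="left", left_index=True, right_index=True)
partial = left.merge(right_partial, how="left", left_index=True, right_index=True)

print(full["b"].dtype)     # int64: all rows matched
print(partial["b"].dtype)  # float64: NaN forced an upcast
print(partial["b"].isna().sum())  # number of unmatched rows
```

So whether the result comes back as int or float depends only on whether at least one left index has no partner on the right.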
Is there a way to do the merge without including these rows?
You are asking for a left join, so every time you’ve got an index on the left that doesn’t exist on the right, you’ll have NaN. If you want to avoid the NaN, did you try an inner join?
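As a rough sketch of the difference, assuming two small made-up pandas frames (the column names "a" and "b" are hypothetical), an inner join keeps only the indexes present on both sides, so there is nothing to fill and the integer dtypes survive:

```python
import numpy as np
import pandas as pd

# Hypothetical data: index 2 exists on the left only.
left = pd.DataFrame({"a": np.array([1, 2, 3], dtype=np.uint8)}, index=[0, 1, 2])
right = pd.DataFrame({"b": np.array([10, 20], dtype=np.int64)}, index=[0, 1])

# Inner join: the unmatched row (index 2) is dropped instead of filled with NaN.
inner = left.merge(right, how="inner", left_index=True, right_index=True)
print(inner.dtypes)

# Alternative if you need the left join for other reasons:
# drop the unmatched rows afterwards and cast back to the integer dtype.
left_joined = left.merge(right, how="left", left_index=True, right_index=True)
cleaned = left_joined.dropna().astype({"b": np.int64})
print(cleaned.dtypes)
```

Both approaches end up with the same rows here; the inner join just avoids creating the NaN in the first place.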