I am wondering why repartition(freq='24h') results in round divisions.
Dask dataframe with divisions aligned on 12:00:00
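import dask.dataframe
import dask.datasets
import pandas as pd

# Build a pandas frame, then shift its index so it starts at 12:00:00
df = dask.datasets.timeseries().compute()
df.index += pd.to_timedelta('12:00:00')
dd = dask.dataframe.from_pandas(df, npartitions=15)
dd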
repartition(freq='24h') is resulting in round divisions:
dd.repartition(freq='24h')
Expected result
The same happens with '1d' and pd.to_timedelta('1d'). This seems to be because Dask's repartition_freq() explicitly ceils the first division. I don't understand why that is a good idea. How can I bypass it?
def repartition_freq(df, freq=None):
    [...]
    try:
        start = df.divisions[0].ceil(freq)
    except ValueError:
        start = df.divisions[0]
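
To make the rounding concrete, here is a minimal pandas-only sketch of what that ceil() call does to a first division at 12:00:00. The two timestamps are illustrative values chosen to resemble the example frame, not read from it, and the names first_division, last_division and aligned_divisions are hypothetical. The second half shows one way boundaries aligned on 12:00:00 could be built by hand instead.

```python
import pandas as pd

# Illustrative boundaries (assumed, not read from the real frame):
# a first division at 12:00:00 and a last division 30 days later.
first_division = pd.Timestamp("2000-01-01 12:00:00")
last_division = pd.Timestamp("2000-01-31 12:00:00")

# This mirrors the ceil() call in the excerpt above: the start is
# rounded up to the next multiple of the frequency, i.e. to midnight.
ceiled_start = first_division.ceil("24h")
print(ceiled_start)  # 2000-01-02 00:00:00

# Hypothetical alternative: step from the unrounded start so that every
# boundary stays on 12:00:00.
aligned_divisions = pd.date_range(first_division, last_division, freq="24h").tolist()
print(aligned_divisions[:3])
```

A list like aligned_divisions could presumably be passed to repartition(divisions=...) rather than freq=..., which would sidestep the ceil entirely. I would expect the first and last entries to have to match the frame's existing outer divisions (or force=True to be set), so treat this as a sketch rather than a verified fix.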