Hi everyone, I am currently working with time-series data at microsecond (us) resolution. An example is shown below:
size
2021-09-01 00:00:00.000001 0
2021-09-01 00:00:00.000004 1
2021-09-01 00:00:00.000007 2
2021-09-01 00:00:00.000010 3
2021-09-01 00:00:00.000013 4
What I need to do is compute the sum of the size column over every 2-microsecond window (timestamps 1 and 2 form a group, 3 and 4 form a group, etc.), with the start time anchored at the timestamp of the first row. As you can see, some timestamps are missing, and the size value for those missing timestamps should be treated as 0. In Pandas I can use the reindex function together with date_range to insert rows for the missing timestamps, and after that I can easily compute what I want with a rolling window. However, since there is no reindex function in Dask DataFrame, I have no idea how to do this in Dask. Can someone enlighten me on a way to implement this? After reindexing, the data looks like this:
size
2021-09-01 00:00:00.000001 0
2021-09-01 00:00:00.000002 0
2021-09-01 00:00:00.000003 0
2021-09-01 00:00:00.000004 1
2021-09-01 00:00:00.000005 0
2021-09-01 00:00:00.000006 0
2021-09-01 00:00:00.000007 2
2021-09-01 00:00:00.000008 0
2021-09-01 00:00:00.000009 0
2021-09-01 00:00:00.000010 3
2021-09-01 00:00:00.000011 0
2021-09-01 00:00:00.000012 0
2021-09-01 00:00:00.000013 4
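For reference, my current Pandas approach looks roughly like this (a minimal sketch on the toy data above; the resample call with a fixed origin stands in for the rolling-window step, since the 2-microsecond groups here are non-overlapping):

```python
import pandas as pd

# Toy frame standing in for the real data (column name "size" as in the example)
idx = pd.to_datetime([
    "2021-09-01 00:00:00.000001",
    "2021-09-01 00:00:00.000004",
    "2021-09-01 00:00:00.000007",
    "2021-09-01 00:00:00.000010",
    "2021-09-01 00:00:00.000013",
])
df = pd.DataFrame({"size": [0, 1, 2, 3, 4]}, index=idx)

# Insert the missing microsecond timestamps, filling size with 0
full = pd.date_range(df.index[0], df.index[-1], freq="1us")
df = df.reindex(full, fill_value=0)

# Sum over consecutive 2-microsecond windows, anchored at the first row
out = df.resample("2us", origin=df.index[0])["size"].sum()
print(out)
```

This produces the per-window sums I want in plain Pandas; the question is how to reproduce the reindex step on a Dask DataFrame.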
The reason I need to use Dask for this is that my original dataset is huge, too large to process comfortably with plain Pandas.