Hello,
I have a very large dataset of signals indexed by a DatetimeIndex. It does not fit in memory, so I am using Dask to work with it.
I am trying to do some feature engineering: deriving multiple features from a single column over windows of about 100 ms (and other time frames) using the fast Fourier transform (FFT). The original data is sampled every 20 ms.
I have looked into rolling, but my understanding is that rolling.apply must return a single value per window, so it is not usable here.
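For reference, a small pandas sketch of that constraint. The synthetic signal and the two FFT features (index of the dominant frequency bin, spectral energy) are hypothetical stand-ins. Each rolling(...).apply call must reduce a window to one float, so two features need two passes, and the windows slide once per row rather than forming fixed 100 ms bins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real signal: 20 samples, one every 20 ms.
idx = pd.date_range("2024-01-01", periods=20, freq="20ms")
s = pd.Series(np.sin(np.arange(20) * 0.5), index=idx)

# apply() must return ONE float per window, so each feature needs its own pass
# (and the FFT is recomputed in each pass).
dominant = s.rolling("100ms").apply(
    lambda w: float(np.argmax(np.abs(np.fft.rfft(w)))), raw=True
)
energy = s.rolling("100ms").apply(
    lambda w: float(np.sum(np.abs(np.fft.rfft(w)) ** 2)), raw=True
)

# One row per input sample (sliding windows), not one row per 100 ms bin.
features = pd.DataFrame({"f1": dominant, "f2": energy})
print(features.head())
```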
A solution I found in pandas was to use resample and iterate over the generated bins to build a list for each of my features (Dask has an equivalent resample), like so:
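    f1s, f2s = [], []
    bins = df.resample("100ms")
    # iterating a pandas Resampler yields (bin label, sub-DataFrame) pairs
    for label, group in bins:
        # compute new features from column group["X"]
        f1 = ...  # placeholder for feature 1
        f2 = ...  # placeholder for feature 2
        f1s.append(f1)
        f2s.append(f2)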
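A runnable pandas version of this loop, as a sketch on synthetic data. The signal and the fft_features helper (dominant frequency bin and spectral energy) are hypothetical, since the actual features are not specified. Iterating a pandas Resampler yields (label, group) pairs, and each bin's features are computed from group["X"].

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 1 s of one signal, one sample every 20 ms.
idx = pd.date_range("2024-01-01", periods=50, freq="20ms")
df = pd.DataFrame({"X": np.sin(np.arange(50) * 0.5)}, index=idx)


def fft_features(x):
    """Hypothetical FFT features: (index of dominant frequency bin, spectral energy)."""
    spectrum = np.abs(np.fft.rfft(x))
    return int(np.argmax(spectrum)), float(np.sum(spectrum ** 2))


labels, f1s, f2s = [], [], []
for label, group in df.resample("100ms"):
    if group.empty:
        continue  # bins with no samples produce no features
    f1, f2 = fft_features(group["X"].to_numpy())
    labels.append(label)
    f1s.append(f1)
    f2s.append(f2)

# One row per 100 ms bin, indexed by the bin's start time.
features = pd.DataFrame({"f1": f1s, "f2": f2s}, index=pd.DatetimeIndex(labels))
print(features.head())
```

With 50 samples at 20 ms and 100 ms bins, this yields 10 bins of 5 samples each.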
However, Dask's resample object does not seem to be iterable, so this approach does not carry over to Dask either.
I would appreciate any advice on how to best solve my problem.
Thank you for your time.