I have a computation that operates on slices of a dask array and creates one data frame per slice. I’m currently using delayed and dd.from_delayed (which might be deprecated in the future) to combine the results into a single dask data frame. Is there a better approach, using either dd.from_map or another method? Thanks.
import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd
from dask import delayed


def test_func(x):
    # Create a dataframe from a slice. In actuality, the function is much more complicated.
    return pd.DataFrame(data=dict(mean=[np.mean(x)]))


array = da.random.random((100, 100), chunks=(10, 10))
slices = [(slice(0, 2), slice(10, 12)), (slice(10, 20), slice(1, 12))]

delayed_results = []
for sl in slices:
    delayed_results.append(delayed(test_func)(array[sl]))

df = dd.from_delayed(delayed_results).compute()