Any recommendations for dealing with ragged data returned from dask.delayed?

Do you have any experience dealing with ragged data returned from dask.delayed?

I use dask.distributed and dask.delayed to distribute the workload (see the script below). The function (func) returns ragged data: multiple arrays with different dimensions. Right now I get all the arrays in a single tuple (results). Is there a nicer way to get the corresponding arrays stacked together, as one would with dask.dataframe?


from dask import compute, delayed
from dask.distributed import Client

with Client(cluster) as client:
    delayed_results = [delayed(func)(x_train, y_train, idx) for idx in contexts]
    results = compute(*delayed_results, scheduler="processes")

Hi @baltun,

I’m a developer on the Awkward Array project, and I use Dask for physics analyses.

Awkward Array is designed to work with ragged arrays. There’s currently work on a Dask extension that allows the high-level Awkward Array API to run on Dask. It’s still very early days, though, so it might not be easy to use yet.
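Independent of any Dask extension, here is a minimal sketch of one way to regroup the tuple that `compute` returns so that corresponding outputs end up together. It assumes (this is not stated in the question) that `func` returns a fixed-length tuple of arrays per task. The `results` value and the `regroup` helper below are hypothetical stand-ins, with plain lists in place of arrays so the snippet runs on its own.

```python
# Hypothetical stand-in for what compute(*delayed_results) returns:
# one tuple per task, each holding outputs of different lengths.
results = (
    ([1.0, 2.0, 3.0], [10, 20]),   # task 0: (a, b)
    ([4.0], [30, 40, 50]),         # task 1: (a, b)
    ([5.0, 6.0], []),              # task 2: (a, b)
)


def regroup(results):
    """Transpose per-task tuples into one ragged list per output position."""
    return [list(group) for group in zip(*results)]


# a_all collects every task's first output, b_all every task's second output.
a_all, b_all = regroup(results)

# The per-task lengths show that each group is still ragged.
lengths = [len(a) for a in a_all]
```

If Awkward Array is installed, a nested list such as `a_all` can be passed to `ak.Array(a_all)` to get a single ragged array with a variable-length second dimension. Whether that is the right target depends on the shapes involved.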

Could you describe the shapes of your arrays and what shape you want the results to be? Is the result also ragged?