Everything about MultiIndex in Dask

I've heard that Dask does not have very good support for MultiIndex, but I don't know exactly what the limitations are.

So I want to know:
Is that still the case?
What do I need to be careful about when dealing with a MultiIndex in Dask (e.g. if I want to create partitions by multiple columns)?

I’d also be happy to learn about the design and the history behind it.

@ubw218 Thanks for this question!

Unfortunately, Dask DataFrame does not fully support MultiIndex yet, so right now the best approach is to avoid it entirely. Here’s the issue discussing this: Full support for multiindex in dataframes · Issue #1493 · dask/dask · GitHub. I’d encourage you to chime in there to support this use case! If you’re interested, here’s the most recent effort to implement it: [WIP] MultiIndex by jsignell · Pull Request #8153 · dask/dask · GitHub
