Multi-column groupby gives RuntimeError with latest Dask version 2024.2.1

I am running a SLURMCluster with 5 nodes. Performing a two-column groupby aggregation, e.g. df.groupby(['num_col1', 'num_col2'])['num_col3'].sum().compute(), has been giving me RuntimeError: Expected <TaskState ('shuffle-transfer-7c************', 17) released> to be processing, is released.

Hi @askDask, welcome to the Dask community!

If I understand correctly, you are having a problem on a new Dask version with a computation that worked previously?

Could you please try to share a reproducer, using a LocalCluster instead of a SLURMCluster?

Hi @guillaumeeb, thanks for the prompt reply. I will have to get back to you on the reproducer, but in the meantime I have been going through the painful "process of elimination" to narrow down the issue. What I noticed is that if I run compute() on any one partition to get the memory usage or the head(), and then run the groupby, I see the error; otherwise I don't. The steps are:

  1. df = df.repartition(partition_size="256MB").persist()
  2. df = df.persist()
  3. df.partitions[0].memory_usage(deep=True).compute().sum()
  4. df.partitions[0].compute().head() and THEN do
  5. df.groupby(['num_col1', 'num_col2'])['num_col3'].sum().compute(), at which point I get the error.

So you get the error only when performing some computations before running the final groupby. Unfortunately, it will be hard to help or to understand what the problem is without a reproducer. You don't encounter this issue with older Dask versions?

The unusual behavior I was noticing might have had something to do with version 2024.2.0, which was superseded two days later by 2024.2.1, after which all of the problems went away. I have since updated to the most recent version, 2024.4.1, and I don't see any problems with multi-column groupby, other than having to make sure that the data is already pre-sorted when I get the parquet files for terabyte-sized datasets. That way I don't have to make Dask do the heavy lifting of sorting and creating divisions, which invariably fails during shuffle transfers. You may close this one. Sorry about the delayed response (pun intended). Regards