Integrating Dask Distributed Computing with Celery for Asynchronous Processing of Large CSV Files

I’ve been exploring the integration of Dask distributed computing with Celery for asynchronous processing of large CSV files. Could you confirm whether this is feasible, or suggest alternative approaches for achieving it?

I’ve tried to run the computation inside the Celery task, but it throws this error:

  File "/home/julia/conda/envs/dask-dev/lib/python3.9/asyncio/base_events.py", line 814, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/home/julia/conda/envs/dask-dev/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
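
For context, this is roughly the shape of what I’m attempting (a simplified sketch; the broker URL, file path, and column name are placeholders, not my actual setup):

```python
import dask.dataframe as dd
from celery import Celery

# Placeholder broker URL for illustration only.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_csv(path: str) -> dict:
    # Lazily read the large CSV with Dask, then trigger the
    # computation inside the Celery task.
    df = dd.read_csv(path)
    counts = df.groupby("some_column").size().compute()
    return counts.to_dict()
```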

Hi @sri-dhurkesh, welcome to Dask Discourse!

The traceback you’re showing doesn’t seem to involve Dask. What are you trying to do with Celery? Are you launching a LocalCluster, or connecting a Client to another Dask cluster?
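
For reference, those two setups would look roughly like this (a minimal sketch; the scheduler address is just a placeholder):

```python
from dask.distributed import Client, LocalCluster

# Option 1: start a LocalCluster in the current process and connect to it.
cluster = LocalCluster(n_workers=4)
client = Client(cluster)

# Option 2: connect a Client to an already running Dask cluster.
# client = Client("tcp://scheduler-address:8786")
```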

Why do you need Celery in the first place?