Integrating Dask Distributed Computing with Celery for Asynchronous Processing of Large CSV Files

I’ve been exploring the integration of Dask distributed computing with Celery for asynchronous processing of large CSV files. Could you confirm whether this is feasible, or suggest alternative approaches for achieving it?

I’ve tried to run the computation inside the Celery task, but it throws this error:

  File "/home/julia/conda/envs/dask-dev/lib/python3.9/asyncio/base_events.py", line 814, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/home/julia/conda/envs/dask-dev/lib/python3.9/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
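
For context, this is roughly the shape of what I’m attempting (a simplified sketch; the broker URL, file path, and column name are placeholders, not my actual setup):

```python
import dask.dataframe as dd
from celery import Celery

# Placeholder broker URL for illustration only.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_csv(path: str) -> dict:
    # Lazily read the large CSV with Dask, then trigger the
    # computation inside the Celery task.
    df = dd.read_csv(path)
    counts = df.groupby("some_column").size().compute()
    return counts.to_dict()
```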

Hi @sri-dhurkesh, welcome to Dask Discourse!

The traceback you’re showing doesn’t seem to involve Dask. What are you trying to do with Celery? Are you launching a LocalCluster, or connecting a Client to another Dask cluster?
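
For reference, those two setups would look roughly like this (a minimal sketch; the scheduler address is just a placeholder):

```python
from dask.distributed import Client, LocalCluster

# Option 1: start a LocalCluster in the current process and connect to it.
cluster = LocalCluster(n_workers=4)
client = Client(cluster)

# Option 2: connect a Client to an already running Dask cluster.
# client = Client("tcp://scheduler-address:8786")
```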

Why do you need Celery in the first place?