Dask Code Runs in Jupyter Notebook but Fails in SLURM Job Script

Hi all,
I’m working on analyzing 4-dimensional oceanographic model data using xarray. I understand that xarray leverages Dask for its lazy computation and chunking capabilities. When I run my analysis script in a Jupyter Notebook session allocated with 60 GB of memory, everything executes smoothly without any issues. However, when I attempt to run the exact same script with an equivalent memory allocation (60 GB) through a Slurm job script, the job gets terminated due to memory errors.
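For reference, the workflow is roughly of this shape (the file path, variable name, and chunk sizes below are placeholders, not my actual script):

```python
import xarray as xr

# Open the 4-D model output lazily; the path and chunk sizes are illustrative only
ds = xr.open_dataset(
    "ocean_model_output.nc",           # placeholder path
    chunks={"time": 10, "depth": 10},  # Dask-backed, lazily chunked arrays
)

# Example reduction; the real analysis is more involved but follows this pattern
result = ds["temperature"].mean(dim="time").compute()
```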

I’ve encountered errors such as TimeoutError and CommClosedError, as well as out-of-memory kills from the Slurm scheduler, indicating that the job was terminated for exceeding its memory allocation.

I’m puzzled as to why the script runs perfectly in the Jupyter environment but faces memory issues when executed via Slurm, even though the memory allocation is the same in both scenarios. Could there be any underlying differences in how Dask handles memory or computations in these two environments? Any insights or suggestions would be greatly appreciated.

I’m not an expert at all, but I’ve come across a few differences between IPython (Jupyter notebooks) and plain Python when it comes to running asynchronous code, which Dask uses heavily.

Have you run the same code in both environments through a memory profiler to see where the bottleneck is?
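For example, if you have the memory_profiler package installed, something like this gives line-by-line memory usage of the main process (the file path and computation are placeholders; note it won’t capture memory used by separate Dask worker processes if you run a distributed cluster):

```python
# pip install memory_profiler
import xarray as xr
from memory_profiler import profile


@profile  # prints line-by-line memory usage of this function when it runs
def run_analysis():
    # Placeholder path and chunking, standing in for the real analysis
    ds = xr.open_dataset("ocean_model_output.nc", chunks={"time": 10})
    return ds["temperature"].mean(dim="time").compute()


if __name__ == "__main__":
    run_analysis()
```

Running the same decorated script interactively and under Slurm would at least show whether the memory grows in the same places in both environments.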


Hi @Sumanshekhar17, welcome to Dask community!

  • Which kind of Dask scheduler are you using? Do you create a Distributed cluster (for example via a LocalCluster, as in the sketch below)?
  • How does the Jupyter notebook environment differ from the batch script one: do you run Jupyter inside a batch script? Is it started by some other means? Is it on the same kind of servers?
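If you are using a distributed scheduler, something like the following makes the worker memory limits explicit; the worker count and per-worker limit here are only an illustration of how a 60 GB allocation might be divided, not a recommendation:

```python
from dask.distributed import Client, LocalCluster

# Illustrative only: 4 workers x 12 GiB is roughly 51.5 GB of worker memory,
# leaving headroom under a 60 GB allocation for the main process and scheduler.
cluster = LocalCluster(
    n_workers=4,
    threads_per_worker=2,
    memory_limit="12GiB",  # per-worker limit
)
client = Client(cluster)
print(client)
```

How the workers are created, and how much memory each one is allowed, can easily differ between an interactive Jupyter session and a batch job, which is why the answers to the questions above matter.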