Dask Code Runs in Jupyter Notebook but Fails in SLURM Job Script

Hi all,
I’m working on analyzing 4-dimensional oceanographic model data using xarray. I understand that xarray leverages Dask for its lazy computation and chunking capabilities. When I run my analysis script in a Jupyter Notebook session allocated with 60 GB of memory, everything executes smoothly without any issues. However, when I attempt to run the exact same script with an equivalent memory allocation (60 GB) through a Slurm job script, the job gets terminated due to memory errors.
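For reference, the workflow is roughly of this shape (the file path, variable name, and chunk sizes below are placeholders, not my actual script):

```python
import xarray as xr

# Open the 4-D model output lazily; the path and chunk sizes are illustrative only
ds = xr.open_dataset(
    "ocean_model_output.nc",           # placeholder path
    chunks={"time": 10, "depth": 10},  # Dask-backed, lazily chunked arrays
)

# Example reduction; the real analysis is more involved but follows this pattern
result = ds["temperature"].mean(dim="time").compute()
```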

I’ve encountered errors such as TimeoutError and CommClosedError, as well as out-of-memory kills from the Slurm scheduler, indicating that the job was terminated for exceeding its memory allocation.

I’m puzzled as to why the script runs perfectly in the Jupyter environment but faces memory issues when executed via Slurm, even though the memory allocation is the same in both scenarios. Could there be any underlying differences in how Dask handles memory or computations in these two environments? Any insights or suggestions would be greatly appreciated.

I’m not an expert at all, but I’ve come across a few differences between IPython (Jupyter notebooks) and plain Python when it comes to running asynchronous code, which Dask uses heavily.

Have you run the same code in both environments through a memory profiler to see where the bottleneck is?
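For example, if you have the memory_profiler package installed, something like this gives line-by-line memory usage of the main process (the file path and computation are placeholders; note it won’t capture memory used by separate Dask worker processes if you run a distributed cluster):

```python
# pip install memory_profiler
import xarray as xr
from memory_profiler import profile


@profile  # prints line-by-line memory usage of this function when it runs
def run_analysis():
    # Placeholder path and chunking, standing in for the real analysis
    ds = xr.open_dataset("ocean_model_output.nc", chunks={"time": 10})
    return ds["temperature"].mean(dim="time").compute()


if __name__ == "__main__":
    run_analysis()
```

Running the same decorated script interactively and under Slurm would at least show whether the memory grows in the same places in both environments.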


Hi @Sumanshekhar17, welcome to Dask community!

  • Which kind of Dask scheduler are you using? Do you create a Distributed cluster (for example via a LocalCluster, as in the sketch below)?
  • How does the Jupyter notebook environment differ from the batch script one: do you run Jupyter inside a batch script? Is it started by some other means? Is it on the same kind of servers?
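If you are using a distributed scheduler, something like the following makes the worker memory limits explicit; the worker count and per-worker limit here are only an illustration of how a 60 GB allocation might be divided, not a recommendation:

```python
from dask.distributed import Client, LocalCluster

# Illustrative only: 4 workers x 12 GiB is roughly 51.5 GB of worker memory,
# leaving headroom under a 60 GB allocation for the main process and scheduler.
cluster = LocalCluster(
    n_workers=4,
    threads_per_worker=2,
    memory_limit="12GiB",  # per-worker limit
)
client = Client(cluster)
print(client)
```

How the workers are created, and how much memory each one is allowed, can easily differ between an interactive Jupyter session and a batch job, which is why the answers to the questions above matter.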