Hi all,
I’m working on analyzing 4-dimensional oceanographic model data using xarray. I understand that xarray leverages Dask for its lazy computation and chunking capabilities. When I run my analysis script in a Jupyter Notebook session allocated with 60 GB of memory, everything executes smoothly without any issues. However, when I attempt to run the exact same script with an equivalent memory allocation (60 GB) through a Slurm job script, the job gets terminated due to memory errors.
I’ve encountered errors such as TimeoutError
, CommClosedError
, and Out Of Memory
from the Slurm scheduler, indicating that the job was killed due to memory constraints.
I’m puzzled as to why the script runs perfectly in the Jupyter environment but faces memory issues when executed via Slurm, even though the memory allocation is the same in both scenarios. Could there be any underlying differences in how Dask handles memory or computations in these two environments? Any insights or suggestions would be greatly appreciated.