Where does Dask spill data to on Linux? Can I control this?
I am attempting to concatenate two large arrays on a local machine and I get the following error:
OSError: [Errno 28] No space left on device
There is, however, plenty of space on the disk, except on a couple of small partitions such as the tmpfs, but I can’t find anything in the traceback or in the docs about where Dask tries to spill to disk.
Good question @conifer. Each worker has its own local_directory where it will spill data. Specifically, here’s where it is defined
You can control local_directory via the temporary-directory config option (a temporary directory is created if you don’t manually specify one). Note that workers don’t actually spill directly to local_directory but instead spill to local_directory/dask-worker-space/worker-<hash> to handle cases where multiple workers are running on the same filesystem.
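As a minimal sketch of setting that option from Python: the scratch directory below is a hypothetical stand-in created with tempfile so the snippet runs anywhere; in practice you would point it at a directory on a partition with enough free space. The option presumably needs to be set before the cluster and its workers are created, since each worker picks up its directory at startup.

```python
import tempfile

import dask

# Hypothetical scratch location, created here only so the example is runnable.
# In practice, use a directory on a partition with plenty of free space.
scratch = tempfile.mkdtemp(prefix="dask-scratch-")

# Set the temporary-directory option in Dask's global configuration.
dask.config.set({"temporary-directory": scratch})

# Read the value back to confirm it took effect.
print(dask.config.get("temporary-directory"))
```

The same option can also be set in a Dask YAML config file or through the DASK_TEMPORARY_DIRECTORY environment variable, if you would rather not change it in code.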
You can also retrieve where each worker is spilling with the following snippet (where client is my Dask client connected to a cluster):
In : client.run(lambda dask_worker: dask_worker.local_directory)
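Since the original error is "No space left on device", it may help to check how much room is left on the filesystem that holds a given directory. The sketch below uses only the standard library; free_space_gib is a hypothetical helper name, and the system temp directory is used purely as a stand-in path.

```python
import shutil
import tempfile


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


# Stand-in path for illustration; substitute a worker's local_directory.
example_path = tempfile.gettempdir()
print(f"{example_path}: {free_space_gib(example_path):.1f} GiB free")
```

Assuming the helper is defined in your session, you could probably run it on the workers in the same way as the snippet above, for example client.run(lambda dask_worker: free_space_gib(dask_worker.local_directory)), to see whether a worker's spill directory sits on one of the small partitions.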