Where does Dask spill to disk?

Where does Dask spill data to on Linux? Can I control this?

I am attempting to concat two large arrays on a local machine and I get the following error:

OSError: [Errno 28] No space left on device

There is however plenty of space on the disk, except for a couple of small partitions like the tmpfs, but I can’t find anything in the trace or in the docs about where desk tries to spill to disk.

Good question @conifer. Each worker has its own local_directory where it will spill data. Specifically, here’s where it is defined

You can control local_directory via the temporary-directory config option (a temporary directory is created if you don’t manually specify one). Note that workers don’t actually spill directly to local_directory but instead spill to local_directory/dask-worker-space/worker-<hash> to handle cases where multiple workers are running on the same filesystem.

You can also retrieve where each worker is spilling with the following snippet (where client is my Dask client connected to a cluster):

In [9]: client.run(lambda dask_worker: dask_worker.local_directory)
Out[9]:
{'tcp://127.0.0.1:59134': '/var/folders/k_/lx1rdvqn253gd1wrcx__5frm0000gn/T/dask-worker-space/worker-utec8om9',
 'tcp://127.0.0.1:59135': '/var/folders/k_/lx1rdvqn253gd1wrcx__5frm0000gn/T/dask-worker-space/worker-xejs6vdu',
 'tcp://127.0.0.1:59140': '/var/folders/k_/lx1rdvqn253gd1wrcx__5frm0000gn/T/dask-worker-space/worker-tpmt7m3a',
 'tcp://127.0.0.1:59143': '/var/folders/k_/lx1rdvqn253gd1wrcx__5frm0000gn/T/dask-worker-space/worker-v64mb2uh'}

I went ahead and opened an issue for adding more documentation on this topic here Document where local files and spilled data are stored · Issue #7293 · dask/distributed · GitHub