--nthreads does not control dask-worker's behavior

Hi, I’m running Dask on a distributed CPU system where each node has 28 CPU cores. I have 4 nodes:
node #1:
dask-scheduler --interface ib0 --protocol ucx

node #2:
python test.py $scheduler_ip

node #3 and #4:
dask-worker $scheduler_ip --no-nanny --interface ib0 --nthreads 1

In test.py:

import sys

import dask.array as da
import numpy as np
from dask.distributed import Client, wait

# Connect to the scheduler whose address is passed on the command line
client = Client(address=sys.argv[1])

# Create two large random arrays and keep them in cluster memory
rs = da.random.RandomState(RandomState=np.random.RandomState)
x = rs.random(size=(2**15, 2**15), chunks='auto').persist()
y = rs.random(size=(2**15, 2**15), chunks='auto').persist()
wait(x)
wait(y)

# Repeatedly compute the matrix product on the workers
for i in range(10):
    wait(client.persist(x.dot(y)))

Since I start each worker with --nthreads 1, it should only use 1 thread on node #3 and node #4 respectively.
Printing the client also confirms this:
<Client: 'ucx://192.168.64.185:8786' processes=2 threads=2, memory=6.68 GiB>

But when I use top to inspect the CPU utilization on node #3 and node #4:

[screenshot of top output]

The CPU usage even fluctuates up to 5600% at times. I’m wondering why it behaves like this; it seems that the worker still uses all 28 cores, with 28 threads, on the node.

Hi @yjhmitweb,

I would hazard that --nthreads only limits the number of tasks a worker can run concurrently. If one of those tasks is itself multi-threaded, Dask won’t be aware of it and will do nothing to prevent it from using more than one core. So I’d guess that here, the NumPy you use takes advantage of all the cores it can see to perform the operations on the dask-array chunks. If you don’t want each worker to use more than 100% CPU, you should look into limiting NumPy’s thread pool. This is usually done by setting OMP_NUM_THREADS=1 or a similar environment variable, depending on which BLAS library backs your NumPy and the environment in which you execute this code.
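
For example (just a sketch, assuming your NumPy is backed by OpenBLAS, MKL, or another OpenMP-based BLAS), you could export the usual thread-limiting variables before starting each worker on node #3 and node #4:

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
dask-worker $scheduler_ip --no-nanny --interface ib0 --nthreads 1

With those set, each chunk operation should stay on a single core, and top should show roughly 100% CPU per worker process.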