Hi, I’m running dask on distributed CPU systems, each node has 28 CPU cores. I have 4 nodes,
node #1:
dask-scheduler --interface ib0 --protocol ucx
node #2:
python test.py $scheduler_ip
node #3 and #4:
dask-worker $scheduler_ip --no-nanny --interface ib0 --nthreads 1
In test.py,
import dask.array as da
import numpy as np
from dask.distributed import Client
client = Client(address=$scheduler_ip)
rs = da.random.RandomState(RandomState=np.random.RandomState)
x = rs.random(size=(2**15, 2**15), chunks='auto').persist()
y = rs.random(size=(2**15, 2**15), chunks='auto').persist()
wait(x)
wait(y)
for i in range(10):
wait(client.persist(x.dot(y)))
Since I set the worker --nthreads 1, it should only use 1 thread on node#3 and node#4 respectively.
Printing client also proves this:
<Client: 'ucx://192.168.64.185:8786' processes=2 threads=2, memory=6.68 GiB>
But using top
to inspect the CPU utilization on node#3 and node#4,
The CPU usage even fluctuates to the most 5600% sometimes. I’m wondering why it behaves like this, it seems that it still uses all 28 cores with 28 threads on the node.