Hi, I’m wondering what the purpose of nthreads is in Dask. As far as I know, each chunk is distributed to a worker for computation, in both the CPU and GPU cases.
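Consider this case:
import cupy
import numpy as np
import dask.array as da
import dask.array.linalg as dal
from dask.distributed import wait
# `client` is an already-connected dask.distributed.Client; `device` is "gpu" or "cpu"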
rs = da.random.RandomState(RandomState=cupy.random.RandomState if device == "gpu" else np.random.RandomState)
a = rs.random(size=(1000000, 1000), chunks=(10000, 1000)).persist()
wait(a)
wait(client.persist(dal.svd(a)))
When using UCX over InfiniBand, running on 2 nodes, each with 1 V100, setting --nthreads 1 for dask-worker, it takes about 20s to finish; whereas setting --nthreads 2, it takes about 12s to finish.
I’m curious: when the GPU is used for computation, what is the point of nthreads, given that the data is initialized on the GPU and I’m not moving the final result back to the CPU? Why does changing nthreads make a difference in the case above?
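For what it's worth, my understanding is that nthreads sets the size of the thread pool each worker uses to run tasks, so a worker started with --nthreads 2 can have two chunk tasks in flight at once. Below is a toy sketch of that mental model using only the standard library. It is not Dask's code: `run_chunks` and `task_seconds` are made-up names, and `time.sleep` is just a stand-in for work that releases the GIL.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_chunks(n_chunks, nthreads, task_seconds=0.05):
    """Run n_chunks placeholder tasks on a pool of nthreads threads.

    Returns (elapsed_seconds, peak_concurrency), where peak_concurrency
    is the largest number of tasks observed running at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(_chunk_id):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        # Hypothetical stand-in for per-chunk work that releases the GIL.
        time.sleep(task_seconds)
        with lock:
            running -= 1

    start = time.perf_counter()
    # The pool size plays the role of nthreads for a single worker.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        list(pool.map(task, range(n_chunks)))
    return time.perf_counter() - start, peak


if __name__ == "__main__":
    for nthreads in (1, 2):
        elapsed, peak = run_chunks(n_chunks=8, nthreads=nthreads)
        print(f"nthreads={nthreads}: peak concurrency={peak}, elapsed={elapsed:.2f}s")
```

In this toy model, a second thread only helps when tasks spend time waiting rather than computing. What I don't understand is where that kind of overlap would come from when both threads target the same GPU.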