Question about nthreads, workers, and chunk

yjhmitweb · November 4, 2022, 4:04pm

Hi, I’m wondering in Dask, what is the purpose of nthreads? As far as I know, each chunk is distributed to a worker for computation, both in cases of CPU and GPU.
Considering this case:

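import cupy
import numpy as np
import dask.array as da
import dask.array.linalg as dal
from dask.distributed import wait

# `client` is an already-connected dask.distributed.Client; `device` is "gpu" or "cpu"
rs = da.random.RandomState(RandomState=cupy.random.RandomState if device == "gpu" else np.random.RandomState)
a = rs.random(size=(1000000, 1000), chunks=(10000, 1000)).persist()
wait(a)
wait(client.persist(dal.svd(a)))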

When using UCX over InfiniBand, running on 2 nodes, each with 1 V100, setting --nthreads 1 for dask-worker, it takes about 20s to finish; whereas setting --nthreads 2, it takes about 12s to finish.
I’m curious when enabling GPU for computation, what is the point of nthreads since the data is initialized on GPU and I’m not moving the final result back to CPU? Why is there a difference in the above case when changing nthreads?

wence · November 4, 2022, 4:17pm

What is happening here is that with --nthreads 2, each worker is launched with two threads, both of which can run computations and submit kernels to the GPU. As long as the resulting memory usage is not too great, this can work fine (in the same way that running multiple threads on the same CPU is also fine).

This will not always be faster, and may sometimes be slower, or lead to more out-of-memory errors than using just a single thread and one GPU per worker.
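
To make the overlap idea concrete, here is a toy sketch using only the Python standard library. It is not Dask or GPU code: `fake_chunk_task` and `run_chunks` are hypothetical names, and the `time.sleep` call is an assumed stand-in for a task that launches work and then blocks waiting on it. Under that assumption, a pool of two threads can keep a second task in flight while the first one is waiting, which is roughly the effect described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_chunk_task(wait_s):
    # Hypothetical stand-in for "submit work for one chunk, then block
    # until it finishes". The sleep is an assumption, not a real kernel.
    time.sleep(wait_s)
    return wait_s


def run_chunks(nthreads, n_chunks=8, wait_s=0.05):
    # Run n_chunks blocking tasks on a pool of `nthreads` threads and
    # return the elapsed wall-clock time in seconds.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        list(pool.map(fake_chunk_task, [wait_s] * n_chunks))
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 2):
        print(f"nthreads={n}: {run_chunks(n):.2f}s")
```

In this model the two-thread run finishes sooner only because the tasks spend their time waiting. Real GPU tasks also hold device memory while they run, so two concurrent tasks can roughly double the peak memory in use, which is where the extra out-of-memory risk comes from.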
