Hi, I’m very new to Dask, and only just testing out the distributed functionality. I’m finding that performance doesn’t appear to be scaling like it should, and I’m looking for help to sort it out.
I’ve seen this question, but it seems to be more about prioritisation and dependencies, so I don’t think it helps me.
As an example of the problem, I have a CPU-bound test function built on `math.factorial`. If I submit a single task with `client.map`, it takes about 6 seconds. But if I then submit a few tasks at once, they either run in sequence, or they appear to run in parallel but take longer, as if one depended on the other. Either way, the total time is a linear multiple of the time it takes to execute one.
Here’s some example code. The `time.sleep(10)` calls are only there to spread the batches out in the screenshot:
```python
import time
import math
import dask
from dask.distributed import Client

client = Client('tcp://myscheduler:8786')

def quiet_factorial(n):
    return math.factorial(n) > 0

futs = client.map(quiet_factorial,  * 1, pure=False)
print('Tasks: 1')
%time [fut.result() for fut in futs]

time.sleep(10)
futs = client.map(quiet_factorial,  * 2, pure=False)
print('\nTasks: 2')
%time [fut.result() for fut in futs]

time.sleep(10)
futs = client.map(quiet_factorial,  * 4, pure=False)
print('\nTasks: 4')
%time [fut.result() for fut in futs]

time.sleep(10)
futs = client.map(quiet_factorial, range(500000, 500004), pure=False)
print('\nTasks: 4 but different arguments')
%time [fut.result() for fut in futs]
```
The results are below. They show a linear blowout in time, and no benefit from parallel processing:
```
Tasks: 1
CPU times: user 20.9 ms, sys: 11.2 ms, total: 32.2 ms
Wall time: 6.68 s

Tasks: 2
CPU times: user 44.2 ms, sys: 74 µs, total: 44.3 ms
Wall time: 13.3 s

Tasks: 4
CPU times: user 73.6 ms, sys: 7.74 ms, total: 81.3 ms
Wall time: 26.7 s

Tasks: 4 but different arguments
CPU times: user 90.9 ms, sys: 7.06 ms, total: 98 ms
Wall time: 26.7 s
```
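Expressing these numbers as a speed-up makes the lack of scaling explicit. Speed-up here means the time N tasks would take one after another (N × T1, using the single-task wall time) divided by the measured wall time for N tasks. A minimal sketch using only the wall times reported above:

```python
# Wall times reported above, in seconds, keyed by number of tasks
wall_times = {1: 6.68, 2: 13.3, 4: 26.7}


def speedup(tasks, wall, single):
    """Speed-up over running the tasks one after another: tasks * T1 / T_tasks."""
    return tasks * single / wall


single = wall_times[1]
for tasks, wall in wall_times.items():
    # With enough free cores and full parallelism, wall time would stay near `single`
    print(f"{tasks} task(s): {wall:.2f} s, speed-up {speedup(tasks, wall, single):.2f}x")
```

A speed-up of about 1.0x at 2 and 4 tasks means each batch finishes no faster than if its tasks had run one after another.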
I’ve attached a screenshot of the status graph. It shows a strange mix of tasks running in sequence and in parallel, but either way, the total time is proportional to the number of tasks, with no benefit from parallelisation.
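Whether the Dask workers here run tasks as threads in one process or in separate processes isn't stated above, so one way to narrow things down is to check, outside Dask, whether this function gets any speed-up from threads at all. In CPython, `math.factorial` is implemented in C and, to my understanding, holds the GIL while it runs, so threads in a single process would not be expected to overlap on it. A minimal sketch using only the standard library; `compare` is a hypothetical helper, not part of Dask:

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def quiet_factorial(n):
    # Same test function as in the post: CPU-bound, returns a small bool
    return math.factorial(n) > 0


def compare(n, copies):
    """Return (serial_seconds, threaded_seconds, results) for `copies` calls of quiet_factorial(n)."""
    # Run the calls one after another in the current thread
    start = time.perf_counter()
    serial_results = [quiet_factorial(n) for _ in range(copies)]
    serial = time.perf_counter() - start

    # Run the same calls on a pool with one thread per call
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=copies) as pool:
        threaded_results = list(pool.map(quiet_factorial, [n] * copies))
    threaded = time.perf_counter() - start

    assert serial_results == threaded_results
    return serial, threaded, threaded_results
```

For example, `compare(200_000, 4)`, where 200,000 is an arbitrary size chosen so that one call takes a noticeable fraction of a second. If the threaded time comes out close to the serial time, threads within a single process give this function no parallelism.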
Thanks for any help, I’m sure I’m doing something basic and dumb.