Why are my tasks not executing in parallel?

Hi, I’m very new to Dask, and only just testing out the distributed functionality. I’m finding that performance doesn’t appear to be scaling like it should, and I’m looking for help to sort it out.

I’ve seen this question but that seems more related to prioritisation and dependencies. I don’t think it helps me.

As an example of the problem, I have a CPU-bound test function based on math.factorial. If I submit a single task via client.submit or client.map, it takes about 6 seconds. But if I then submit a few jobs, they either run in sequence, or they look parallel but take longer, as if one depends on the other. Either way, the total time is a linear multiple of the time a single task takes.

Here’s some example code. The sleeps are to spread out the tasks on the screenshot below:

import time
import math

import dask
from dask.distributed import Client

client = Client('tcp://myscheduler:8786')

def quiet_factorial(n):
    # Return a bool rather than the huge integer, so the result is cheap to transfer
    return math.factorial(n) > 0

futs = client.map(quiet_factorial, [500000] * 1, pure=False)  # pure=False: identical calls get distinct keys
print('Tasks: 1')
%time [fut.result() for fut in futs]

futs = client.map(quiet_factorial, [500000] * 2, pure=False)
print('\nTasks: 2')
%time [fut.result() for fut in futs]

futs = client.map(quiet_factorial, [500000] * 4, pure=False)
print('\nTasks: 4')
%time [fut.result() for fut in futs]

futs = client.map(quiet_factorial, range(500000, 500004), pure=False)
print('\nTasks: 4 but different arguments')
%time [fut.result() for fut in futs]

The results are below. They show a linear blowout in time, and no benefit from parallel processing:

Tasks: 1
CPU times: user 20.9 ms, sys: 11.2 ms, total: 32.2 ms
Wall time: 6.68 s

Tasks: 2
CPU times: user 44.2 ms, sys: 74 µs, total: 44.3 ms
Wall time: 13.3 s

Tasks: 4
CPU times: user 73.6 ms, sys: 7.74 ms, total: 81.3 ms
Wall time: 26.7 s

Tasks: 4 but different arguments
CPU times: user 90.9 ms, sys: 7.06 ms, total: 98 ms
Wall time: 26.7 s

I’ve attached a screenshot of the status graph. It shows a weird blend of tasks running in sequence and in parallel, but in any case, the time is proportional to the number of tasks, with no benefit from parallelisation.

Thanks for any help, I’m sure I’m doing something basic and dumb.

Two other noteworthy things:

  • I have restarted the scheduler and workers a few times
  • there is a mismatch between the Python versions:

/home/user/src/env/lib/python3.9/site-packages/distributed/client.py:1265: VersionMismatchWarning: Mismatched versions found

| Package | client         | scheduler      | workers        |
|---------|----------------|----------------|----------------|
| python  | 3.9.10.final.0 | 3.8.10.final.0 | 3.8.10.final.0 |

@tonylewis Welcome!

I can’t reproduce this; all tasks run in ~3 s for me. Maybe it’s something specific to your cluster setup? I’d encourage you to verify that your client, scheduler, and workers are all connected properly.

[fut.result() for fut in futs]

As a side note, you could also consider replacing this with wait() or as_completed.
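To illustrate the as_completed pattern: this sketch uses the stdlib concurrent.futures executor, whose as_completed has the same semantics as Dask’s, so it runs without a cluster. With Dask you’d instead do `from dask.distributed import as_completed` and pass the Client’s futures. The small inputs here are just to keep the example fast.

```python
import math
from concurrent.futures import ProcessPoolExecutor, as_completed

def quiet_factorial(n):
    # Same test function as in the question, returning a cheap bool
    return math.factorial(n) > 0

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        futs = [pool.submit(quiet_factorial, n) for n in range(50, 54)]
        # Handle each result as soon as its future finishes,
        # instead of blocking on futures in submission order.
        for fut in as_completed(futs):
            print(fut.result())
```

The benefit over `[fut.result() for fut in futs]` is that a slow early task doesn’t delay your handling of faster later ones.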

Thanks for the reply. I tried lots of different things: different computers, different architectures, fresh installs. The solution was that I needed to specify --nprocs to match the number of cores/threads the system had available. All is humming along nicely now.
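For anyone who lands here with the same symptom, a worker launch along those lines might look like the sketch below, using the scheduler address from the question and assuming a 4-core machine (note that newer distributed releases rename --nprocs to --nworkers):

```shell
# Four worker processes with one thread each: a good fit for
# CPU-bound pure-Python tasks like factorial, which hold the GIL.
dask-worker tcp://myscheduler:8786 --nprocs 4 --nthreads 1
```

With only one single-threaded worker process, CPU-bound tasks queue up behind each other, which would produce exactly the linear timings shown above.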
