I’m feeding Dask with chunks of data to process (details here).
I’m using all logical cores, as the sub-processes also include read/write operations (mixed load).
I also specify the load type in the resources of the LocalCluster:
cluster: LocalCluster = LocalCluster(
    processes=True,
    n_workers=12,  # all logical cores on a 6c/12t CPU
    resources={"io_bound": 3, "cpu_bound": 12},
    ...
)
As you see, I over-schedule here.
Here, the io_bound processes start the cpu_bound tasks, and after getting the data back (via as_completed), they just log and feed more futures. So I was expecting that the OS (Win 11) would do the context switching to execute the cpu_bound tasks on those otherwise idle (?) logical cores.
But I systematically get the following in Task Manager, showing the first 4 logical cores not fully utilized (I’m sure there are tasks scheduled through futures). Once in a while, though, they do get full use. I fail to read the Dask Dashboard in this respect: it is showing 13 concurrent jobs, but it is too jumpy to see what’s happening.
How does this work? Does Dask prevent context switching, or is Windows reporting wrongly (as it also says 100% CPU usage)?
Dask just launches and watches Python processes. The OS should do the context switching if needed. But if you have 3 workers dedicated to IO, they shouldn’t use much CPU, should they?
What do you mean by the two sentences above? What is too jumpy? And where do you see 100% CPU usage, on the Dashboard Worker pages?
As far as I understand, we can overprovision, i.e. use more than the existing (logical) cores in n_workers, or more tasks in resources than the total (logical) cores.
Of course, I don’t want to dedicate 1–3 cores to IO-bound processes. If the OS should do the context switching, it should do so and put the waiting CPU-bound processes onto the first four logical cores.
But as you can see from the graphs, they are not fully utilized (the first four on the top line):
too jumpy to see
I meant that it shows a value at one specific instant, as there is no time axis like the Windows Task Manager has.
the Windows reporting wrongly (as it also says 100% CPU usage)?
I meant this value:
It might be that Windows 11 is buggy in this respect (i.e. the first four graphs), of course; we have seen many bugs in two years… Unfortunately, I don’t have any Win 10 boxes left to test.
PS: In this particular run, I used a single dataset (i.e. 1 io-bound task), which generates 12 cpu-bound tasks (futures); whenever one completes (via as_completed), it adds one more task (through as_completed.add(client.submit(...))). They all use the same cluster/client.
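For reference, the feeding pattern described above can be sketched with the stdlib concurrent.futures as a stand-in for Dask’s client.submit / as_completed(...).add(...) (the function names, in-flight count, and workload here are hypothetical, not the actual code from this thread):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice

def cpu_task(x):
    # stand-in for the real cpu_bound work
    return x * x

def run_controlled(inputs, in_flight=12, max_workers=12):
    """Keep `in_flight` futures running; each time one finishes, log its
    result and submit one more -- mirroring the
    as_completed(...).add(client.submit(...)) pattern described above."""
    it = iter(inputs)
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # feed an initial set of futures
        pending = {pool.submit(cpu_task, x) for x in islice(it, in_flight)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())  # "it just logs ..."
                try:
                    pending.add(pool.submit(cpu_task, next(it)))  # feed one more
                except StopIteration:
                    pass  # no more data to feed
    return results
```

With Dask, the same shape applies, except the pool is the Client and the completed-future iteration goes through distributed.as_completed.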
Okay, I didn’t read deeply enough: you are trying to overprovision, but you are not really. You are starting 12 Workers, every Worker being able to process 3 io_bound tasks and 12 cpu_bound ones, but I guess every Worker only has one thread, so it runs only one task at a time.
The resources keyword goes into the additional kwargs part of the LocalCluster constructor, so it is applied to each Worker.
You might want to look at SpecCluster to be able to specify different arguments to Workers.
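A sketch of what such a worker spec could look like. Only the spec dict is built here so the example has no Dask dependency; with dask.distributed installed you would pass a real worker class (e.g. Nanny) as "cls" and hand the dict to SpecCluster. The helper name and the exact options layout are assumptions to illustrate the idea:

```python
def make_worker_spec(n_io, n_cpu, worker_cls=None):
    """Build a SpecCluster-style workers dict: a few single-threaded
    io_bound workers plus a batch of single-threaded cpu_bound workers.
    `worker_cls` would be dask.distributed.Nanny in real use (left as
    None here so this sketch runs without dask installed)."""
    spec = {}
    for i in range(n_io):
        spec[f"io-{i}"] = {
            "cls": worker_cls,
            "options": {"nthreads": 1, "resources": {"io_bound": 1}},
        }
    for i in range(n_cpu):
        spec[f"cpu-{i}"] = {
            "cls": worker_cls,
            "options": {"nthreads": 1, "resources": {"cpu_bound": 1}},
        }
    return spec

# With dask.distributed installed, roughly (an assumption, not verified here):
# from dask.distributed import SpecCluster, Scheduler, Nanny
# cluster = SpecCluster(
#     workers=make_worker_spec(3, 12, worker_cls=Nanny),
#     scheduler={"cls": Scheduler, "options": {}},
# )
```

The point of SpecCluster is exactly that each named worker entry can carry its own options, which a single LocalCluster call cannot express.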
I totally misunderstood resources: I thought 3 cores could be used for io_bound and all 12 would be used for cpu_bound, e.g. resulting in:
core-0: 1 io, 1 cpu
core-1: 1 io, 1 cpu
core-2: 1 io, 1 cpu
core-3: 1 cpu
…
core-11: 1 cpu
So, in this case would the following work?
cluster: LocalCluster = LocalCluster(
    processes=True,
    n_workers=15,  # using a 6c/12t CPU with overprovisioning => 3 io-bound + 12 cpu-bound
    resources={"io_bound": 1, "cpu_bound": 1},  # per worker
    threads_per_worker=1,  # per worker
    ...
)
I will start 3 io_bound workers, each of which further starts 12/3 = 4 cpu_bound workers.
Or if there is only a single file:
I will start 1 io_bound worker, which further starts 12/1 = 12 cpu_bound workers.
PS: I’m feeding the futures in a controlled manner, as they are not lazy: feed an initial set, then when one finishes, use as_completed.add() to schedule a new one.
Actually, I had searched the documentation for SpecCluster and was confused, so I missed that. I didn’t know about the dict-type worker specification in detail.
Earlier today, I asked ChatGPT and it provided me with some examples; it is so easy that I’m now converting the code. This is THE solution: as you said, both this and the other thread will be solved by using it.