Detect if `Client` spawned by the user is using threads or processes

Hi,

TL;DR: I want to know how to detect whether a Dask Client was created with processes=False, i.e. whether it will run tasks on threads inside the current process or in separate worker processes. Keep reading for more context.

I have developed a library to which users can pass a Dask Client as an executor. The library builds a set of tasks and eventually submits them to that Client without any further user intervention. A pseudo-code example of what the user would write:

from dask.distributed import Client

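cl = Client(...)

graph = GraphBuilder(cl)  # GraphBuilder comes from the library
graph.add(op)             # op: some operation provided by the library
graph.add(op)

graph.execute()           # the library submits its tasks to cl here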

In this example, GraphBuilder is the library's API object: it exposes the functions the user interacts with, and once the Client has been passed to it, the user no longer touches the Client.
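To make the setup concrete, here is a minimal sketch of what such a GraphBuilder might look like. The class body, its add/execute signatures, and the use of Client.submit and Client.gather are assumptions for illustration, not the library's actual implementation.

```python
from dask.distributed import Client


class GraphBuilder:
    """Hypothetical sketch: records tasks and runs them on a user-supplied Client."""

    def __init__(self, client: Client):
        # The user hands over the Client once and never touches it again.
        self._client = client
        self._tasks = []  # list of (function, args) pairs

    def add(self, func, *args):
        # Only record the operation; nothing is sent to the scheduler yet.
        self._tasks.append((func, args))

    def execute(self):
        # Submit every recorded task, then block until all results are back.
        # pure=False stops Dask from merging calls that look identical.
        futures = [
            self._client.submit(func, *args, pure=False)
            for func, args in self._tasks
        ]
        # gather returns the results in the same order as the futures.
        return self._client.gather(futures)
```

Because the library, not the user, decides when to call submit, it is also the natural place to inspect the Client before any work is scheduled.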

The applications built on this API also have the following characteristics:

  • They make heavy use of C and C++ code under the hood (through the API)
  • The computations are always embarrassingly parallel

Given these characteristics, Python threads don't work well with this API. I would therefore like a programmatic way to warn users, or forbid them, from passing a Dask Client that uses threads, e.g. anything created like Client(processes=False).
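One possible way to implement such a warn/forbid check, sketched under assumptions: with Client(processes=False) the workers live inside the caller's own Python process, so comparing each worker's host name and PID (collected with Client.run) against the caller's detects that case no matter how the Client was constructed, including when it connects to an existing scheduler. The helper names workers_share_client_process, validate_client and _process_id, and the warning text, are hypothetical. Note that this only detects in-process workers; a worker started with processes=True may still run several tasks concurrently on threads (threads_per_worker).

```python
import os
import socket
import warnings

from dask.distributed import Client


def _process_id() -> str:
    # Host name plus PID identifies a process even across machines.
    return f"{socket.gethostname()}:{os.getpid()}"


def workers_share_client_process(client: Client) -> bool:
    """Return True if any worker runs inside the caller's own process."""
    # Client.run executes the function on every worker and returns
    # a dict mapping worker address -> return value.
    worker_ids = client.run(_process_id)
    return _process_id() in worker_ids.values()


def validate_client(client: Client, strict: bool = False) -> None:
    """Hypothetical check a library could run when it receives a Client."""
    if workers_share_client_process(client):
        msg = (
            "This Client runs its workers on threads in the current process; "
            "please create it with processes=True."
        )
        if strict:
            raise ValueError(msg)  # forbid
        warnings.warn(msg, RuntimeWarning, stacklevel=2)  # only warn
```

A library could call validate_client(client) in the GraphBuilder constructor, and switch to strict=True once it is sure thread-based Clients should never be accepted.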

What is the best way to query a dask Client to ask whether it’s going to schedule computations with threads or processes?

Thank you!

Hi @vpadulan,

There are simple APIs to get information about the cluster and scheduler:

from distributed import Client

client = Client(processes=False)
client.cluster.processes

returns False (and True for a Client created with processes=True).
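The client.cluster lookup only works when the Client created its own cluster; when it connects to an already running scheduler by address, client.cluster is None. A slightly more defensive sketch follows. The helper name describe_client is hypothetical, and treating a missing processes attribute as "unknown" is an assumption, since not every cluster class is guaranteed to expose it. It also reports client.nthreads(), because workers started as separate processes can still run several tasks on threads.

```python
from dask.distributed import Client


def describe_client(client: Client) -> dict:
    """Hypothetical helper: summarise how a Client will execute tasks."""
    # client.cluster is None when the Client connected to an existing
    # scheduler, and getattr(None, ..., None) simply yields None too,
    # so None here means "unknown" rather than "threads".
    processes = getattr(client.cluster, "processes", None)
    return {
        "processes": processes,
        # Mapping of worker address -> number of threads on that worker.
        "nthreads": client.nthreads(),
    }
```

For Client(processes=False, n_workers=1, threads_per_worker=2), the "processes" entry is False and the single worker reports 2 threads.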