When are Dask Actors truly useful?

Actors seem to provide a way to introduce “stateful” computations in a Dask workflow. Which as a concept is intriguing for me. However, I am struggling to figure out under which circumstances that might really be useful.

I have understood (correct me if I’ wrong) that it is possible to pass data as parameters, initialize a class on a worker node, and then, synchronously call a class method, that utilises the pre-initialized data, from the client.

Consider this pseudocode:

for worker in worker_nodes:
    actor = actors[worker]
    f = actor.worker_method(yieldparam(params[worker]))
    results = sync_results(results, f.result())

where “actors” is an array of objects, initialized for each worker IP address via something looking like this:

client.submit(ClassName, data, actor=True, workers=[]).result()

But let’s say we have 10 workers, and 1 client, going through and doing 1 remote call on each worker, then receiving, processing and syncing the results. This will be very inefficient (and not parallel), because the client would be talking to one worker at any given time.

Have I got the concept wrong? Is there a better “model” that utilises actors in a better way? In which cases are Actors truly useful?

Additionally, in the documentation there is a remark that Actor processes are single-threaded:

Currently workers have only a single thread for actors, but this may change in the future.

Is this still the case? Does it make a difference due to Python’s GIL limitations?

Actors are clearly not made for submitting tasks directly on Workers in a synchronous way. More to record some centralized state that can be updated by Client or other Workers, like with the Counter example.

It is true that there are not much example of real use cases using Actors around the web… I only found

thanks @guillaumeeb !

Any idea about this?

I didn’t check in the code, but I’m almost convinced that this is still true. I’m not sure I understand your question about Python’s GIL…