When are Dask Actors truly useful?

jurgencuschieri · October 11, 2023, 11:43pm

Actors seem to provide a way to introduce “stateful” computations into a Dask workflow, which I find intriguing as a concept. However, I am struggling to figure out under which circumstances they might really be useful.

I have understood (correct me if I’m wrong) that it is possible to pass data as parameters, initialize a class on a worker node, and then synchronously call, from the client, a class method that uses the pre-initialized data.

Consider this pseudocode:

for worker in worker_nodes:
    actor = actors[worker]
    f = actor.worker_method(yieldparam(params[worker]))
    results = sync_results(results, f.result())

where “actors” holds one object per worker, each initialized for that worker’s IP address with something like this:

client.submit(ClassName, data, actor=True, workers=["192.168.1.18"]).result()

But let’s say we have 10 workers and 1 client, and the client goes through the workers making 1 remote call on each, then receiving, processing and syncing the results. This would be very inefficient (and not parallel), because the client would be talking to only one worker at any given time.
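One possible way around the serial loop, as a rough sketch: the Dask documentation describes actor method calls as returning an ActorFuture rather than blocking, so the calls can all be issued first and the results collected afterwards. The `Partition` class, its `scaled_sum` method, the sample data and the in-process cluster settings below are hypothetical and only there for illustration; pinning each actor with `workers=[...]` as in the snippet above is left out for brevity.

```python
class Partition:
    """Hypothetical actor that keeps pre-loaded data on one worker."""

    def __init__(self, data):
        self.data = list(data)

    def scaled_sum(self, factor):
        # Uses the state that was initialized once, when the actor was created.
        return factor * sum(self.data)


def fan_out_demo():
    # Imported here so the class above can be tried without a cluster.
    from dask.distributed import Client

    # Assumption: a small throwaway in-process cluster, not a real deployment.
    with Client(processes=False, n_workers=2, threads_per_worker=1) as client:
        chunks = [[1, 2, 3], [4, 5, 6]]
        # One actor per chunk; .result() returns the Actor handle.
        actors = [
            client.submit(Partition, chunk, actor=True).result()
            for chunk in chunks
        ]
        # Issue every call first; each returns an ActorFuture straight away.
        pending = [actor.scaled_sum(10) for actor in actors]
        # Only now wait, so the actors can work at the same time.
        return [future.result() for future in pending]
```

Calling `fan_out_demo()` returns one value per actor. The difference from the loop in the pseudocode is only the ordering: `.result()` is not called until every request has been sent.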

Have I got the concept wrong? Is there a better “model” that utilises actors in a better way? In which cases are Actors truly useful?

Additionally, in the documentation there is a remark that Actor processes are single-threaded:

Currently workers have only a single thread for actors, but this may change in the future.

Is this still the case? Does it make a difference due to Python’s GIL limitations?

guillaumeeb · October 14, 2023, 7:54am

Actors are clearly not meant for submitting tasks directly to Workers in a synchronous way. They are more for recording some centralized state that can be updated by the Client or by other Workers, as in the Counter example.
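For reference, a minimal sketch of that kind of centralized state, modelled on the Counter example in the Dask documentation. The `counter_demo` helper and the in-process `Client(processes=False)` cluster are assumptions made to keep the example small, not a recommended setup.

```python
class Counter:
    """Shared state that lives on a single worker."""

    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n


def counter_demo():
    # Imported here so the class above can be tried without a cluster.
    from dask.distributed import Client

    # Assumption: a throwaway in-process cluster, for illustration only.
    with Client(processes=False) as client:
        # actor=True asks the worker to keep the Counter instance alive;
        # .result() returns an Actor handle instead of a copy of the object.
        counter = client.submit(Counter, actor=True).result()
        # Each method call goes to the worker holding the state
        # and returns an ActorFuture.
        futures = [counter.increment() for _ in range(3)]
        return [future.result() for future in futures]
```

The same `counter` handle can also be passed as an argument to ordinary tasks, which is how other Workers would update the shared state.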

It is true that there are not many examples of real use cases for Actors around the web… I only found

jurgencuschieri · October 14, 2023, 9:56am

Thanks @guillaumeeb!

Any idea about this?

guillaumeeb · October 18, 2023, 7:29pm

I didn’t check the code, but I’m almost convinced that this is still true. I’m not sure I understand your question about Python’s GIL…
