Actors seem to provide a way to introduce “stateful” computations in a Dask workflow. Which as a concept is intriguing for me. However, I am struggling to figure out under which circumstances that might really be useful.
I have understood (correct me if I’ wrong) that it is possible to pass data as parameters, initialize a class on a worker node, and then, synchronously call a class method, that utilises the pre-initialized data, from the client.
Consider this pseudocode:
for worker in worker_nodes: actor = actors[worker] f = actor.worker_method(yieldparam(params[worker])) results = sync_results(results, f.result())
where “actors” is an array of objects, initialized for each worker IP address via something looking like this:
client.submit(ClassName, data, actor=True, workers=[192.168.1.18]).result()
But let’s say we have 10 workers, and 1 client, going through and doing 1 remote call on each worker, then receiving, processing and syncing the results. This will be very inefficient (and not parallel), because the client would be talking to one worker at any given time.
Have I got the concept wrong? Is there a better “model” that utilises actors in a better way? In which cases are Actors truly useful?
Additionally, in the documentation there is a remark that Actor processes are single-threaded:
Currently workers have only a single thread for actors, but this may change in the future.
Is this still the case? Does it make a difference due to Python’s GIL limitations?