I would like to write directly to Kafka from the Dask workers. I am currently experimenting and have found one solution which creates one producer per task. I am wondering if it is possible to create one producer per worker instead?
All I found was Write to Kafka from a Dask worker - Stack Overflow, but get_worker() seems not to work anymore (it throws a “no worker found” error).
Is it even good practice to use just one producer? Or is it better to use one producer per task?
I think the answer from Stackoverflow you found still stands:
You might also consider re-creating the producer in every task, then writing data, and then closing that producer. If it doesn’t take too long to create a producer relative to the amount of time it takes to write a partition of data then this might be a decent solution. It’s slightly less efficient, but probably more robust/secure/mature.
get_worker should work anyway, how did you try to use it?
You could also register a WorkerPlugin that would set a producer on startup and close it on teardown.