Accessing TaskState metadata for running tasks

Hi, we’re using Dask distributed to run custom, long-running computations on a cluster. Each worker spawns a separate process to perform the computation, and we want to record that process’s PID along with the hostname so that we can troubleshoot when a task gets stuck. We see that TaskState has a metadata field (Add TaskState metadata by jrbourbeau · Pull Request #4191 · dask/distributed · GitHub) and would like to use it. The problem is that TaskState metadata is only synced to the scheduler after the task finishes processing. Is there a way to sync the task state to the Dask scheduler while the task is still running, so that we can read it via Client.run_on_scheduler? Thanks.

@dcheng Welcome to Discourse!

Could you please share some more details about your use case and, if possible, a minimal example? How and why are you accessing the TaskState? That will let us help you better. :smile:

I’d also suggest looking into Scheduler/Worker plugins, which might be better suited for logging PID: Scheduler Plugins — Dask.distributed 2022.5.0+13.gc10476ff documentation
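To make the idea concrete, here is a minimal sketch of one way to surface the PID and hostname while the task is still running. It does not use TaskState metadata; instead it assumes the worker's event log (`Worker.log_event`, readable from the client with `Client.get_events`) is an acceptable channel. The names `run_and_report`, `dask_task`, and the topic `"task-pids"` are made up for illustration, and I haven't verified this against your setup.

```python
import socket
import subprocess
import sys


def run_and_report(cmd, report):
    """Start `cmd` as a child process and report its PID and hostname
    right away, before waiting for it to finish."""
    proc = subprocess.Popen(cmd)
    # Report immediately so the information is available while the
    # child process is still running.
    report({"pid": proc.pid, "hostname": socket.gethostname()})
    return proc.wait()


def dask_task(cmd, topic="task-pids"):
    """Hypothetical task body intended to run on a Dask worker.

    Assumption: the worker's event log is used as the channel to the
    scheduler instead of TaskState metadata.
    """
    # Imported here so the helper above stays usable without Dask.
    from distributed import get_worker

    worker = get_worker()
    return run_and_report(cmd, lambda info: worker.log_event(topic, info))


if __name__ == "__main__":
    # Local check without a cluster: collect the report in a list.
    seen = []
    exit_code = run_and_report([sys.executable, "-c", "pass"], seen.append)
    print(exit_code, seen)
```

On a cluster you would submit `dask_task` with `client.submit(dask_task, cmd)` and, while it runs, call `client.get_events("task-pids")` to read what the workers have reported so far. If you need to tie each entry back to a specific task, you could add your own identifier to the reported dict.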