Apologies if this has been answered elsewhere; the closest I can find is Capturing and logging stdout/stderr on workers · Issue #2033 · dask/distributed · GitHub. That issue mentions many deployment types, but not GCPCluster or the other options in cloud_provider.
I am a little confused because I don't see my print statements in the output of the Docker container running on the worker VM. I do see the messages from the nanny and the worker, which makes me think stdout is already being captured somewhere, but I don't know where it is going. For now, I have set up my deployment to attach the handler from google.cloud.logging to the root Python logger, and GCP's OS config sends journalctl (which includes any remaining output from the container) to Google Cloud Logging. But I still don't know where the print statements are disappearing to.
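For reference, the root-logger setup described above can be sketched as follows. A plain `StreamHandler` stands in for google.cloud.logging's handler so the snippet runs without GCP credentials; on GCP the equivalent is `google.cloud.logging.Client().setup_logging()`, which attaches its handler to the root logger:

```python
import logging

# On GCP, google.cloud.logging attaches its own handler to the root logger:
#   import google.cloud.logging
#   google.cloud.logging.Client().setup_logging()
# A StreamHandler stands in for it here so the sketch runs anywhere.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s")
)

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Anything routed through the logging module now reaches the root handler,
# but bare print() calls bypass logging entirely, which is one reason
# they can vanish while log records show up fine.
logging.getLogger("myapp").info("this reaches the attached handler")
```

Note that this only catches output sent through the `logging` module; `print` writes straight to the process's stdout.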
Hi @siddharthab, welcome to the Dask community!
I've definitely seen this problem before, of print statements that cannot be found, with other deployment strategies. I decided to give it a try with a simple local CLI deployment, starting a Dask scheduler and a Dask worker connected to it in two different terminals.
Then, I launched the following code:
from distributed import Client
client = Client("tcp://w.x.y.z:8786")
# submit something that prints on the worker (assumed; the original snippet omitted this step)
client.submit(print, "Hello from the worker").result()
And I did find the print statement in the stdout/stderr output of my only worker:
...several lines of starting worker logs...
2023-05-30 17:07:31,182 - distributed.worker - INFO - -------------------------------------------------
2023-05-30 17:07:31,183 - distributed.core - INFO - Starting established connection to tcp://192.168.205.10:8786
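As a side note, if the goal is simply to see worker-side output at the client rather than digging through the container's logs, recent versions of distributed ship a drop-in `print` that also forwards the printed text back to connected clients. A minimal runnable sketch, using an in-process `LocalCluster` as a stand-in for the GCPCluster deployment:

```python
from distributed import Client, LocalCluster
from distributed import print  # drop-in print that also forwards output to clients

# An in-process local cluster stands in for a GCPCluster in this sketch
cluster = LocalCluster(n_workers=1, processes=False, dashboard_address=None)
client = Client(cluster)

def task():
    # Printed on the worker AND forwarded to the client session
    print("hello from the worker")
    return 42

result = client.submit(task).result()

client.close()
cluster.close()
```

On a real GCPCluster the forwarded text appears in the client process, independent of where the worker container's stdout ends up. `Client.get_worker_logs()` is another option for pulling the workers' captured log records back to the client, though it returns logging output rather than raw stdout.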
cc @jacobtomlinson, do you see any reason stderr/stdout outputs would not show on a GCPCluster deployment?