When running computation-heavy tasks, the workers continuously print garbage-collection INFO and WARNING messages from distributed.utils_perf. These drown out the other console logs I use for debugging and for tracking task status. Examples below:
dask_worker_2 | distributed.utils_perf - INFO - full garbage collection released 34.77 MiB from 337 reference cycles (threshold: 9.54 MiB)
dask_worker_2 | distributed.utils_perf - WARNING - full garbage collections took 23% CPU time recently (threshold: 10%)
Here is what I tried (without success):
- Disabling logging with:

  import logging
  from distributed.worker import logger

  logging.disable(logging.WARNING)
  logger.warning('ignore')
- Updating distributed.yaml to increase the time allowed before warnings are triggered (the admin section nests under the top-level distributed key):

  distributed:
    version: 2
    # logging:
    #   distributed: info
    #   distributed.client: warning
    admin:
      tick:
        interval: 60s  # default 20ms; time between event loop health checks
        limit: 300s    # default 3s; time allowed before triggering a warning
- Reading through the logging source code, wondering whether the distributed.worker module itself needs to be updated. Note that the messages above are actually emitted by the distributed.utils_perf logger, not distributed.worker (see the sketch after this list).
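Since the log lines name distributed.utils_perf as the emitting logger, a minimal sketch of silencing it directly (logger name taken from the output above; I have not verified that this fixes it):

import logging

# The GC messages are emitted at INFO/WARNING by "distributed.utils_perf",
# so raising that logger's level past WARNING hides both kinds of entries
logging.getLogger("distributed.utils_perf").setLevel(logging.ERROR)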
Apart from disabling the warnings, it would also be useful to write the console output to a file. I tried setting this up from a .py file as per the reference:
# Write logs to disk as well as to the console
import dask

logging_config = {
    "version": 1,
    "handlers": {
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "consoleLogs.log",
            "level": "INFO",
        },
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
        },
    },
    "loggers": {
        "distributed.worker": {
            "level": "INFO",
            "handlers": ["file", "console"],
        },
        "distributed.scheduler": {
            "level": "INFO",
            "handlers": ["file", "console"],
        },
    },
}

# distributed treats a logging config containing a "version" key as a
# logging.config.dictConfig dictionary
dask.config.config['logging'] = logging_config
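One caveat I am not sure about: distributed seems to apply its logging configuration at import time, so assigning into dask.config.config afterwards may have no effect. A fallback sketch that bypasses the Dask config machinery and hands the same logging_config dict from above straight to the standard library:

import logging.config

# Keep any handlers distributed has already attached rather than wiping them
logging_config["disable_existing_loggers"] = False

# Apply the dict directly via the stdlib dictConfig API
logging.config.dictConfig(logging_config)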