I’ve found that I’m not able to invoke decorated or imported functions when submitting tasks to run on a KubeCluster.
Here are the files required to reproduce:
- Create a pod in your k8s cluster with `debug-shell.yaml`.
- Copy `debug-scheduler.yaml`, `debug-worker.yaml`, `debug.py`, and `operation.py` onto the pod.
- Run `python debug.py` on the pod.
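For context, a minimal version of the two Python files looks roughly like this (simplified sketch, not the exact attached files; the scheduler address is a placeholder for however the cluster is wired up):

```python
# operation.py -- plain module imported by the driver script
def expensive_operation(n):
    return n * 2
```

```python
# debug.py -- submits the imported function to the cluster
from distributed import Client

from operation import expensive_operation

client = Client("tcp://<scheduler-address>:8786")  # placeholder address
x = client.submit(expensive_operation, 21)
print(x.result())  # fails on the worker side
```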
You’ll see something like:
```
Traceback (most recent call last):
  File "debug.py", line 29, in <module>
    print(x.result())
  File "/opt/conda/lib/python3.8/site-packages/distributed/client.py", line 277, in result
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 73, in loads
    return pickle.loads(x)
ModuleNotFoundError: No module named 'operation'
```
Alternatively, if you use the `lru_cache`-decorated version, you'll see a stack trace like this:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/distributed/client.py", line 277, in result
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 73, in loads
    return pickle.loads(x)
AttributeError: Can't get attribute 'expensive_operation' on <module '__mp_main__' from '/opt/conda/bin/dask-worker'>
```
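The `lru_cache` variant is essentially the same submission, but with the function defined in the driver script itself rather than imported. Roughly (again a sketch, not the exact file):

```python
# debug.py (lru_cache variant) -- function defined in the driver script
from functools import lru_cache

from distributed import Client

@lru_cache(maxsize=None)
def expensive_operation(n):
    return n * 2

client = Client("tcp://<scheduler-address>:8786")  # placeholder address
x = client.submit(expensive_operation, 21)
print(x.result())  # fails with the AttributeError above
```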
I assume there is a way to use multiple modules in Dask distributed tasks, and to use stdlib decorators? It feels like I'm missing something basic!
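For the imported-module case, is `Client.upload_file` the intended mechanism? Something like this (untested sketch):

```python
# Untested sketch: ship operation.py to every worker before submitting
client.upload_file("operation.py")

from operation import expensive_operation

x = client.submit(expensive_operation, 21)
print(x.result())
```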