EventLoops and KubeCluster on Windows

Hello!

I am trying to understand an error when starting a KubeCluster from a Windows machine. The following should reproduce the error (on Windows):

conda create -n dask_temp_env dask-kubernetes -c conda-forge
conda activate dask_temp_env
python

and then

from dask_kubernetes.operator import KubeCluster
cluster = KubeCluster(name='foo', namespace='bar') # RuntimeError

The first line in the traceback is

ERROR:root:exec: _WindowsSelectorEventLoop does NOT support subprocesses, see README.md

followed by a block concluding in a kubernetes_asyncio.client.exceptions.ApiException:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"daskclusters.kubernetes.dask.org is forbidden: User \"system:anonymous\" cannot create resource \"daskclusters\" in API group \"kubernetes.dask.org\" in the namespace \"[MY_NAMESPACE]\"","reason":"Forbidden","details":{"group":"kubernetes.dask.org","kind":"daskclusters"},"code":403}

So this (in particular the user "system:anonymous") indicates an authentication problem: the request appears to have reached the API server without any credentials.

Looking at the kubernetes_asyncio repo, there is a very clear warning regarding SelectorEventLoop on Windows. Unfortunately the work-around indicated there does not appear to work for users importing dask-kubernetes.
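For reference, the workaround there amounts, as far as I can tell, to switching asyncio to the proactor event loop policy before any event loop is created. A minimal sketch follows; the helper name `use_proactor_event_loop` and the `sys.platform` guard are additions for illustration, not something taken from the README:

```python
import asyncio
import sys


def use_proactor_event_loop() -> bool:
    """Switch asyncio to the proactor event loop policy on Windows.

    Hypothetical helper: returns True if the policy was changed,
    False on platforms where the proactor policy does not exist.
    """
    if sys.platform != "win32":
        # WindowsProactorEventLoopPolicy is only defined on Windows.
        return False
    # The proactor loop supports subprocesses; the selector loop does not.
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True


applied = use_proactor_event_loop()
```

The policy only affects event loops created after the call, so it would need to run before anything creates a loop.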

It looks like we have an older version (v2022.10.1) of the dask-kubernetes-operator deployed.

Is this a known bug/feature? Does anyone know if updating our deployment of the operator should be expected to fix this issue?

Thanks!
-Ellery

Hi @elleryames,

What happens if you try the workaround? Same error message?

Well, I really don’t know, but it is always better to be up to date!

cc @jacobtomlinson who might have some ideas.

This is not a known bug, but I’m not especially surprised as we don’t officially support Windows in dask-kubernetes (but we could if folks want to contribute fixes).

We are planning on migrating away from kubernetes_asyncio in the near future so this problem may just get worked around. But could you still open a bug report on GitHub?


Thanks for the replies @guillaumeeb and @jacobtomlinson !

To follow up on a few of the points raised:

  • I have now deployed v2023.3.2 of the dask-kubernetes-operator, and sadly no change to the error.
  • Setting the event loop policy as suggested in the kubernetes_asyncio README also does not change the error.
  • A bug report has been raised and can be found here.
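
One way to check whether a policy change actually takes effect is to look at which loop class asyncio hands out once all the imports have run. This is only a diagnostic sketch, and the helper name `new_loop_class_name` is hypothetical; it would be run after `import dask_kubernetes` and after setting the policy:

```python
import asyncio


def new_loop_class_name() -> str:
    """Return the class name of a freshly created asyncio event loop."""
    loop = asyncio.new_event_loop()
    try:
        return type(loop).__name__
    finally:
        # Close the throwaway loop so no resources are leaked.
        loop.close()


print(new_loop_class_name())
```

If this still reports `_WindowsSelectorEventLoop`, the class named in the error message, then something has presumably reset the policy after it was set. It only covers loops created through the current policy, so it cannot rule out a library constructing its own loop directly.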

Ultimately this is not a blocker for our team since we can work through a Docker container, but I thought the question/issue would be of interest to the community. Hopefully it is resolved by the kubernetes_asyncio successor!
