Customize autoscaling using Dask's Kubernetes operator

Hi,
we are currently migrating from a static Dask deployment with n workers to an adaptive deployment, using the new Dask Kubernetes operator.

When I ran an example workload with an adaptive range of 1-6 workers, it was significantly slower (up to 5x) than a fixed deployment with min=max=3 workers. The operator appears to be constantly scaling workers up and down. I then had another look at the documentation and discovered the Adaptive class ( Adaptive deployments — Dask documentation ), which has constructor parameters like target_duration and wait_count. Is it possible to configure these when using the Dask Kubernetes operator, or are they only available through the Python API?
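For comparison, this is how I would set those knobs through the client-side Python API (a minimal sketch, using LocalCluster purely for illustration, since adapt() forwards extra keyword arguments to the Adaptive class):

```python
from dask.distributed import LocalCluster

# Minimal sketch: LocalCluster stands in for any distributed-style
# cluster object. Extra keyword arguments to adapt() are passed on
# to the Adaptive constructor.
cluster = LocalCluster(n_workers=1)
cluster.adapt(
    minimum=1,
    maximum=6,
    target_duration="60s",  # aim to finish the queued work in ~60s
    wait_count=10,          # require this many consecutive scale-down
                            # recommendations before retiring a worker
)
```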

Hi @sil-lnagel, welcome to the Dask Discourse forum!

Based on the source code, I don’t think these specific adaptive settings are available in dask-kubernetes.

But maybe @jacobtomlinson will prove me wrong?

Adaptive scaling in dask-kubernetes is handled by the controller, rather than client-side via the Adaptive class.
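With the operator, adaptive mode is enabled roughly like this (a minimal sketch assuming the dask_kubernetes.operator.KubeCluster API; the cluster name is a placeholder):

```python
from dask_kubernetes.operator import KubeCluster

# Minimal sketch: adapt() here creates a DaskAutoscaler resource and
# the operator's controller then makes the scaling decisions, so only
# the worker-count bounds are configurable at this level (there are no
# target_duration/wait_count equivalents).
cluster = KubeCluster(name="example")
cluster.adapt(minimum=1, maximum=6)
```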

If you’re seeing poor autoscaling behaviour, could I ask you to open an issue on the GitHub repo with a code example that demonstrates the problem, so I can look into it?

Thanks a lot for your explanation @jacobtomlinson. Regarding the example: it is a bit tricky to create one, but I am working on it. If I manage to create one, I will add it to my other question, Shuffle P2P unstable with adaptive k8s operator?. That topic also contains a better description of the likely “root cause” of our problems.
