Hi,
we are currently trying to migrate from a static Dask deployment with n workers to an adaptive deployment. For this we are using the new dask-kubernetes operator.
When I ran an example workload with an adaptive range of 1-6 workers, it was significantly slower (up to 5x) than running with min=max=3 workers. The operator appears to be constantly scaling workers up and down. I then had another look at the documentation and discovered the Adaptive class (Adaptive deployments — Dask documentation), which has constructor parameters like target_duration and wait_count. Is it possible to configure these when using the dask-kubernetes operator, or are they only available through the Python API?
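For context, these knobs are exposed through the Python API roughly like this (a minimal sketch, assuming a plain distributed cluster; the parameter values are just illustrative):

```python
# Sketch of tuning adaptive scaling via the Python API, not the operator.
# ``Cluster.adapt`` forwards extra keyword arguments to the ``Adaptive``
# class, so ``target_duration`` and ``wait_count`` can be set there.
from dask.distributed import LocalCluster

cluster = LocalCluster(n_workers=0)
cluster.adapt(
    minimum=1,
    maximum=6,
    target_duration="60s",  # aim for the queued work to finish in ~60s
    wait_count=3,           # require 3 consecutive scale-down recommendations
)
```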
Hi @sil-lnagel, welcome to the Dask Discourse forum!
Based on the source code, I don’t think these specific adaptive features are available in dask-kubernetes.
But maybe @jacobtomlinson would prove me wrong?
The adaptive scaling in dask-kubernetes is handled by the controller, rather than client side via the Adaptive class.
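For comparison, here is a minimal sketch of what adaptive mode looks like on the operator side as I understand it (the cluster name is hypothetical): `adapt` only takes `minimum` and `maximum`, since it just creates a DaskAutoscaler resource and leaves the actual scaling decisions to the controller.

```python
# Hedged sketch of adaptive mode with the dask-kubernetes operator.
# adapt() here only accepts minimum/maximum: it creates a DaskAutoscaler
# resource, and the controller (not the client-side Adaptive class)
# decides when to scale, so target_duration/wait_count have no effect.
from dask_kubernetes.operator import KubeCluster

cluster = KubeCluster(name="example")  # hypothetical cluster name
cluster.adapt(minimum=1, maximum=6)
```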
If you’re seeing poor autoscaling behaviour could I ask you to open an issue on the GitHub repo with a code example that demonstrates the problem so I can look into it?
Thanks a lot for your explanation @jacobtomlinson. Regarding the example: it is a bit tricky to create one, but I am working on it. If I manage to create one, I will add it to my other question, Shuffle P2P unstable with adaptive k8s operator?. That ticket also contains a better description of the likely “root cause” of our problems.