I’m not sure I’m the best person to give a complete answer, so I’m cc'ing @TomAugspurger to fill in the rest.
dask-ml depends on sklearn: it reuses a lot of sklearn algorithms and builds upon them to provide distributed algorithms optimized for Dask. dask-ml algorithms require Dask, while scikit-learn needs to be able to work without Dask.
Dask-ml doesn’t try to parallelize low-level code the way you would with OpenMP directives. It either parallelizes the training of a single model by chunking the data and calling partial fit on each chunk, or parallelizes the training of multiple models, for example when running a GridSearch. Other algorithms that parallelize well, like RandomForest (where you can fit multiple trees at the same time), also benefit from dask-ml.
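For the first case, here’s a minimal (untested) sketch of the partial-fit path, assuming `SGDClassifier` as the wrapped estimator and a Dask array chunked along the sample axis:

```python
import dask.array as da
from dask_ml.wrappers import Incremental
from sklearn.linear_model import SGDClassifier

# Dask array chunked along the sample axis; each chunk fits in memory.
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = (da.random.random(100_000, chunks=10_000) > 0.5).astype(int)

# Incremental calls the wrapped estimator's partial_fit on each chunk
# in turn, so the whole dataset never has to be in memory at once.
clf = Incremental(SGDClassifier())
clf.fit(X, y, classes=[0, 1])
```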
dask-ml is well suited if you have a lot of data and a model that can be learned chunk by chunk, or if you need to train several models that fit in memory in parallel. See Dask-ML — dask-ml 2022.5.28 documentation.
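For the second case (many in-memory model fits in parallel), a rough sketch using dask-ml’s drop-in GridSearchCV; the estimator and parameter grid here are just placeholders:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from dask_ml.model_selection import GridSearchCV

# Data that fits in memory: Dask parallelizes the many model fits
# across workers rather than chunking the data itself.
X, y = make_classification(n_samples=2_000, random_state=0)

param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["rbf", "linear"]}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```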
For K-means, Dask-ML implements the k-means|| initialization strategy. My (possibly wrong) understanding is that it’s going to be worse than k-means++ for data that fits in memory (the case scikit-learn typically focuses on). I think the linear models / minimizers like ADMM can take a while to converge, but I haven’t looked into that deeply. Other algorithms implemented first in Dask-ML, like Hyperband, have I think since been implemented in scikit-learn.
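For reference, using the k-means|| initialization looks roughly like this (a sketch, assuming a Dask array larger than memory):

```python
import dask.array as da
from dask_ml.cluster import KMeans

# Larger-than-memory data, chunked along the sample axis.
X = da.random.random((1_000_000, 10), chunks=(100_000, 10))

# init="k-means||" is the scalable initialization Dask-ML uses;
# it needs fewer full passes over the data than k-means++.
km = KMeans(n_clusters=8, init="k-means||")
km.fit(X)
```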
@guillaumeeb’s answer covers the rest. Dask-ML isn’t really useful for single-node parallelism; the performance gains typically come when you have a compute- or memory-bound task that can be distributed across a cluster without too much communication.