Dask with Azure ML Studio: How can I scale Distributed horizontally in a Notebook?

Using Azure ML Studio, you can fire up JupyterLab from a compute instance, and you can start an ML cluster. But does anyone have a good way to run a dask.distributed cluster that scales horizontally in this environment? I see a few different approaches people have tried to run Dask on Azure ML:

  • Run Dask on Ray, but this doesn’t use dask.distributed.
  • Use dask-mpi, but the setup is fairly complicated and involves port forwarding.

Ideally I’d just use Coiled or Nebari, but both have challenges in the secure Azure environment I’m working with (NATO) and it would be helpful to have a solution that was already supported with the Azure ML Studio environment.

Would creating another backend for dask-jobqueue or dask-gateway be a solution?

Does anyone in this community have suggestions for the best approach?

Hmm, I haven’t had a chance to use microsoft/ray-on-aml (“Turning AML compute into Ray cluster”), but IIUC it uses the same approach you linked to: bootstrapping Ray workers on AML compute and then running dask-on-ray (which I only now realized doesn’t use dask.distributed) on those workers.

I wonder if we could use something similar to dask-databricks here? I don’t know if I’ll have a chance to look into this soon, but I’ll post here if I do.
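To make the idea concrete, here is a rough sketch of the bootstrap pattern as I understand it from dask-databricks: every node runs the same startup script, one node becomes the scheduler, and the rest start workers pointing at it. Nothing here is an existing Azure ML integration. The `dask_command` helper is hypothetical, and the `NODE_RANK` and `MASTER_ADDR` variable names are assumptions about what the job launcher exports, so check what your AML job actually sets.

```python
from typing import List, Mapping

SCHEDULER_PORT = 8786  # Dask's default scheduler port


def dask_command(env: Mapping[str, str]) -> List[str]:
    """Return the Dask CLI command a node should run (hypothetical helper).

    Assumption: the launcher exports NODE_RANK on every node and
    MASTER_ADDR (the address of the rank-0 node) on the others.
    """
    rank = int(env.get("NODE_RANK", "0"))
    if rank == 0:
        # The rank-0 node hosts the scheduler.
        return ["dask", "scheduler", "--port", str(SCHEDULER_PORT)]
    # Every other node starts a worker that connects to the scheduler.
    head = env["MASTER_ADDR"]
    return ["dask", "worker", f"tcp://{head}:{SCHEDULER_PORT}"]


# Example with made-up values for a three-node job:
for node_env in (
    {"NODE_RANK": "0"},
    {"NODE_RANK": "1", "MASTER_ADDR": "10.0.0.4"},
    {"NODE_RANK": "2", "MASTER_ADDR": "10.0.0.4"},
):
    print(" ".join(dask_command(node_env)))
```

A startup script would pass the returned list to something like `subprocess.Popen`. From the notebook you would then connect with `distributed.Client("tcp://<head>:8786")`, assuming the compute instance can reach the cluster nodes over the network, which is the part I’m least sure about in a locked-down environment.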


@TomAugspurger, ooh, something like dask-databricks but for Azure ML clusters sounds ideal! Fingers crossed!

I took a quick look at the ray-on-aml package, and I guess it could be adapted to launch Dask Distributed clusters instead of Ray! That would need a bit of work, though.

It turns out Microsoft is interested in funding some development work that would allow users to easily spin up Dask clusters on Azure ML clusters. If interested, please ping me: rich@opensciencecomputing.com.
