Hi Dask forum,
I can’t find clear best practices for what feels like a basic question: how can dask or dask-ml be used to parallelize model fitting over many different model frames? Most guidance seems oriented toward parameter tuning and meta-estimators.
Somewhat related questions:
- Meta-Estimators with Multiple Models
- Issue in Parallel row preprocessing with Dask
- Perform the same operation on all columns of a dask dataframe in parallel
- Clarification on Distributed Dask ML (Is ML really Distributed?)
As a minimal example, let’s say I spin up a client on a SLURM-managed HPC and want to fit a million different linear models using e.g. sklearn.linear_model.LinearRegression (or an sklearn model with an n_jobs argument that uses joblib to parallelize, or another module that similarly relies on BLAS/LAPACK for its linear algebra, etc.):
import dask
from dask_jobqueue import SLURMCluster
from dask.distributed import Client
import numpy as np
from sklearn.linear_model import LinearRegression
cluster = SLURMCluster(<args>)
cluster.scale(4)
client = Client(cluster)
client.wait_for_workers(4)
rng = np.random.default_rng()
N, P = 100, 10  # observations per design, features
designs = [rng.normal(size=(N, P)) for _ in range(int(1e6))]
responses = [rng.normal(size=N) for _ in range(int(1e6))]  # one response vector of length N per design
# naive implementation: one delayed task per fit, computed all at once
fits = []
for X, y in zip(designs, responses):
    fits.append(dask.delayed(LinearRegression().fit)(X, y))
res = dask.compute(*fits)
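I suspect a million tiny tasks like this would strain the scheduler, so perhaps the fits should be batched. A sketch of what I have in mind (fit_batch and the chunk size of 1,000 are just illustrative choices, not anything I've benchmarked):

def fit_batch(batch):
    # batch is a list of (X, y) pairs; return one fitted model per pair
    return [LinearRegression().fit(X, y) for X, y in batch]

pairs = list(zip(designs, responses))
chunk = 1_000  # arbitrary batch size
batches = [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]
futures = client.map(fit_batch, batches)
results = [m for batch in client.gather(futures) for m in batch]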
What’s the best way to distribute these fits on a cluster?
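For instance, since sklearn already parallelizes via joblib, would routing the loop through joblib's dask backend be the intended pattern? A sketch of what I mean (this assumes the Client registers the "dask" backend, which I believe recent dask.distributed versions do automatically):

import joblib

# dispatch each per-model fit to the dask workers via joblib
with joblib.parallel_backend("dask"):
    res = joblib.Parallel()(
        joblib.delayed(LinearRegression().fit)(X, y)
        for X, y in zip(designs, responses)
    )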