Basic question about parallelizing *different* model fits with dask(-ml)

Hi Dask forum,
I can’t find clear best-practice guidance on what feels like a basic question: how can dask or dask-ml be used to parallelize model fitting over many *different* model frames? Most guidance seems oriented towards hyperparameter tuning and meta-estimators that parallelize a single fit.

As a minimal example, say I spin up a client on a SLURM-managed HPC cluster and want to fit a million separate linear models using e.g. sklearn.linear_model.LinearRegression (or any sklearn estimator with an n_jobs argument that parallelizes via joblib, or another library that similarly relies on BLAS/LAPACK for its linear algebra):

import dask
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

import numpy as np
from sklearn.linear_model import LinearRegression

cluster = SLURMCluster(<args>)
cluster.scale(4)
client = Client(cluster)
client.wait_for_workers(4)

rng = np.random.default_rng()
N, P = 100, 10

designs = [rng.normal(size=(N, P)) for _ in range(int(1e6))]
# each response vector needs N entries to match the N-row design
responses = [rng.normal(size=N) for _ in range(int(1e6))]

# naive implementation: wrap the bound fit method, don't call it eagerly,
# and collect one delayed object per model instead of overwriting it
fits = [
    dask.delayed(LinearRegression().fit)(X, y)
    for X, y in zip(designs, responses)
]

res = dask.compute(*fits)

What’s the best way to distribute these fits on a cluster?
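One concern with the naive version is that a million one-fit tasks may swamp the scheduler, so I've also considered batching several fits into each task to amortize overhead. A minimal sketch of that idea (the `fit_batch` helper and `batch_size` are my own invention, not from any dask API; with a `Client` attached, `dask.compute` would ship these tasks to the cluster instead of running locally):

```python
import dask
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
N, P = 100, 10
n_models = 1_000  # small stand-in for 1e6, just for illustration

designs = [rng.normal(size=(N, P)) for _ in range(n_models)]
responses = [rng.normal(size=N) for _ in range(n_models)]

def fit_batch(Xs, ys):
    # fit a whole batch of models inside one task so the scheduler
    # sees n_models / batch_size tasks instead of n_models tasks
    return [LinearRegression().fit(X, y).coef_ for X, y in zip(Xs, ys)]

batch_size = 100
tasks = [
    dask.delayed(fit_batch)(designs[i:i + batch_size], responses[i:i + batch_size])
    for i in range(0, n_models, batch_size)
]

# with a distributed Client active this runs on the cluster workers;
# without one it falls back to the local threaded scheduler
results = dask.compute(*tasks)
coefs = [c for batch in results for c in batch]
```

Is something like this batching pattern the recommended approach, or is there a more idiomatic dask(-ml) construct for it?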