dask_ml.model_selection gives an error but still uses CPU and GPU

Hi,
I am trying to use HyperbandSearchCV from dask_ml.model_selection. The code is shown below:

from datetime import datetime

import numpy as np
import dask.dataframe as dd

from dask_cuda import LocalCUDACluster

# Distribute the search across a cluster
from dask.distributed import Client
from dask_ml.model_selection import HyperbandSearchCV, train_test_split
from cuml.linear_model import Lasso  # or sklearn.linear_model.Lasso

cluster = LocalCUDACluster()
client = Client(cluster)

# Convert the pandas objects to Dask collections
dd_desc_df = dd.from_pandas(desc_df, npartitions=100)
dd_pchembl = dd.from_pandas(pchembl_array, npartitions=100)

X_train, X_test, y_train, y_test = train_test_split(
    dd_desc_df, dd_pchembl, test_size=0.2, random_state=42
)

# Regressor
lasso_reg = Lasso()

# Parameter grid
grid = {
    'alpha': np.geomspace(1e-5, 1e5, 11),
    'max_iter': [1000, 5000, 10000],
    'tol': [1e-7, 1e-10],
    'selection': ['random', 'cyclic'],
}

# Hyperband search
grid_search = HyperbandSearchCV(estimator=lasso_reg, parameters=grid)

grid_search.fit(X_train, y_train)

After executing this code, it gives the following warning and error:


Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44871 instead
2023-02-03 15:06:17,523 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-02-03 15:06:17,523 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-02-03 15:06:17,703 - distributed.worker - WARNING - Mismatched versions found


    event = cls(**kwargs)
TypeError: __init__() missing 1 required positional argument: 'run_id'
2023-02-03 15:05:34,013 - distributed.nanny - ERROR - Worker process died unexpectedly

Even after this error, I can see Python using both the CPU and the GPU. Is this expected behavior?

TIA,
Mandar Kulkarni

Hi @binary-alkemi,

Welcome to Dask Discourse.

It really depends on what caused the error, how many workers you had to begin with, and a lot of other factors. There might be other workers still alive, Dask might be trying to launch a new worker and keep processing the data, or the worker might not have died cleanly and some process may still be doing something. You can inspect all of this using the Dask Dashboard, as sketched below.
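If it helps, here is a minimal sketch (assuming you still have the client object around) of how to list the workers the scheduler currently knows about, and where to find the dashboard:

print(client.dashboard_link)  # URL of the Dask Dashboard

# One entry per worker process the scheduler still knows about
for address, worker in client.scheduler_info()["workers"].items():
    print(address, worker["name"], worker["nthreads"])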

But anyway, in my opinion it would be better to solve the error you got in the first place, don’t you agree?

First, the warning: I’m not sure how you can have mismatched versions, given that you start the cluster from your main script.

Then, the positional argument error, which might just be a consequence of the warning. Do you have more details to add about your setup?

Hi @guillaumeeb ,
Thanks for your reply.

I am running this code locally on a laptop with Ubuntu 20.04 and a GTX 1660 Ti card, in the rapids-22.12 environment, not on HPC.

You are right that old processes may not have shut down when I rerun this code. I also tried calling client.shutdown() at the end of the code, but it did not help.
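Roughly like this (a simplified sketch, not my exact script):

cluster = LocalCUDACluster()
client = Client(cluster)

# ... the HyperbandSearchCV code from above ...

client.shutdown()  # should also stop the scheduler and the workers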

The features are float64 and int64 types and are scaled with StandardScaler. I am trying hyperparameter optimization, but I could not make it run with dask_ml and cuML. Something similar runs fine with plain scikit-learn.

Please let me know if you need some specific information.

Best,
Mandar Kulkarni

I think first we need to identify why you would have mismatched versions of some packages.

Could you run client.get_versions() on your setup so we can check which package is causing problems?
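Something like this should do it; with check=True it raises directly if the client, scheduler and workers disagree on package versions:

versions = client.get_versions(check=True)  # raises on mismatched packages

# Compare the packages seen by the client and by the scheduler
print(versions["client"]["packages"])
print(versions["scheduler"]["packages"])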

Next, could you build a Minimal Reproducible Example? In your case, could we reproduce the issue with some randomly generated data?
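For instance, something along these lines, a sketch with randomly generated data that keeps your Lasso and Hyperband setup (the column names and sizes are just placeholders):

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from dask_ml.model_selection import HyperbandSearchCV, train_test_split
from sklearn.linear_model import Lasso  # or cuml.linear_model.Lasso, as in your setup

cluster = LocalCUDACluster()
client = Client(cluster)

# Random data standing in for your descriptors and pchembl values
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(10_000, 20)),
                 columns=[f"f{i}" for i in range(20)])
y = pd.Series(rng.normal(size=10_000))

dX = dd.from_pandas(X, npartitions=10)
dy = dd.from_pandas(y, npartitions=10)

X_train, X_test, y_train, y_test = train_test_split(
    dX, dy, test_size=0.2, random_state=42
)

# Note: HyperbandSearchCV expects an estimator implementing partial_fit,
# so you may need e.g. SGDRegressor instead of Lasso here
search = HyperbandSearchCV(Lasso(), {"alpha": np.geomspace(1e-5, 1e5, 11)})
search.fit(X_train, y_train)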

Finally, do you have a more complete stack trace of the exception you’re running into?