dask_ml.model_selection gives an error but still uses CPU and GPU

Hi,
I am trying to use HyperbandSearchCV from dask_ml.model_selection. The code is shown below:

from datetime import datetime

import numpy as np
import dask.dataframe as dd

from dask_cuda import LocalCUDACluster

# Distribute the search across a cluster
from dask.distributed import Client
from dask_ml.model_selection import HyperbandSearchCV, train_test_split
from cuml.linear_model import Lasso  # or sklearn.linear_model.Lasso

cluster = LocalCUDACluster()
client = Client(cluster)

# Convert the pandas objects to Dask collections
dd_desc_df = dd.from_pandas(desc_df, npartitions=100)
dd_pchembl = dd.from_pandas(pchembl_array, npartitions=100)

X_train, X_test, y_train, y_test = train_test_split(
    dd_desc_df, dd_pchembl, test_size=0.2, random_state=42
)

# Regressor
lasso_reg = Lasso()

# Parameter grid
grid = {
    'alpha': np.geomspace(1e-5, 1e5, 11),
    'max_iter': [1000, 5000, 10000],
    'tol': [1e-7, 1e-10],
    'selection': ['random', 'cyclic'],
}

# Hyperband search
grid_search = HyperbandSearchCV(estimator=lasso_reg, parameters=grid)

grid_search.fit(X_train, y_train)

After executing this code, it gives the following warning and error:


Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44871 instead
2023-02-03 15:06:17,523 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-02-03 15:06:17,523 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-02-03 15:06:17,703 - distributed.worker - WARNING - Mismatched versions found


    event = cls(**kwargs)
TypeError: __init__() missing 1 required positional argument: 'run_id'
2023-02-03 15:05:34,013 - distributed.nanny - ERROR - Worker process died unexpectedly

Even after this error, I can see Python using both the CPU and the GPU. Is this expected behavior?

TIA,
Mandar Kulkarni

Hi @binary-alkemi,

Welcome to Dask Discourse.

It really depends on what caused the error, how many workers you had to begin with, and a lot of other factors. There might be other workers still alive, Dask might be trying to launch a new worker and keep processing the data, or the worker might not have died cleanly and some process may still be doing something. You can inspect all of this using the Dask Dashboard, as sketched below.
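If it helps, here is a minimal sketch (assuming you still have the client object around) of how to list the workers the scheduler currently knows about, and where to find the dashboard:

print(client.dashboard_link)  # URL of the Dask Dashboard

# One entry per worker process the scheduler still knows about
for address, worker in client.scheduler_info()["workers"].items():
    print(address, worker["name"], worker["nthreads"])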

But anyway, in my opinion it would be better to solve the error you got in the first place, don’t you agree?

First, the warning: I’m not sure how you can have mismatched versions, given that you start the cluster from your main script.

Then, the positional argument error, which might just be a consequence of the warning. Do you have more details to add about your setup?

Hi @guillaumeeb ,
Thanks for your reply.

I am running this code locally on a laptop with Ubuntu 20.04 and a GTX 1660 Ti card, in the rapids-22.12 environment, not on HPC.

You are right that old processes may not have shut down when I rerun this code. I also tried calling client.shutdown() at the end of the code, but it did not help.
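Roughly like this (a simplified sketch, not my exact script):

cluster = LocalCUDACluster()
client = Client(cluster)

# ... the HyperbandSearchCV code from above ...

client.shutdown()  # should also stop the scheduler and the workers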

The features are float64 and int64 types and are scaled with StandardScaler. I am trying hyperparameter optimization, but I could not make it run with dask_ml and cuML. Something similar runs fine with plain scikit-learn.

Please let me know if you need some specific information.

Best,
Mandar Kulkarni

I think first we need to identify why you would have mismatched versions of some packages.

Could you run client.get_versions() on your setup so we can check which package is causing problems?
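Something like this should do it; with check=True it raises directly if the client, scheduler and workers disagree on package versions:

versions = client.get_versions(check=True)  # raises on mismatched packages

# Compare the packages seen by the client and by the scheduler
print(versions["client"]["packages"])
print(versions["scheduler"]["packages"])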

Next, could you build a Minimal Reproducible Example? In your case, could we reproduce the issue with some randomly generated data?
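For instance, something along these lines, a sketch with randomly generated data that keeps your Lasso and Hyperband setup (the column names and sizes are just placeholders):

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
from dask_ml.model_selection import HyperbandSearchCV, train_test_split
from sklearn.linear_model import Lasso  # or cuml.linear_model.Lasso, as in your setup

cluster = LocalCUDACluster()
client = Client(cluster)

# Random data standing in for your descriptors and pchembl values
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(10_000, 20)),
                 columns=[f"f{i}" for i in range(20)])
y = pd.Series(rng.normal(size=10_000))

dX = dd.from_pandas(X, npartitions=10)
dy = dd.from_pandas(y, npartitions=10)

X_train, X_test, y_train, y_test = train_test_split(
    dX, dy, test_size=0.2, random_state=42
)

# Note: HyperbandSearchCV expects an estimator implementing partial_fit,
# so you may need e.g. SGDRegressor instead of Lasso here
search = HyperbandSearchCV(Lasso(), {"alpha": np.geomspace(1e-5, 1e5, 11)})
search.fit(X_train, y_train)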

Finally, do you have a more complete stack trace of the exception you’re running into?