Pipeline_optimizer fails to train

I am trying to run TPOT using a Dask client.
But it fails to fit, as shown in the following cell, where the .fit() method does not proceed:


That is the current status.
Previously I received this error:
distributed.nanny - WARNING - Restarting worker
Would you have any suggestions? Thanks!

@yishairasowsky Welcome to Discourse and thanks for this question!

Have you already resolved this? If not, would you mind sharing a minimal, reproducible example? I'm not able to reproduce it directly. More information about your setup will also help us help you better: the Dask, Python, and TPOT versions, the dtype and size of X_train and y_train, etc.
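In case it helps with gathering those details, here is a rough sketch that uses only the standard library. The helper names `package_versions` and `describe` are made up for this post, and the package names passed in are my assumption about what is installed under those names in your environment, so adjust them as needed.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each package name to its installed version, or 'not installed'."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


def describe(name, data):
    """Summarize type, dtype, shape and size in bytes of an array-like object.

    Attributes the object does not have are reported as 'n/a'.
    """
    return {
        "name": name,
        "type": type(data).__name__,
        "dtype": str(getattr(data, "dtype", "n/a")),
        "shape": getattr(data, "shape", "n/a"),
        "nbytes": getattr(data, "nbytes", "n/a"),
    }


if __name__ == "__main__":
    print("python", platform.python_version())
    # Assumed package names; change them to match your environment.
    for pkg, ver in package_versions(["dask", "distributed", "dask-ml", "tpot"]).items():
        print(pkg, ver)
    # With your own data in scope, you could also run:
    # print(describe("X_train", X_train))
    # print(describe("y_train", y_train))
```

Pasting the printed output into your reply would cover most of the setup questions above.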

You can use a toy dataset for the minimal example:

from dask_ml.datasets import make_classification

# if X_train, y_train are Dask collections
X_train, y_train = make_classification(chunks=100)

# if X_train, y_train are NumPy ndarrays
X_train = X_train.compute()
y_train = y_train.compute()

distributed.nanny - WARNING - Restarting worker

This message suggests that your workers are running out of memory. It can be caused by many things, but here are some common solutions: