Dask-ml preprocessing raises AttributeError

I use Dask DataFrames and dask-ml to manipulate my data. When I use the dask-ml MinMaxScaler, I get the error below. Is there a way to prevent it and make the scaler work?

import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

df = dd.read_csv('path to csv', parse_dates=['CREATED_AT'],
                 dtype={'ODI_UPDATED_AT': 'object'})
scaler = MinMaxScaler()

AttributeError: 'Scalar' object has no attribute 'copy'

Thanks in advance.

Thanks for the question, @blest. I wouldn’t expect MinMaxScaler.fit(...) to work on a Dask Series, because the scaler expects 2D input. When I tried the equivalent operation with sklearn.preprocessing.MinMaxScaler and a pandas Series, I got the following error:

ValueError: Expected 2D array, got 1D array instead:
array=[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
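
For reference, here is a minimal sketch of how that scikit-learn behavior can be reproduced, assuming scikit-learn and pandas are installed. The variable names are illustrative, and the exact wording of the message may differ between scikit-learn versions:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# A pandas Series is 1D, while scikit-learn scalers expect 2D input
s = pd.Series(range(10), dtype="float64")

try:
    MinMaxScaler().fit(s)
    error_message = None
except ValueError as exc:
    # Keep the message so it can be inspected or printed
    error_message = str(exc)

print(error_message)
```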

I would suggest either passing in a Dask DataFrame instead of a Dask Series (this is more of a workaround) or converting your Dask Series to an appropriately shaped Dask Array. Here’s a small working example of the second approach:

import pandas as pd
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

# Create data
df = pd.DataFrame({"A": range(10), "B": range(10, 20)})
ddf = dd.from_pandas(df, npartitions=2)

# Use MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(ddf["A"].to_dask_array(lengths=True).reshape(-1, 1))

That said, I agree that the error message you’re getting isn’t very informative. I’ve opened an issue, "Better error message when using invalid `MinMaxScaler.fit(...)` inputs" (Issue #951 in dask/dask-ml on GitHub), to improve the error message users get in this failure case.

Thanks, @jrbourbeau, for your answer. I had solved the problem the same way, but I wasn’t sure whether it was the best approach. And thank you for creating the issue.