I use Dask DataFrame and dask-ml to manipulate my data. When I use dask-ml's MinMaxScaler on a single column, I get the error below. Is there a way to prevent this error and make it work?
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler
df = dd.read_csv('path to csv', parse_dates=['CREATED_AT'],
                 dtype={'ODI_UPDATED_AT': 'object'})
scaler = MinMaxScaler()
print(scaler.fit_transform(df['M']))
AttributeError: 'Scalar' object has no attribute 'copy'
Thanks for the question @blest. I wouldn’t necessarily expect MinMaxScaler.fit(...) to work on a Dask Series, since a Series is one-dimensional. When I tried the equivalent operation with sklearn.preprocessing.MinMaxScaler and a pandas Series, I got the following error:
ValueError: Expected 2D array, got 1D array instead:
array=[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
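To make the reshape hint in that message concrete, here is a minimal NumPy-only sketch of the two shapes it mentions. The array name `values` is just an illustration and not something from the original question.

```python
import numpy as np

# A 1D array, analogous to the values of a single column (a Series).
values = np.arange(10, dtype=float)
print(values.shape)

# reshape(-1, 1): ten samples with one feature each.
# This is the 2D layout that scikit-learn style scalers expect for one column.
column = values.reshape(-1, 1)
print(column.shape)

# reshape(1, -1): a single sample with ten features.
# For scaling one column this is usually not the intended layout.
row = values.reshape(1, -1)
print(row.shape)
```

For a single column like df['M'], the first form (one feature, many samples) is the one that applies.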
I would suggest either passing in a Dask DataFrame instead of a Dask Series (this is more of a workaround) or converting your Dask Series to an appropriately shaped Dask Array. Here’s a small working example:
import pandas as pd
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler
# Create data
df = pd.DataFrame({"A": range(10), "B": range(10, 20)})
ddf = dd.from_pandas(df, npartitions=2)
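# Use MinMaxScaler
scaler = MinMaxScaler()
# Option 1 (workaround): fit on the whole Dask DataFrame
scaler.fit(ddf)
# Option 2: fit on a single column converted to a 2D Dask Array of shape (n, 1)
scaler.fit(ddf["A"].to_dask_array(lengths=True).reshape(-1, 1))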
Thanks @jrbourbeau for your answer. I had solved the problem the same way, but I wasn’t sure whether it was the best way. And thank you for creating the issue.