Dask-ml preprocessing raises AttributeError

blest · November 7, 2022, 1:13pm

I use Dask DataFrame and dask-ml to manipulate my data. When I use the dask-ml MinMaxScaler, I get the error shown below. Is there a way to prevent this error and make it work?

import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

df = dd.read_csv(
    'path to csv',
    parse_dates=['CREATED_AT'],
    dtype={'ODI_UPDATED_AT': 'object'},
)
scaler = MinMaxScaler()
print(scaler.fit_transform(df['M']))

AttributeError: 'Scalar' object has no attribute 'copy'

Thanks in advance.

jrbourbeau · November 10, 2022, 7:46pm

Thanks for the question @blest. I wouldn’t necessarily expect MinMaxScaler.fit(...) to work on a Dask Series, since a Series is one-dimensional. When I tried the equivalent operation with sklearn.preprocessing.MinMaxScaler and a pandas Series, I got the following error:

ValueError: Expected 2D array, got 1D array instead:
array=[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
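
For reference, the scikit-learn behavior can be reproduced with a minimal sketch like the one below. The Series name and values are illustrative assumptions rather than data from the original post, and the exact wording of the error message may vary between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative 1D input (assumed data, not from the original question)
s = pd.Series(range(10), name="A", dtype="float64")

scaler = MinMaxScaler()

# Fitting on a 1D Series is rejected because the estimator expects 2D input
try:
    scaler.fit(s)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Converting the Series to a single-column DataFrame gives the 2D shape
scaled = scaler.fit_transform(s.to_frame())
print(scaled.shape)
```

The second call succeeds because `s.to_frame()` has shape `(10, 1)`, which is the layout the scaler expects for a single feature.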

I would suggest either passing in a Dask DataFrame instead of a Dask Series (this is more of a workaround) or converting your Dask Series to an appropriately shaped Dask Array. Here’s a small working example:

import pandas as pd
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

# Create data
df = pd.DataFrame({"A": range(10), "B": range(10, 20)})
ddf = dd.from_pandas(df, npartitions=2)

# Use MinMaxScaler
scaler = MinMaxScaler()
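# Option 1: fit on the whole Dask DataFrame (2D input)
scaler.fit(ddf)
# Option 2: fit on one column converted to a 2D Dask Array
scaler.fit(ddf["A"].to_dask_array(lengths=True).reshape(-1, 1))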

jrbourbeau · November 10, 2022, 7:59pm

Though I agree the error message you’re getting isn’t very informative. I’ve opened an issue, "Better error message when using invalid `MinMaxScaler.fit(...)` inputs" (dask/dask-ml issue #951), to improve the error message users get in this failure case.

blest · November 10, 2022, 8:26pm

Thanks @jrbourbeau for your answer. I had solved the problem the same way but I wasn’t sure if this is the best way. And thank you for creating the issue.
