Hi, I am trying to get the maximum value of each list in a Dask DataFrame column, where some of the list elements can be None.
This is the logic I am trying:
import dask.dataframe as dd
import pandas as pd
# Sample data
data = {'list_column': [[1, 2, 3], [None, 5, 6], [7, None, 9], [None, None, None]]}
df = pd.DataFrame(data)
ddf = dd.from_pandas(df, npartitions=1)
def safe_numeric_max(row):
    # Keep only numeric (int/float) values
    numerics = [x for x in row if isinstance(x, (int, float))]
    return max(numerics) if numerics else None
ddf['max_val'] = ddf['list_column'].map(safe_numeric_max, meta=('max_val', 'float64'))
ddf.compute()
However, I always get None as the maximum value for every row:
          list_column  max_val
0           [1, 2, 3]     None
1        [None, 5, 6]     None
2        [7, None, 9]     None
3  [None, None, None]     None
In pandas, the same function works as expected:
import pandas as pd
# Sample data
data = {'list_column': [[1, 2, 3], [None, 5, 6], [7, None, 9], [None, None, None]]}
df = pd.DataFrame(data)
def safe_numeric_max(row):
    # Keep only numeric (int/float) values
    numerics = [x for x in row if isinstance(x, (int, float))]
    return max(numerics) if numerics else None
df['max_val'] = df['list_column'].map(safe_numeric_max)
print(df)
          list_column  max_val
0           [1, 2, 3]      3.0
1        [None, 5, 6]      6.0
2        [7, None, 9]      9.0
3  [None, None, None]      NaN
In plain Python, the same filtering logic also works:
lst = [7, None, 9]
numerics = [x for x in lst if isinstance(x, (int, float))]
max(numerics)
9
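One possible way to narrow this down is to check what type the function actually receives for each row. The sketch below is only illustrative: `describe_row` is a hypothetical helper, and the string input is an assumption about what could be going wrong under Dask, not something confirmed here. It shows that the same filter returns None whenever the row's elements are not int/float, for example when the list arrives as its string form.

```python
def safe_numeric_max(row):
    # Keep only numeric (int/float) values
    numerics = [x for x in row if isinstance(x, (int, float))]
    return max(numerics) if numerics else None


def describe_row(row):
    # Hypothetical debugging helper: return the row's type name and the result
    return type(row).__name__, safe_numeric_max(row)


# A real list behaves as in the plain Python example above
as_list = describe_row([7, None, 9])

# Assumed failure case: the row is the string form of the list, so iterating
# over it yields single characters and none of them pass the isinstance check
as_text = describe_row("[7, None, 9]")

print(as_list)
print(as_text)
```

Mapping a helper like this over the Dask column in place of `safe_numeric_max` (with `meta=('max_val', 'object')`) would show whether the rows are still lists at the point where the function runs.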
Could you please point out what I am doing wrong in the Dask version, or suggest an alternative approach?
Thanks