DataFrame created by DataFrame.apply()

pfrenard · April 15, 2022, 9:23am

Hello guys,

I just discovered Dask. I have to deal with huge data (1.6 TB per CSV file), and I think Dask can help me.

I need to apply a “basic” data transformation, and I am using the apply() function to do so.

I have this function.

def extract_data(row):
    ret = dict()
    # do regexp stuff on a specific column
    # generate a few values, store them in the dict ret
    return ret['value1'], ret['value2']

Then I apply this function to the Dask DataFrame:

meta=[ ('value1', str),('value2',str) ]
newddf = ddf.apply(extract_data, axis=1, meta=meta)

print(newddf) gives me something like this:

Dask DataFrame Structure:
               value1  value2
npartitions=1
               object  object
                  ...     ...
Dask Name: apply, 12 tasks

When I try to run newddf.head(), I get an error:
AttributeError: 'DataFrame' object has no attribute 'name'

What did I do wrong?

I can run exactly the same code on a pandas DataFrame with no issue.

Thanks for your help!

pavithraes · April 27, 2022, 1:39pm

@pfrenard Welcome to Discourse!

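I was able to reproduce this, and the error is in how you’re defining meta. extract_data returns a tuple for each row, so apply(..., axis=1) produces a Series whose elements are tuples, not a DataFrame with two columns. A list of (name, dtype) pairs describes a DataFrame, so meta needs to describe a Series instead, as a single (name, dtype) pair. You can use something like: meta = ("Result", object)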

import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({'x': list(range(5))})
ddf = dd.from_pandas(df, npartitions=2)

def extract_data(row):
    ret = {'value1': 'p', 'value2': 'q'}
    return ret['value1'], ret['value2']

meta = ("Result", object)

newddf = ddf.apply(extract_data, axis=1, meta=meta)

newddf.compute()

Ref docs: dask.dataframe.DataFrame.apply — Dask documentation
