DataFrame created by DataFrame.apply()

Hello guys,

I just discovered Dask. I have to deal with huge data (1.6 TB per CSV file), and I think Dask can help me :)

I need to apply a “basic” data transformation, and I am using the apply() function to do so.

I have this function.

def extract_data(row):
    ret = dict()
    # do regexp stuff on a specific column
    # generate a few values, store them in the dict ret
    return ret['value1'], ret['value2']

Then I apply this function to the Dask DataFrame:

meta = [('value1', str), ('value2', str)]
newddf = ddf.apply(extract_data, axis=1, meta=meta)

print(newddf) gives me something like this:

Dask DataFrame Structure:
              value1  value2
npartitions=1
              object  object
                 ...     ...
Dask Name: apply, 12 tasks

When I try to run newddf.head(), I get an error:
**AttributeError**: 'DataFrame' object has no attribute 'name'

What did I do wrong?

I can run exactly the same code on a pandas dataframe with no issue.

Thanks for your help!

@pfrenard Welcome to Discourse!

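I was able to reproduce this, and the error comes from how you’re defining meta. With axis=1, extract_data returns a tuple for each row, so the result of apply is a Series whose elements are tuples, not a DataFrame with two columns. A list of (name, dtype) pairs tells Dask to expect a DataFrame, while a single (name, dtype) tuple describes a Series, so meta needs to be the latter to match the output. You can use something like: meta = ("Result", object)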

import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({'x': list(range(5))})
ddf = dd.from_pandas(df, npartitions=2)

def extract_data(row):
    ret = {'value1': 'p', 'value2': 'q'}
    return ret['value1'], ret['value2']

meta = ("Result", object)

newddf = ddf.apply(extract_data, axis=1, meta=meta)

newddf.compute()

Ref docs: dask.dataframe.DataFrame.apply — Dask documentation