Hello guys,
I just discovered Dask. I have to deal with huge data (1.6 TB per CSV file), and I think Dask can help me.
I need to apply a “basic” data transformation, and I am using the apply() function to do so.
I have this function:
def extract_data(row):
    ret = dict()
    # do regexp stuff on a specific column
    # generate a few values, store them in the dict ret
    return ret['value1'], ret['value2']
Then I apply this function to the Dask DataFrame:
meta = [('value1', str), ('value2', str)]
newddf = ddf.apply(extract_data, axis=1, meta=meta)
print(newddf) gives me something like this:
Dask DataFrame Structure:
              value1  value2
npartitions=1
              object  object
                 ...     ...
Dask Name: apply, 12 tasks
When I try to run newddf.head(), I get an error:
**AttributeError** : 'DataFrame' object has no attribute 'name'
What did I do wrong?
I can run exactly the same code on a pandas DataFrame with no issue.
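A possible explanation, sketched with plain pandas: when the applied function returns a tuple, apply(axis=1) produces a Series of tuples rather than a two-column DataFrame, which would not match a two-column meta. Returning a labelled pd.Series instead yields one column per label. The sample data, the pattern, and the names extract_tuple and extract_series below are hypothetical stand-ins, since the real regexp is not shown here.

```python
import re

import pandas as pd

# Hypothetical sample data and pattern; the real column and regexp are not shown in the post.
pdf = pd.DataFrame({"raw": ["a=1;b=2", "a=3;b=4"]})
PATTERN = re.compile(r"a=(\d+);b=(\d+)")


def extract_tuple(row):
    # Same return shape as extract_data: a plain tuple of two strings.
    m = PATTERN.search(row["raw"])
    return m.group(1), m.group(2)


def extract_series(row):
    # Variant returning a labelled Series, so apply() builds one column per label.
    m = PATTERN.search(row["raw"])
    return pd.Series({"value1": m.group(1), "value2": m.group(2)})


# With a tuple return, pandas gives back a single Series whose elements are tuples.
as_tuples = pdf.apply(extract_tuple, axis=1)

# With a Series return, pandas gives back a DataFrame with columns value1 and value2.
as_columns = pdf.apply(extract_series, axis=1)

print(type(as_tuples).__name__)
print(list(as_columns.columns))
```

If this is indeed the cause, a function that returns a labelled Series, passed to ddf.apply(..., axis=1, meta=meta), should produce a result whose columns agree with the declared meta. That is an assumption based on the error message and has not been checked against the real data.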
Thanks for your help!