@dennisd Thanks for the details!
I was able to reproduce this. It looks like the problem is that you’re calling pandas’ to_datetime and assigning the result to a Dask DataFrame column. You’ll need to use Dask DataFrame’s own API here instead:
from datetime import datetime
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame({'date3': ['1232021', '1332021', '1432021', '1532021', None]})
ddf = dd.from_pandas(df, npartitions=2)
ddf['date3'] = dd.to_datetime(ddf['date3'], format="%d%m%Y")
ddf.compute()
I believe you won’t need your step-wise workaround after this. That said, to clarify: the TypeError most likely comes from floats/NaNs in your DataFrame, because datetime.strptime only accepts strings. So you may need to clean your dataset before converting it to datetime.
Let me know if this helps!