Using dask.dataframe's to_datetime on a pandas dataframe

I hit an error in some code a while back where I was accidentally passing a pandas dataframe into dask’s “to_datetime” function. I’d have expected it to either run as normal or throw an error, but the output seemed to be an assortment of duplicated rows.

I realise this isn’t an error on dask’s part at all, but in my bad implementation (and the fix is simple enough). Since the output seems counterintuitive to what I’d expect, though, I’m curious why this happens. Does anyone know what’s going on under the hood to produce this output?

Hi @benrutter, welcome to Dask Discourse Forum!

Do you have a reproducer? I just tried with a small example, and it’s working as I would expect:

import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
dd.to_datetime(df).compute()

Result:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

Thanks @guillaumeeb!

I’ve tried recreating it, but actually can’t! Apologies - maybe it was caused by something else somewhere.

If I’m able to recreate it, I’ll share it here.
