I don’t think this is a bug, but I’m a little bit stumped about how to tell dask the datatype in dd.to_datetime.
Obviously the datatype is “datetime64”, but some methods (like tz_localize) throw errors on “datetime64[ns, UTC]” and not on “datetime64[ns]”, while others (like tz_convert) do the reverse. Dask not knowing the timezone ahead of time is currently causing a bug for me.
The tricky bit is that the date is initially converted from a string, like so:
The string contains timezone info, but dask doesn’t know this until it computes things (weirdly, this is only the case with the dask-expr backend, and not the old-school eager execution).
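To make the asymmetry concrete, here is a minimal pandas-only sketch (not taken from the original post; the sample timestamps and the `outcome` helper are made up for illustration). It records which of tz_localize and tz_convert succeeds on a tz-naive column and on a UTC-aware one.

```python
import pandas as pd

# Illustrative data: one tz-naive column and one UTC-aware column.
naive = pd.Series(pd.to_datetime(["2024-01-01 12:00:00"]))
aware = pd.Series(pd.to_datetime(["2024-01-01 12:00:00"], utc=True))


def outcome(func):
    """Return "ok" if func() succeeds, else the name of the exception raised."""
    try:
        func()
        return "ok"
    except Exception as exc:
        return type(exc).__name__


results = {
    "naive.tz_localize": outcome(lambda: naive.dt.tz_localize("UTC")),
    "naive.tz_convert": outcome(lambda: naive.dt.tz_convert("UTC")),
    "aware.tz_localize": outcome(lambda: aware.dt.tz_localize("UTC")),
    "aware.tz_convert": outcome(lambda: aware.dt.tz_convert("UTC")),
}
print(results)
```

So code that calls either method needs to know beforehand whether the column is tz-aware, which is exactly the information the lazy metadata has to carry.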
For instance, I get different answers if I do this:
I can see why this happens: dask hasn’t looked at the string, so it has no way of knowing that the whole column is UTC. How can I tell it? I was thinking along the lines of something like this:
Do you have a reproducer? I tried your code, but I’m getting an error:
File /work/scratch/env/eynardbg/.conda/envs/python_pangeo_lis/lib/python3.12/site-packages/pandas/core/tools/datetimes.py:1186, in _assemble_from_unit_mappings(arg, errors, utc)
1184 if len(req):
1185 _required = ",".join(req)
-> 1186 raise ValueError(
1187 "to assemble mappings requires at least that "
1188 f"[year, month, day] be specified: [{_required}] is missing"
1189 )
1191 # keys we don't recognize
1192 excess = sorted(set(unit_rev.keys()) - set(_unit_map.values()))
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Thanks both! And thanks for the speedy fix @Patrick. If I’m reading it right, though, that PR fixes interpreting the meta information, but I’m still unsure how to actually specify it manually!
@guillaumeeb my reproducer in the initial comment should work, bar a silly error I made. Here’s what the original code block should be:
If it did work previously, it doesn’t now: I get a “can’t set attribute _meta”. I’m guessing this is extra strictness added to stop people unwittingly carrying out hijinx with the optimiser?
Setting meta in to_datetime will work after you update to my PR. Unfortunately the information is currently ignored, but you are already specifying it correctly.