I considered raising this as an issue on the dask/dask GitHub repo, but I’m not convinced I haven’t just misunderstood something about “assign” as a method and its intended use.
I’m interested in whether it’s possible to use “meta” with “assign”, as you can with “apply”, to give dask a hint about the output types of a given function.
To explain what I mean, consider this dataframe:
import dask.dataframe as dd
df = dd.from_dict({"a": [1, 2, 3], "b": ["a", "b", "c"]}, npartitions=1)
Say I want to concatenate a and b together. With apply I would do something like this:
df["c"] = df.apply(
    lambda row: row["a"].astype(str) + row["b"],
    axis=1,
    meta=("c", str),
)
This is just a toy example, and there are much better ways of concatenating two columns. Ignoring that, the important bit is that I can use the “meta” keyword to give dask a hint about the output type of the column. Say I wanted to use “assign” to set the column instead:
df.assign(c=lambda _df: _df["a"].astype(str) + _df["b"])
This works the same, except I can’t pass in metadata. If I do this:
df.assign(c=lambda _df: _df["a"].astype(str) + _df["b"], meta=("c", str))
I get an error, because dask thinks I’m trying to create a column called “meta”, and it says something about tuples not being something you can set a column with.
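As far as I can tell, this happens because “assign” takes its new columns as arbitrary keyword arguments, so there is no reserved keyword left for “meta” to occupy. Here is a minimal pure-Python sketch of that behaviour. Note that toy_assign is a hypothetical stand-in I made up to illustrate the point, working on a plain dict of lists; it is not dask’s actual implementation.

```python
# Toy stand-in for an assign-style method (hypothetical, NOT dask's real code).
def toy_assign(columns, **kwargs):
    """Return a copy of `columns` with one new column per keyword argument."""
    result = dict(columns)
    for name, value in kwargs.items():
        # Callables are evaluated against the frame; anything else is used as-is.
        result[name] = value(result) if callable(value) else value
    return result


frame = {"a": [1, 2, 3], "b": ["a", "b", "c"]}
out = toy_assign(
    frame,
    c=lambda cols: [str(a) + b for a, b in zip(cols["a"], cols["b"])],
    meta=("c", str),  # swallowed by **kwargs like any other column name
)
print(sorted(out))  # "meta" shows up as a column name next to "a", "b" and "c"
```

In this sketch the tuple simply becomes a column value; a real dataframe is presumably stricter about what it accepts as a column, which would explain the complaint about tuples.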
So I’m wondering:
- Is there actually a way to use meta with assign? (looking at the codebase I don’t think so)
- Is this a fundamentally weird thing to try to do?
- Is it a feature that just doesn’t exist yet, but one day might?