Using "meta" with "assign"

I considered raising this as an issue on the dask/dask GitHub repo, but I’m not convinced I haven’t just misunderstood something about “assign” as a method and its intended use.

I’m interested in whether it’s possible to use “meta” with “assign”, the way you can with “apply”, to give dask a hint about the output types of a given function.

To explain what I mean, consider this dataframe:

import dask.dataframe as dd
df = dd.from_dict({"a": [1, 2, 3], "b": ["a", "b", "c"]}, npartitions=1)

Say I want to concatenate a & b together. With apply I would do something like this:

df["c"] = df.apply(
    # str() works whether row["a"] arrives as a Python int or a NumPy scalar
    lambda row: str(row["a"]) + row["b"],
    axis=1,
    meta=("c", str),
)

This is just a toy example, and there are much better ways to concatenate two columns. Ignoring that, the important bit is that I can use the “meta” keyword to give dask a hint about the output type of the column. Say I wanted to use “assign” to set the column instead:

df.assign(c=lambda _df: _df["a"].astype(str) + _df["b"])

This works the same, except I can’t pass in metadata. If I do this:

df.assign(c=lambda _df: _df["a"].astype(str) + _df["b"], meta=("c", str))

I’ll get an error because dask thinks I’m trying to create a column called “meta” and it’ll say something about tuples not being something you can set a column with.
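
To illustrate why “meta” ends up treated as a column name, here is a rough pure-Python sketch. `toy_assign` is a hypothetical stand-in I made up for illustration, not dask’s actual implementation; it only mimics the idea that every keyword argument passed to an assign-style method is read as a new column:

```python
# Hypothetical stand-in for an assign-style method; NOT dask's real code.
def toy_assign(frame: dict, **columns):
    """Return a copy of `frame` with every keyword argument added as a column."""
    result = dict(frame)
    for name, value in columns.items():
        # Callables receive the frame built so far, like the lambda given to assign.
        result[name] = value(result) if callable(value) else value
    return result


frame = {"a": [1, 2, 3], "b": ["a", "b", "c"]}
out = toy_assign(
    frame,
    c=lambda f: [str(x) + y for x, y in zip(f["a"], f["b"])],
    meta=("c", str),  # not special here: just one more keyword, so one more "column"
)
print(sorted(out))  # ['a', 'b', 'c', 'meta']
```

In this toy version the tuple is silently stored; a real dataframe library would instead reject the tuple as column data, which matches the error described above.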

So I’m wondering:

  • Is there actually a way to use meta with assign? (looking at the codebase I don’t think so)
  • Is this a fundamentally weird thing to try to do?
  • Is it a feature that just doesn’t exist yet, but one day might?

After thinking about it, I don’t think my question makes sense: meta is only needed for applies, not for every column assignment. So you could do this if you were using an apply inside assign:

df.assign(c=df.apply(
    # str() works whether row["a"] arrives as a Python int or a NumPy scalar
    lambda row: str(row["a"]) + row["b"],
    axis=1,
    meta=("c", str),
))

Hi @benrutter,

The example you gave is maybe too simple or straightforward, as you said. I don’t know if there is a real need for this option in assign.

There is a great blog post about using the meta keyword. Near the end, it also introduces the _meta attribute, which you might use in some rare cases. However, if you ever need to use it, maybe there is a hole in the Dask API that needs to be fixed.

Thanks! Yeah, I haven’t come across a use case where it’s needed. I think I was struggling with having the wrong mental model of how lambdas in assign work.

Thanks for the blog link. I didn’t know about blog.dask.org, but it looks like a gold mine of useful stuff!