Using "meta" with "assign"

benrutter · November 7, 2023, 11:42am

I considered raising this as an issue on dask/dask github, but I’m not convinced I haven’t just misunderstood something about “assign” as a method and its intended use.

I’m interested in whether it’s possible to use “meta”, as with apply, to give dask a hint about the output types of a given function when using “assign”.

To explain what I mean, consider this dataframe:

import dask.dataframe as dd
df = dd.from_dict({"a": [1, 2, 3], "b": ["a", "b", "c"]}, npartitions=1)

Say I want to concatenate a & b together. With apply, I would do something like this:

df["c"] = df.apply(
    lambda row: str(row["a"]) + row["b"],
    axis=1,
    meta=("c", str),
)

This is just a toy example, and there are much better ways of concatenating two columns. Ignoring that, the important bit is that I can use the “meta” keyword to give dask a hint about the output type of the column. Say I wanted to use “assign” to set the column instead:

df.assign(c=lambda _df: _df["a"].astype(str) + _df["b"])

This works the same, except I can’t pass in metadata. If I do this:

df.assign(c=lambda _df: _df["a"].astype(str) + _df["b"], meta=("c", str))

I get an error: dask thinks I’m trying to create a column called “meta”, and it says something about tuples not being something you can set a column with.

So I’m wondering:

Is there actually a way to use meta with assign? (looking at the codebase I don’t think so)
Is this a fundamentally weird thing to try to do?
Is it a feature that just doesn’t exist yet, but one day might?

benrutter · November 7, 2023, 6:24pm

After thinking about it, I don’t think my question makes sense: meta is only needed for applies, not for all column assignments. So you could do this if you were using an apply inside assign:

df.assign(c=df.apply(
    lambda row: str(row["a"]) + row["b"],
    axis=1,
    meta=("c", str),
))

guillaumeeb · November 9, 2023, 8:56am

Hi @benrutter,

The example you gave is maybe too simple or straightforward, as you said. I don’t know if there is a need for this possibility in assign.

There is a great blog post about using the meta keyword. Near the end, it also introduces the _meta attribute, which you might use in some rare cases. However, if you ever need to use it, maybe there is a hole in the Dask API that needs to be fixed.

benrutter · November 11, 2023, 10:19am

Thanks! Yeah, I haven’t come across a use case where it’s needed. I think I was struggling with having the wrong mental model of how lambdas in assign work.

Thanks for the blog link. I didn’t know about blog.dask.org, but it looks like a gold mine of useful stuff!
