How to perform a query with the @ symbol in a Dask DataFrame

Hi,

I am trying to perform a query with the @ symbol in a dask dataframe.

import dask.dataframe as dd
import pandas as pd

# Create a Dask DataFrame
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10], 'z': ['a', 'b', 'c', 'a', 'b']})
ddf = dd.from_pandas(df, npartitions=1)

# Define a variable
threshold = 3

# Query using f-string
result_fstring = ddf.query(f"x > @threshold")
print("Result using f-string:")
print(result_fstring.compute())

I always get the following error message:

UndefinedVariableError: local variable 'threshold' is not defined

However, I am able to achieve the same thing with a pandas dataframe without any error

import pandas as pd

# Create a pandas DataFrame
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [6, 7, 8, 9, 10], 'z': ['a', 'b', 'c', 'a', 'b']})

# Define a variable
threshold = 3

# Query using f-string
result_fstring = df.query(f"x > @threshold")
print("Result using f-string:")
print(result_fstring)

This is the output with a pandas dataframe

Result using f-string:
   x   y  z
3  4   9  a
4  5  10  b

How can I get this to work with a dask dataframe?

Thanks

Hi @Damilola,

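It looks like you are mixing up f-string interpolation with the @ syntax that pandas supports in query. @ has no special meaning in an f-string; only expressions inside curly braces are substituted. For example: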

threshold = 3
print(f"x > @threshold")

will print the string unchanged:

x > @threshold

As explained in the Dask documentation:

You can refer to column names that are not valid Python variable names by surrounding them in backticks. Dask does not fully support referring to variables using the ‘@’ character, use f-strings or the local_dict keyword argument instead.

A correct f-string would be:

print(f"x > {threshold}")

The code

result_fstring = ddf.query(f"x > {threshold}")
print("Result using f-string:")
print(result_fstring.compute())

just works!