Does len(ddf.index) compute the entire dataframe?

Hi.

Dask operations are lazy, meaning they don't execute until you explicitly ask for the result with the .compute() method.

Although there are no explicit docs about this, is the following statement true?

When it comes to len(ddf.index), Dask is able to compute just the length of the index without computing the entire DataFrame.

This is because the length of the index is a metadata operation that doesn’t require looking at the actual data.

Is this correct, or does it compute the entire DataFrame?

Hi @Hvuj, welcome to Dask community!

Yes, Dask operations are lazy in general. There is some documentation about it:

https://docs.dask.org/en/stable/user-interfaces.html#laziness-and-computing

However, there are exceptions, such as Dask's implementations of Python built-in functions like len. Calling len is not lazy: Dask actually computes at least an entire Series (here, the index), and it is not a metadata operation either. Dask has to go through all the partitions, compute the length of each one, and sum the results.
