Does len(ddf.index) compute the entire dataframe?

Hi @Hvuj, welcome to the Dask community!

Yes, Dask is lazy in general. There is some documentation about it:

https://docs.dask.org/en/stable/user-interfaces.html#laziness-and-computing

However, there are exceptions, such as Dask's implementations of Python built-in functions like len. In this case, Dask actually computes at least an entire Series (here, the index). This is neither lazy nor a metadata-only operation: Dask has to go through all the partitions to compute their lengths, and then sum the results.
