How can I improve Dask read_parquet performance when reading 20000 parquet files (a few of which are corrupted)?

I have 20000 parquet files, partitioned by name. I tested Spark on the same dataset and spark.read.parquet(folder_path).count() completed quickly. However, when I call read_parquet in Dask it takes forever, and neither CPU nor memory usage spikes.
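The Spark side was roughly this (PySpark; "folder_path" stands in for the real dataset location):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Counting all rows across the ~20000 files finishes quickly here.
    print(spark.read.parquet("folder_path").count())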

I also tried passing in an explicit list of the parquet files, and the time to create the Dask DataFrame gets steadily worse as the number of files grows (from 0 to 20000).
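Roughly how I tested that (the subset sizes and the glob pattern are illustrative, not my exact code):

    import glob
    import time

    import dask.dataframe as dd

    # Collect every parquet file under the dataset folder ("folder_path" is a placeholder).
    files = sorted(glob.glob("folder_path/**/*.parquet", recursive=True))

    # Build the DataFrame from growing subsets of the file list and time how long
    # graph construction takes; the time keeps growing with the file count.
    for n in (100, 1000, 5000, len(files)):
        start = time.perf_counter()
        df = dd.read_parquet(
            files[:n],
            columns=["Name", "PhoneNumber", "CallRecords"],
            engine="pyarrow",
            ignore_metadata_file=True,
        )
        print(f"{n} files -> {time.perf_counter() - start:.1f}s to create the DataFrame")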

My guess is that Dask (via pyarrow) is trying to discover the column information by reading metadata from each parquet file individually. How can I improve this?
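One way I thought of to sanity-check that guess is to time how long pyarrow takes to read a single file's footer (the file name is a placeholder, and I'm assuming pq.read_schema is comparable to the per-file metadata read Dask does):

    import time

    import pyarrow.parquet as pq

    # Time reading the schema (footer) of one file; multiplying this by 20000
    # gives a rough worst case if every footer were read sequentially.
    start = time.perf_counter()
    schema = pq.read_schema("folder_path/part-00000.parquet")
    print(schema.names, f"{time.perf_counter() - start:.3f}s")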

My current Dask call:

    import dask.dataframe as dd

    df = dd.read_parquet(
        "folder_path",  # placeholder for the real dataset location
        columns=["Name", "PhoneNumber", "CallRecords"],
        engine="pyarrow",
        ignore_metadata_file=True,
    )
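This is the kind of variant I was planning to try next; the extra keyword arguments (calculate_divisions, split_row_groups) are assumptions about what my Dask version exposes, so please correct me if these are not the right knobs:

    import dask.dataframe as dd

    # Sketch only: try to minimise per-file metadata work when creating the DataFrame.
    df = dd.read_parquet(
        "folder_path",
        columns=["Name", "PhoneNumber", "CallRecords"],
        engine="pyarrow",
        ignore_metadata_file=True,     # don't look for a global _metadata file
        calculate_divisions=False,     # skip gathering min/max statistics for the index
        split_row_groups=False,        # one output partition per file, not per row group
    )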