@Hasna94 Welcome to Discourse!
I was able to reproduce this with a `LocalCluster`, and it looks like your explicit boto3 client is interfering with Dask's internals (under the hood, Dask connects to S3 through s3fs, which is built on the same botocore machinery as boto3). So I believe calling `read_parquet` directly will work in your case, with no need for the `# S3 client` section:
```python
import dask.dataframe as dd
from dask.distributed import Client

client = Client()

# anon=True works here because the coiled-datasets bucket is public
ddf = dd.read_parquet(
    "s3://coiled-datasets/nyc-taxi/parquet",
    storage_options={"anon": True, "use_ssl": True},
)
ddf.head()
```
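
If you do need credentials (e.g. for a private bucket), you can pass them through `storage_options` instead of constructing a boto3 client yourself; these options are forwarded to s3fs. A minimal sketch, with a placeholder bucket path and placeholder credentials:

```python
import dask.dataframe as dd

# Hypothetical private bucket; replace the path and credentials with your own.
ddf = dd.read_parquet(
    "s3://your-private-bucket/path",
    storage_options={
        "key": "YOUR_ACCESS_KEY_ID",       # placeholder, forwarded to s3fs
        "secret": "YOUR_SECRET_ACCESS_KEY",
    },
)
```

This keeps the S3 configuration in a form every worker can use, rather than a client object that only exists on the machine where you created it.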