Dask on AWS Sagemaker Exception: 'FSTimeoutError()'

@Hasna94 Welcome to Discourse!

I was able to reproduce this with a LocalCluster, and it looks like your explicit boto3 client is interfering with Dask’s internals (Dask also uses boto3 internally to connect to S3).

So, I believe calling read_parquet directly will work in your case (no need for the # S3 client section):

import dask.dataframe as dd
from dask.distributed import Client

client = Client()

# Let Dask handle the S3 connection itself via storage_options
ddf = dd.read_parquet(
    "s3://coiled-datasets/nyc-taxi/parquet",
    storage_options={"anon": True, "use_ssl": True},
)

ddf.head()