How are calls to external storage providers made?

Hi,

I’m a first-time Dask user and I’m confused about how calls to external storage providers are made (I’m looking for the implementations in the Git repo). I was reading the “Connect to remote data” page of the Dask documentation, and it seems that S3 object access is done by Boto3 behind the scenes. How is it done for other providers, and where in the code does it happen?

Thanks in advance,

Ryan

Would it be possible for me to contribute integrations with other providers/protocols (not hdfs, s3, …)?

@rkoo19 Welcome to Dask!

Dask uses the s3fs backend to access S3 objects, and s3fs in turn uses botocore (the low-level library that Boto3 is also built on). Similarly, as described in the docs,

  • Dask uses gcsfs for Google Cloud Storage, and
  • adlfs for Azure storage

You can consider taking a look at these projects to understand how they work. :slight_smile:
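
To make the idea concrete, here is a minimal sketch of how a URL prefix can select a backend package. It is not Dask's actual code: in practice the lookup is handled by the fsspec library that Dask builds on. The `BACKEND_PACKAGES` table and the `backend_for` helper are hypothetical names made up for illustration, and the protocol prefixes are assumed from the remote-data docs.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table, for illustration only: it maps a URL
# protocol prefix to the backend package named in this thread.
BACKEND_PACKAGES = {
    "s3": "s3fs",     # Amazon S3
    "gcs": "gcsfs",   # Google Cloud Storage
    "gs": "gcsfs",    # alternative prefix for Google Cloud Storage
    "adl": "adlfs",   # Azure Data Lake
    "abfs": "adlfs",  # Azure Blob storage
}


def backend_for(url: str) -> str:
    """Return the name of the package that would handle `url`."""
    scheme = urlsplit(url).scheme
    if scheme not in BACKEND_PACKAGES:
        raise ValueError(f"no backend known for protocol {scheme!r}")
    return BACKEND_PACKAGES[scheme]


if __name__ == "__main__":
    for url in ("s3://bucket/data.csv", "gcs://bucket/data.csv"):
        print(url, "is handled by", backend_for(url))
```

So the provider-specific network calls live in those separate packages rather than in the Dask repository itself, which is why they are hard to find there.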

Would it be possible for me to contribute integrations with other providers/protocols (not hdfs, s3, …)?

Contributions are always welcome! For this particular contribution, I’d encourage you to open a feature request on the tracker to discuss what you have in mind and the advantage it would have over the current implementations.
