I’m using intake to read a set of CSV files, and it uses dask.dataframe.read_csv()
to read the CSVs in parallel. Unfortunately the data provider is throwing a 500 error on one of the URLs: the CSVs are generated dynamically, and the server can't keep up with the parallel load. Is there a way to tell dask to read these CSV files sequentially? Thanks for any tips you can provide!
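For context, the kind of workaround I had in mind is forcing Dask's single-threaded scheduler at compute time, roughly like the sketch below (the URLs are placeholders, not the real endpoints) — though I'm not sure whether that's the intended way to do this when the dataframe comes from an intake catalog:

```python
import dask.dataframe as dd

# Placeholder URLs standing in for the dynamically generated CSV endpoints
urls = [
    "https://example.com/data/part-1.csv",
    "https://example.com/data/part-2.csv",
]

# Build the lazy dataframe as usual
ddf = dd.read_csv(urls)

# Run the graph with the synchronous (single-threaded) scheduler so the
# CSVs are fetched one at a time instead of in parallel
result = ddf.compute(scheduler="synchronous")
```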
I can see now that it's resource-intensive for the website to handle the many parallel HEAD requests that are being sent. Is there a way to control this?
@edsu Welcome! Would you be able to share a minimal example? I’d be happy to reproduce it locally and try to find a solution.