I’m looking to read in parquet files from HDFS. I’ve used the general setup
df = dd.read_parquet('hdfs:///hdfs/file/path/your_file.parquet'), but I am getting OSError: Prior attempt to load libhdfs failed. After doing some research, I noticed that one potential problem may be that this path directs Dask to a local HDFS rather than my company's distributed one.
Is it possible that I need to specify the file system using HDFSFileSystem(host='', port=xxxx)?
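For reference, here is a minimal sketch of two ways the namenode could be made explicit without a separate filesystem object, assuming a recent Dask that hands `hdfs://` paths to fsspec's Arrow-based HDFS filesystem: put the host and port directly in the URL, or keep the host-less `hdfs:///` form and pass them through `storage_options`. The helper names (`hdfs_uri`, `hdfs_storage_options`, `read_parquet_from_hdfs`) and the host/port values are hypothetical placeholders. Also, OSError: Prior attempt to load libhdfs failed appears to be raised when pyarrow cannot load the native libhdfs library (which typically depends on environment settings such as JAVA_HOME, HADOOP_HOME or ARROW_LIBHDFS_DIR), so pointing at the right cluster alone may not be enough.

```python
from typing import Any, Dict, Optional


def hdfs_uri(host: str, port: int, path: str) -> str:
    """Build a fully qualified HDFS URI so the namenode is explicit in the path."""
    if not path.startswith("/"):
        path = "/" + path
    return f"hdfs://{host}:{port}{path}"


def hdfs_storage_options(host: str, port: int, user: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments that Dask forwards (via fsspec) to the HDFS filesystem."""
    opts: Dict[str, Any] = {"host": host, "port": port}
    if user is not None:
        opts["user"] = user
    return opts


def read_parquet_from_hdfs(host: str, port: int, path: str, use_storage_options: bool = False):
    # Imported lazily so the helpers above work even where Dask is not installed.
    import dask.dataframe as dd

    if not path.startswith("/"):
        path = "/" + path
    if use_storage_options:
        # Keep the host-less "hdfs:///..." form and pass the namenode separately.
        return dd.read_parquet("hdfs://" + path, storage_options=hdfs_storage_options(host, port))
    # Otherwise, encode the namenode host and port in the URL itself.
    return dd.read_parquet(hdfs_uri(host, port, path))


# Hypothetical usage (placeholder namenode host and port):
# df = read_parquet_from_hdfs("namenode.example.com", 8020, "/hdfs/file/path/your_file.parquet")
```

Either form only changes which cluster Dask talks to; it still relies on libhdfs being loadable on every machine that reads the data.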
Any help/direction would be much appreciated.
Thanks.