Hi, I need to deal with Impala’s *.har files on remote HDFS storage. I can identify the paths on my own and download the files from the common storage within a bag’s map, but this way I am redoing some stuff that typical dask read operations do for me (e.g. globbing paths and downloading). Is there a more dask-friendly way of going around this problem? If I wanted to implement another bag read operation, which of the dask-wide read operations would be the closest to my use case to draw some inspiration from? Thanks!
@Antymon Hi and welcome to Discourse!
Is there a more dask-friendly way of going around this problem?
Since Dask doesn’t support *.har files directly, your approach sounds good to me!
If I wanted to implement another bag read operation, which of the dask-wide read operations would be the closest to my use case to draw some inspiration from?
Maybe read_text?
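For a bit more detail: read_text relies on fsspec for the globbing and for remote filesystems such as HDFS, and the lower-level dask.bytes.read_bytes exposes that same machinery directly. Below is a minimal sketch, assuming you only need each whole file’s raw bytes. read_whole_files and parse are hypothetical names, the actual .har parsing is left to your own parse function, and I haven’t tried this against HDFS (for hdfs:// paths fsspec also needs an HDFS backend such as pyarrow installed).

```python
import dask
import dask.bag as db
from dask.bytes import read_bytes


def _parse_block(parse, path, data):
    # Materialise the records for one file so each bag partition is a list.
    return list(parse(path, data))


def read_whole_files(urlpath, parse, **storage_options):
    """Hypothetical helper: glob ``urlpath`` and build a bag with one
    partition per matching file.

    ``parse(path, data)`` is user-supplied (e.g. your .har logic) and must
    return an iterable of records for the raw bytes of one file.
    ``storage_options`` are forwarded to the fsspec filesystem.
    """
    # blocksize=None -> exactly one block per file
    # sample=False   -> don't read a header sample
    # include_path   -> also return the matched paths, in block order
    _, blocks, paths = read_bytes(
        urlpath,
        blocksize=None,
        sample=False,
        include_path=True,
        **storage_options,
    )
    # Each block is a Delayed holding one file's bytes; parse it lazily.
    parts = [
        dask.delayed(_parse_block)(parse, path, block)
        for path, file_blocks in zip(paths, blocks)
        for block in file_blocks
    ]
    return db.from_delayed(parts)
```

Usage would look like read_whole_files("hdfs://<namenode>/<dir>/*.har", my_har_parser), where my_har_parser is your own function. Because each file becomes one partition, a very large file is loaded into memory in one piece; if that’s a problem, fsspec.open_files lets you hand lazily opened file objects to your parser instead.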