Hello,
I would like to write a Dask Distributed Dataframe to HBase. I wasn’t able to find anything in the docs or online and was wondering if anyone has tried anything like this.
Thanks!!
Hi @Paul_de_Fusco, welcome to Dask Discourse forum!
Well, I’ve never heard of that either. I have to say I assumed HBase was kind of dead anyway. Do you really have to use it? Couldn’t you switch to Parquet?
The only thing I can think of, if HBase provides a SQL interface, is to use to_sql, but I don’t think this is possible.
Anyway, by looking at the source code of other write methods, like to_parquet, you should be able to find a way to implement something for HBase.
Maybe @martindurant has some other thoughts about that.
At a guess, you would start by wrapping batch writes in dask.delayed functions. I don’t know much about HBase either (except that it’s not SQL, although I gather SQL overlays are available).