Writing Dask Dataframe to HBase

Hello,

I would like to write a Dask Distributed Dataframe to HBase. I wasn’t able to find anything in the docs or online and was wondering if anyone has tried anything like this.

Thanks!!

Hi @Paul_de_Fusco, welcome to Dask Discourse forum!

Well, I’ve never heard of that either. I have to say I assumed HBase was more or less dead anyway 🙂. Do you really have to use it? Couldn’t you switch to Parquet?

The only thing I can think of, if HBase provides some kind of SQL interface, is to use to_sql, but I don’t think this is possible.

Anyway, by looking at the source code of other write methods, such as to_parquet, you should be able to find a way to implement something similar for HBase.

Maybe @martindurant has some other thoughts about that.

At a guess, you would start by wrapping batch writes in dask.delayed functions. I don’t know much about HBase either (except that it’s not SQL, although I gather SQL overlays are available).

1 Like