Deploying Dask on kerberized YARN

Hi all, I’m trying to deploy a Dask distributed cluster on YARN with Kerberos auth. For Spark jobs I have to supply the keytab and principal name when submitting, but I can’t find anything similar in dask-yarn. Is there an equivalent?
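
The closest I’ve gotten is making sure a Kerberos ticket is in place before creating the cluster, something like this sketch (the keytab path and principal below are placeholders):

```python
import subprocess

# Rough equivalent of spark-submit's --keytab/--principal: put a ticket in the
# credential cache before dask-yarn/skein talks to the kerberized YARN cluster.
subprocess.run(
    ["kinit", "-kt", "/path/to/user.keytab", "user@EXAMPLE.COM"],
    check=True,
)
```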

I tried to follow the dask-yarn guide (making sure a kinit ticket is loaded, shipping the environment as a packed archive, etc.), but it always fails with “cannot connect to driver”.
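
Roughly, the setup I’m attempting looks like the sketch below; the archive name and resource sizes are just placeholders, and the failure shows up during this setup.

```python
from dask_yarn import YarnCluster
from dask.distributed import Client

# Environment shipped as a packed archive (conda-pack / venv-pack),
# following the dask-yarn deployment guide.
cluster = YarnCluster(
    environment="environment.tar.gz",
    worker_vcores=2,
    worker_memory="4GiB",
)
cluster.scale(2)

client = Client(cluster)
```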

Any insight?

For now I’m deploying Dask on a local server, but that means I have to pull the data down through Spark to the local machine first instead of operating directly on HDFS, and that hurts I/O performance badly. It also means I can’t utilize my Cloudera cluster, which has a much bigger spec than the local server.
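
What I’d like is to run on the cluster and read straight from HDFS, something like this sketch (the path and column name are placeholders):

```python
import dask.dataframe as dd

# Read Parquet directly from HDFS instead of staging it locally through Spark
df = dd.read_parquet("hdfs:///data/some_table")
print(df.groupby("some_column").size().compute())
```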

Really appreciate any input

Thanks,
D

@dto Welcome to Discourse!

I’m not familiar with the project, but I’ll ping @jacobtomlinson here, who might have some helpful thoughts. :smile:

Would you be able to share your Python, dask, distributed, and dask-yarn versions, as well as the complete error traceback?
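
For example, something like this should capture the versions (assuming the packages import cleanly):

```python
import sys
import dask
import distributed
import dask_yarn

print("python:", sys.version)
print("dask:", dask.__version__)
print("distributed:", distributed.__version__)
print("dask-yarn:", dask_yarn.__version__)
```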

I’d also encourage you to move this question to the dask-yarn issue tracker to reach the maintainers directly.