Deploying Dask on a Kerberized YARN cluster

Hi all, I’m trying to deploy a Dask distributed cluster on YARN with Kerberos auth. For Spark jobs, I supply the keytab and principal name when creating the application. I can’t find the equivalent options in dask-yarn.
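For reference, this is roughly what I mean on the Spark side (the configs are set before the YARN application launches; the keytab path and principal below are just placeholders):

```python
from pyspark.sql import SparkSession

# Spark on YARN accepts a keytab/principal pair so the application can
# authenticate against Kerberos on its own (placeholder values below).
spark = (
    SparkSession.builder
    .appName("example")
    .config("spark.yarn.keytab", "/path/to/user.keytab")   # placeholder path
    .config("spark.yarn.principal", "user@EXAMPLE.COM")    # placeholder principal
    .getOrCreate()
)
```

I can’t find anything comparable in the dask-yarn API.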

I tried to follow the dask-yarn guide (making sure kinit has been run, packaging the environment with conda-pack, etc.), but it always fails with “cannot connect to driver”.
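For what it’s worth, this is roughly what I’m running after kinit, following the dask-yarn docs (the archive name, sizes, and worker count are just examples):

```python
from dask_yarn import YarnCluster
from dask.distributed import Client

# Assumes `kinit` has already been run in this shell and the packed
# conda environment (built with conda-pack) is available locally.
cluster = YarnCluster(
    environment="environment.tar.gz",  # example archive name
    worker_vcores=2,
    worker_memory="4GiB",
)
cluster.scale(4)          # example worker count
client = Client(cluster)
```

This is the setup that dies with the error above.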

Any insight?

For now I’m deploying it on a local server, but that means I first have to pull the data out through Spark instead of operating directly on HDFS, and it hurts IO performance badly. It also means I can’t utilize my Cloudera cluster, which has a much bigger spec than the local server.
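The end goal is to have the workers read straight from HDFS, along these lines (the path is hypothetical):

```python
import dask.dataframe as dd

# With the cluster running inside YARN, the workers would read Parquet
# directly from HDFS instead of the data being pulled to the local server.
df = dd.read_parquet("hdfs:///data/events/*.parquet")  # hypothetical path
print(df.head())
```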

I’d really appreciate any input.

Thanks,
D

@dto Welcome to Discourse!

I’m not familiar with the project, but I’ll ping @jacobtomlinson here, who might have some helpful thoughts. :smile:

Would you be able to share your Python, dask, distributed, and dask-yarn versions, as well as the complete error traceback?

I’d also encourage you to move this question to the dask-yarn issue tracker to reach the maintainers directly.


I don’t have any particularly helpful thoughts here, but opening an issue on dask-yarn would be great.


Hi all, thank you very much for your responses. I’m putting this issue on hold for now, since I just found out there will be a CDP upgrade at my workplace. I will raise the dask-yarn requirement with the team managing the upgrade. So far, from the logs, I suspect an issue with Kerberos auth (most likely the authentication didn’t pass through properly).

If that fails, I will raise an issue on dask-yarn with all the details.

Thank you and have a nice day.