Hi all, I’m trying to deploy a Dask distributed cluster on YARN with Kerberos authentication. For Spark jobs, I need to supply a keytab and a principal name when creating the cluster, but I can’t find an equivalent in dask-yarn.
I tried following the dask-yarn guide (making sure kinit has been run, adding the environment in packed format, etc.), but it always fails with “cannot connect to driver”.
Any insight?
For now I’m deploying it on a local server, but that means I first have to pull the data through Spark onto the local machine and cannot operate directly on HDFS. This hurts I/O performance badly, and it also means I can’t use my Cloudera cluster, which has much higher specs than the local server.
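For reference, here is a minimal sketch of what a keytab-based setup could look like, similar to the Spark keytab/principal options mentioned above. It assumes MIT Kerberos tools (kinit, klist) are on the PATH and that the installed skein release accepts `keytab` and `principal` on `skein.Client`, so please check that against your versions. The function names, keytab path, principal, and environment archive are all placeholders, not values from this thread.

```python
import subprocess


def build_kinit_command(keytab, principal):
    """Return the kinit invocation that obtains a ticket from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def has_valid_ticket():
    """Return True if the ticket cache holds a readable, non-expired ticket."""
    # `klist -s` is silent and reports validity through its exit status.
    try:
        result = subprocess.run(
            ["klist", "-s"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # Kerberos client tools are not installed on this machine.
        return False
    return result.returncode == 0


def start_cluster(keytab, principal, environment):
    """Hypothetical helper: start a dask-yarn cluster using a keytab."""
    # Imported lazily so the helpers above work without dask-yarn installed.
    import skein
    from dask.distributed import Client
    from dask_yarn import YarnCluster

    # Make sure the local process also has a ticket (e.g. for HDFS access).
    if not has_valid_ticket():
        subprocess.run(build_kinit_command(keytab, principal), check=True)

    # Assumption: this skein version supports keytab/principal on the client,
    # so the driver logs in with the keytab instead of the ticket cache.
    skein_client = skein.Client(keytab=keytab, principal=principal)
    cluster = YarnCluster(environment=environment, skein_client=skein_client)
    return cluster, Client(cluster)
```

Something like `start_cluster("/path/to/user.keytab", "user@EXAMPLE.COM", "environment.tar.gz")` would then be the entry point. If the driver still cannot be reached, the problem may lie outside authentication, for example in network access between the edge node and the cluster.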
Hi all, thank you very much for your responses. I’m putting this issue on hold for now, since I just found out there will be a CDP upgrade at my workplace. I will raise the dask-yarn requirement with the team that manages the upgrade. From the logs so far, I suspect an issue with Kerberos auth (most likely the authentication didn’t pass properly).
If they can’t get it working, I will raise an issue in dask-yarn with all the details.
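In case it helps when collecting details for that issue, below is a rough sketch for pulling Kerberos-related lines out of the application logs (for example, the output of `yarn logs -applicationId <id>` saved to a file). The marker strings are common Hadoop/Java GSS error fragments, not an exhaustive list, and the function name is just illustrative.

```python
# Substrings that commonly appear in Hadoop/Java logs when Kerberos
# authentication fails. This list is a starting point, not exhaustive.
KERBEROS_MARKERS = (
    "GSSException",
    "GSS initiate failed",
    "No valid credentials provided",
    "Failed to find any Kerberos tgt",
)


def find_kerberos_errors(log_text):
    """Return (line_number, line) pairs that mention a Kerberos failure."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(marker.lower() in lowered for marker in KERBEROS_MARKERS):
            hits.append((number, line.strip()))
    return hits
```

An empty result does not prove that authentication succeeded; it only means none of these particular markers showed up in the text that was scanned.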