Hi all, I'm trying to deploy a Dask distributed cluster on YARN with Kerberos auth. For Spark jobs, I supply the keytab and principal name at cluster creation, but I can't find an equivalent option in dask-yarn.
I tried following the dask-yarn guide (making sure kinit has been run, packaging the environment in the pack format, etc.), but it always fails with "cannot connect to driver".
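For reference, here is roughly what I'm doing, as a minimal sketch. The keytab path, principal, and archive name are placeholders for my setup, and as far as I can tell dask-yarn has no keytab/principal parameters itself, so I'm relying on the ticket cache from kinit being picked up when the cluster is created:

```shell
# Hypothetical keytab path and principal -- substitute your own.
# Kerberos auth is expected to come from the ticket cache, so kinit
# must succeed *before* YarnCluster is constructed.
kinit -kt /path/to/user.keytab user@EXAMPLE.REALM
klist  # verify a valid ticket exists

# With a ticket in place, launch the cluster from Python:
python - <<'EOF'
from dask_yarn import YarnCluster
from dask.distributed import Client

# environment.tar.gz is the conda-pack archive shipped to YARN containers
cluster = YarnCluster(environment="environment.tar.gz",
                      worker_vcores=2,
                      worker_memory="4GiB")
client = Client(cluster)
EOF
```

Even with a fresh ticket confirmed by klist, I still get the "cannot connect to driver" error.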
Any insight?
For now I'm deploying on a local server, but that means I first have to pull the data through Spark onto the local machine instead of operating directly on HDFS. This hurts IO performance badly, and it means I can't utilize my Cloudera cluster, which has much bigger specs than the local server.
Really appreciate any input.
Thanks,
D