Hi,
I am currently running a cluster on my local machine. It is made up of standard Dask workers and dask-cuda workers (both instantiated from the terminal, as is the scheduler). Basically it is equivalent to a SpecCluster with two worker specs.
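For context, a two-worker-type spec like the one described might look roughly like this (a minimal sketch, assuming `dask-cuda` is installed; the spec names and options are illustrative, not my exact setup):

```python
from dask.distributed import Scheduler, Worker, SpecCluster
from dask_cuda import CUDAWorker  # assumption: dask-cuda is available

# Hypothetical spec mixing one plain CPU worker and one CUDA worker
worker_spec = {
    "cpu-worker": {"cls": Worker, "options": {"nthreads": 4}},
    "gpu-worker": {"cls": CUDAWorker, "options": {}},
}

cluster = SpecCluster(
    workers=worker_spec,
    scheduler={"cls": Scheduler, "options": {"dashboard_address": ":8787"}},
)
```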
I now want to move to an LSF cluster, but I am not quite sure how to deal with these different worker classes, since LSFCluster seems to handle only one worker class.
I attempted to manually submit one job for the scheduler and one per worker. Currently it does not work because the scheduler cannot "see" the workers, so they cannot be registered, but I am wondering if this is the right way to go.
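For reference, the manual submission I tried looks roughly like this (a sketch, not my exact commands; queue names are illustrative, and I am assuming a shared filesystem so the `--scheduler-file` option can be used to let workers find the scheduler without hardcoding a host and port):

```shell
# One job for the scheduler, writing its address to a shared file
bsub -q normal "dask-scheduler --scheduler-file scheduler.json"

# One job per worker type, each reading the scheduler address from that file
bsub -q normal "dask-worker --scheduler-file scheduler.json"
bsub -q gpu    "dask-cuda-worker --scheduler-file scheduler.json"
```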
Any thoughts on that?
Thanks
Hi @vianneyl,
Sorry for the delay here.
LSFCluster in dask-jobqueue indeed does not currently support multiple worker types. It would need some improvements that have been identified for a long time but not implemented yet…
I think doing it manually is the right way. You could also start the Scheduler on a front-end node, as dask-jobqueue does for now. Anyway, you seem to be running into a network issue, which is sometimes complex in HPC system configurations. It's a bit weird though that the compute-node networks are not open to each other… You should ask your sysadmin team about this kind of issue.
Hi Guillaume,
Yes, there are a lot of IT security restrictions…
I think I will end up using pure CPU workers and a simple semaphore-based logic to allow some of them to actually use the GPU (if a semaphore slot is available).
This will cost me some I/O between CPU and GPU, so it is not optimal, but at least I will reach my goal! Besides, it is a better option than Dask resources, because resources 1/ were not working with queuing and 2/ are "hard constraints", while my tasks can actually run on either GPU or CPU (but preferably GPU). With resources, I had CPU workers idle while the GPU one was completely busy.
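The fallback logic I have in mind can be sketched like this (a minimal illustration using a plain `threading.Semaphore`; the function names and `N_GPUS` are placeholders, and in the real cluster I would use `distributed.Semaphore` instead so the slots are shared across workers):

```python
from threading import Semaphore

N_GPUS = 1
gpu_slots = Semaphore(N_GPUS)  # fixed pool of "GPU slots"


def run_on_cpu(x):
    # Placeholder for the CPU implementation of the task
    return ("cpu", x * 2)


def run_on_gpu(x):
    # Placeholder for the GPU implementation of the task
    return ("gpu", x * 2)


def compute(x):
    # Non-blocking acquire: prefer the GPU, but never wait for a slot.
    # If no slot is free, fall back to the CPU path immediately.
    if gpu_slots.acquire(blocking=False):
        try:
            return run_on_gpu(x)
        finally:
            gpu_slots.release()
    return run_on_cpu(x)


results = [compute(i) for i in range(3)]
```

This is a "soft preference" rather than the hard constraint that resources impose: a task only uses the GPU path when a slot happens to be free, so no worker sits idle.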
Vianney