Nanny Forces Single Core Usage

Hi all,

I am running into an odd issue whenever my tasks call a compiled C++ executable via subprocess. With Nanny=True (the default), each task gets pinned to a single CPU core (not what I want), but as soon as I start workers with Nanny=False, the executable spreads across all cores as expected. Ideally, I'd like to keep Nanny=True for its benefits around worker restarts.
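
For context, here's a minimal sketch of the two startup modes I'm comparing, written against the programmatic Worker/Nanny API (the scheduler address is a placeholder; Nanny=False corresponds to starting workers with the --no-nanny flag):

import asyncio
from dask.distributed import Nanny, Worker

SCHEDULER = "tcp://scheduler:8786"  # placeholder address

async def start_worker(with_nanny: bool):
    # Nanny supervises the worker in a separate process that it spawns itself;
    # Worker runs directly in this process (the --no-nanny case).
    worker_cls = Nanny if with_nanny else Worker
    async with worker_cls(SCHEDULER) as w:
        await w.finished()

# asyncio.run(start_worker(with_nanny=True))   # tasks end up pinned to one core
# asyncio.run(start_worker(with_nanny=False))  # subprocess spreads across cores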

Minimal Code:

import subprocess
import textwrap

def run():
    # Build a small shell script that activates the conda environment and
    # runs the compiled binary inside a login shell.
    script = textwrap.dedent(
        """
        bash -l <<-'HEREDOC'
        conda activate my-kernel
        /code/compiledCppCodeHere
        HEREDOC
        """
    )
    process = subprocess.Popen(
        script, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    stdout, stderr = process.communicate()
    return stdout

futures = [client.submit(run, pure=False) for _ in range(2)]
client.gather(futures)

CPU profile with Nanny=True (what I don't expect)

CPU profile with Nanny=False (what I expect)

Digging a bit deeper, I see that the Nanny spawns the worker process via the multiprocessing module. I'm not sure whether this is the issue.

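One check I'm planning (a sketch, assuming Linux, since os.sched_getaffinity is Linux-only) is whether the worker process itself is already restricted to a single core under Nanny=True, since any subprocess would inherit that affinity mask:

import os

def check_affinity():
    # Cores this worker process is allowed to run on; a subprocess launched
    # from within a task inherits this mask.
    return sorted(os.sched_getaffinity(0))

print(client.gather([client.submit(check_affinity, pure=False) for _ in range(2)]))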

So a few questions:

  1. How can I keep Nanny=True and allow the subprocess to use all cores as expected? (One idea I'm considering is sketched below.)
  2. If not, is there an alternative process-spawning method (other than multiprocessing) that I can configure for the Nanny workers? I'm not sure that's actually the problem, to be honest.
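
For question 1, one workaround I can imagine (an untested sketch, assuming Linux and that inherited CPU affinity really is the cause) would be to widen the affinity mask inside the task before calling the executable:

import os

def run_unpinned():
    # Widen this worker process's CPU affinity to all cores so that the
    # child shell (and the C++ binary it launches) inherits the full mask.
    os.sched_setaffinity(0, range(os.cpu_count()))
    return run()  # the task from the minimal example above

futures = [client.submit(run_unpinned, pure=False) for _ in range(2)]
client.gather(futures)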

Any pointers or background on this would be much appreciated! Thanks!