Importing code and sys.path

We have a custom codebase that we were planning to run parts of through Dask distributed.
There are multiple users, each developing their own code tree. I set up an EC2Cluster, added workers, and we NFS-mount everyone’s home directory on all the workers so the workers have access to the custom code.

However, for imports to work, we normally add the repo to sys.path and then import modules relative to the repo root. If I modify sys.path on the workers, the change appears to be global across all clients and all users.
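For concreteness, this is roughly what we do today (the scheduler address and repo path below are just placeholders):

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder scheduler address

def add_repo_to_path(repo="/nfs/home/alice/my-repo"):  # placeholder repo path
    import sys
    if repo not in sys.path:
        sys.path.insert(0, repo)

# client.run() executes the function in every worker process, so the sys.path
# change is global: any other client that later uses the same workers sees it.
client.run(add_repo_to_path)
```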

How is that meant to work on a general-purpose multi-user cluster if there is no isolation and users can accidentally trash each other’s configuration? Am I missing some fundamental element of the setup, or is there some level of protection we can turn on to at least isolate different client connections from each other?

> How is that meant to work on a general-purpose multi-user cluster if there is no isolation and users can accidentally trash each other’s configuration?

@martin I don’t think Dask can currently (or is meant to) support multiple users simultaneously using the same cluster. There isn’t any way to run all of their software environments in isolation at once.

We suggest a separate cluster per user; each user can then use upload_file() to get their custom code onto all the workers in their own cluster. You could also look into Dask Gateway or hosted solutions that have better support for teams.
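As a rough sketch of that workflow, assuming one small EC2Cluster per user (file names below are only illustrative):

```python
from dask_cloudprovider.aws import EC2Cluster
from dask.distributed import Client

cluster = EC2Cluster(n_workers=2)  # one small cluster per user
client = Client(cluster)

# upload_file() ships a local .py, .egg or .zip file to every worker in this
# cluster; zipping a small repo is one way to send a whole package.
client.upload_file("my_analysis.py")
client.upload_file("my_repo.zip")
```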

Agreed, it doesn’t seem to be set up well for that right now.
I would suggest, though, that going forward a separate cluster per user is not a good model:

  1. It wastes a lot of resources. Even if we dynamically allocate machines from EC2, it would probably mean one scheduler machine per user, plus a minimum of one worker machine per cluster, just to keep response latency low.
  2. It doesn’t fully solve the problem. A user might want to test a change from an experimental tree as well as run a test from the master branch, so it would really have to be a whole cluster per client.

I’m not convinced that adding another whole layer of complexity on top of the existing stack is desirable either - running even a small job would incur extra startup time, and it’s yet more stuff to manage.

I think what I was expecting was a separate Python process on the worker per client, which seems like it would be low-latency and transparent to the user.

Maybe each node could run a dask-worker process (or container) per user. Package state is maintained at the process level, so I don’t think there’s any way to get around having a process per user, but at least this way users would share the cloud infrastructure.
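As a rough sketch of the kind of thing I mean: each user’s worker processes could be tagged with a worker resource, and that user’s client would request the tag so their tasks only land in their own processes. The user name, scheduler address and resource tag below are all hypothetical:

```python
import asyncio
from dask.distributed import Worker

USER = "alice"                      # hypothetical user name
SCHEDULER = "tcp://scheduler:8786"  # shared scheduler; address illustrative

async def run_user_worker():
    # One worker process per user on each node, all joining the shared
    # scheduler; the resource tag marks this process as belonging to USER.
    worker = await Worker(SCHEDULER, resources={f"user-{USER}": 1})
    await worker.finished()

if __name__ == "__main__":
    asyncio.run(run_user_worker())

# From that user's client, tasks are then pinned to their own processes, so
# their sys.path / package state never leaks into anyone else's workers:
#   client = Client(SCHEDULER)
#   future = client.submit(my_function, resources={f"user-{USER}": 1})
```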