Just a follow-up to this question: if I submit tasks, specifying the `workers` argument with a list of 3 workers, for example like this:

```python
from distributed import get_worker

n = 3
a = client.run(lambda: get_worker().address)
w = list(a)[:n]
# ['tls://10.8.10.3:43645', 'tls://10.8.102.2:44437', 'tls://10.8.103.3:42395']

for lat, lon in zip(lats, lons):
    results = client.submit(function, lat, lon,
                            workers=w,
                            allow_other_workers=False)
```

what can then be a reason for the distribution of tasks becoming inefficient?
Task stream with `workers=w`
Task stream when the cluster is scaled down to 3 workers and `workers=None`, for comparison
I'm a bit lost with your screenshots: the task profiles (length of the green bars) look very different, so I'm not sure what to answer.
I just tested on a LocalCluster with the following code:

```python
import random
import time

from distributed import Client

client = Client(n_workers=8)

def my_func(i):
    # Simulate a task of variable duration
    time.sleep(random.random() * 2.0)
    return i

n = 3
w = list(client.cluster.workers)[:n]

futures = []
for i in range(100):
    futures.append(client.submit(my_func, i, pure=False, workers=w,
                                 allow_other_workers=False))
results = client.gather(futures)
```
The way I specify workers is different from yours because your approach wasn't working for me. But with this code, the occupancy of the workers I specified looks good in the dashboard.
Maybe it's a scale problem?
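Since `pure=False` comes up here, a minimal sketch of what it changes (using a hypothetical function `f` on a throwaway LocalCluster): by default `Client.submit` treats the function as pure, so identical calls hash to the same task key and are computed only once; `pure=False` forces a fresh key, and thus a separate task, for each call.

```python
from distributed import Client

# Small in-process cluster just for illustration
client = Client(n_workers=2, processes=False, dashboard_address=None)

def f(x):
    return x + 1

# Default pure=True: identical calls collapse into one task key
a = client.submit(f, 1)
b = client.submit(f, 1)
assert a.key == b.key

# pure=False: distinct tasks even for identical arguments
c = client.submit(f, 1, pure=False)
d = client.submit(f, 1, pure=False)
assert c.key != d.key

assert client.gather(c) == 2
client.close()
```

With `pure=True`, 100 identical submissions would deduplicate into a single task, which can make worker occupancy look very uneven.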
Yes, I agree the screenshots may have been confusing. In the two examples, the calculation is the same and the number of workers is the same, so my question is why the cluster computed them differently.

`w = list(client.cluster.workers)[:n]` gives me

```
AttributeError: 'GatewayCluster' object has no attribute 'workers'
```

so I use `client.run(lambda: get_worker().address)`, since I could not find any other API way. Do you know of other alternatives?
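One alternative worth sketching (shown here on a LocalCluster, but it only talks to the scheduler, so it should not depend on the cluster object having a `.workers` attribute): `Client.scheduler_info()` returns the scheduler's view of the cluster, whose `"workers"` key maps worker addresses to worker metadata.

```python
from distributed import Client

# Throwaway in-process cluster for illustration
client = Client(n_workers=4, processes=False, dashboard_address=None)

# scheduler_info() queries the scheduler directly; the worker
# addresses are the keys of the "workers" mapping
info = client.scheduler_info()
addresses = list(info["workers"])

n = 3
w = addresses[:n]  # candidate list to pass as workers=w
assert len(w) == n
client.close()
```

This avoids running a function on every worker just to collect addresses, which is what `client.run(...)` does.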
Great, I retried with `pure=False`. I'm not able to point to exactly what else I did differently, but the computation looks good now, so I should close this. Thanks again!