Memory fluctuating but no tasks being processed

On my dashboard, I can see the memory usage of the cluster and of each individual worker fluctuating (< 20%), but no tasks are being processed. What could be some possible explanations for this?

Could this be caused by a .compute() or by a client.scatter() on a large array? Currently I have:

# compute the filtered Dask array into a NumPy array on the client
filtered_waves = filtered.compute()
# re-wrap it as a Dask array with the original chunking
filtered_da = da.from_array(filtered_waves, chunks=wave_on_slice_channel.chunks)
# broadcast it to every worker
filtered_futures = client.scatter(filtered_da, broadcast=True)

I am fairly certain that the top .compute() has completed. I strongly suspect that the code is getting stuck between the second and third lines, i.e. in the scatter.

However, before this section of code, I did exactly the same thing with wave_on_slice_channel:

wave_future = client.scatter(wave_on_slice_channel, broadcast=True)

and wave_on_slice_channel and filtered_da have exactly the same shape and size (~11 GB).

My individual workers each have 100 GB of memory, and the cluster has more than 2 TB in total.

Hi @axelwang,

If you have access to the Dashboard, you should be able to tell whether this compute call has finished, shouldn't you? Another option would be to execute the code step by step through IPython or a notebook to see exactly where it hangs.
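
For instance, a minimal sketch of that kind of step-by-step check, using only the names from your snippet (the print calls are just illustrative additions):

filtered_waves = filtered.compute()
print("compute finished:", type(filtered_waves), filtered_waves.shape)

filtered_da = da.from_array(filtered_waves, chunks=wave_on_slice_channel.chunks)
print("from_array finished:", filtered_da)

filtered_futures = client.scatter(filtered_da, broadcast=True)
print("scatter finished:", filtered_futures)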

But I agree with you, I also suspect the client.scatter call. I think it is normal that you're seeing no tasks on the cluster during this call. I'm not sure what the result of broadcasting a Dask Array will be, though. Why don't you just broadcast the resulting filtered_waves array?
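
Something like this, as a sketch: scatter the in-memory NumPy result directly, without the da.from_array wrapping:

filtered_waves = filtered.compute()
filtered_futures = client.scatter(filtered_waves, broadcast=True)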

Another solution would be to persist() the Dask array in memory, but then the data will be distributed across your cluster rather than replicated on every worker.
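
As a sketch of that alternative (assuming filtered is still the lazy Dask array):

filtered = filtered.persist()  # chunks are computed and kept in worker memory, spread across the cluster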

What are you doing next that needs a broadcasted Array?