Hi Dask community,
I’m taking my first steps with Dask, and I’m particularly interested in performing intensive calculations on NumPy arrays, mainly using the einsum function. When I do this with dask.array, the calculations consume so much memory that the process aborts due to resource exhaustion.
A simplified version of my code is as follows:
import numpy as np
import dask.array as da
n1 = 18
n2 = 30
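# four random 4-index test tensors (simplified stand-ins for my real data)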
c1 = np.random.random((n2, n1, n2, n2))
c2 = np.random.random((n2, n2, n1, n2))
c3 = np.random.random((n1, n2, n1, n1))
c4 = np.random.random((n2, n1, n1, n1))
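# identity masks on the two index ranges, plus their complements (1 - identity)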
mask_bg = np.eye(n1)
mask_np = np.eye(n2)
mask_ag = np.eye(n1)
mask_mp = np.eye(n2)
delta_vir = 1 - np.eye(n2)
delta_occ = 1 - np.eye(n1)
deltas = np.einsum('nm,ab->nbma', delta_vir, delta_occ)
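# accumulate the 6-index intermediate s_2, then contract it back down to 4 indices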
s_2 = da.einsum('ag,nbmp->nbmapg', mask_ag, c1)
s_2 += da.einsum('bg,nmap->nbmapg', mask_bg, c2)
s_2 += da.einsum('np,bmag->nbmapg', mask_np, c3)
s_2 += da.einsum('mp,nbag->nbmapg', mask_mp, c4)
s_2 = da.einsum('nbma,nbmapg->nbma', deltas, s_2).compute()
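For scale: each of the 6-index nbmapg intermediates above holds (30*18)**3 ≈ 1.6e8 float64 values (about 1.26 GB), and it looks like several of them plus einsum temporaries end up in memory at the same time.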
I’ve also tried da.map_blocks(), but with no better results.
Is there a better way to reduce the RAM usage than just adjusting the chunk size? And is it possible to set a maximum amount of RAM to be used during the calculation?
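To make the second question concrete, this is roughly the kind of thing I have been experimenting with (the chunk sizes and the "8GB" limit below are just placeholder values, and I’m not sure a LocalCluster memory_limit is even the right knob for this):
import dask.array as da
from dask.distributed import Client, LocalCluster
# "adjusting the chunk size": wrap the inputs as chunked dask arrays
# and feed these to the da.einsum calls instead of the raw NumPy arrays
c1_d = da.from_array(c1, chunks=(n2, n1, n2, 10))  # chunked along the p axis
c2_d = da.from_array(c2, chunks=(n2, n2, n1, 10))
# the kind of hard memory cap I was hoping for: a local cluster whose
# single worker is limited to a fixed amount of RAM
client = Client(LocalCluster(n_workers=1, threads_per_worker=4, memory_limit="8GB"))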
I appreciate any guidance or advice you can offer.
Best regards
Daniel