Hi Dask community,
I’m taking my first steps with Dask, and I’m particularly interested in performing intensive calculations on NumPy arrays, mainly using the einsum function. When I do this with dask.array, the calculations consume so much memory that the process aborts due to resource exhaustion.
A simplified version of my code is as follows:
import numpy as np
import dask.array as da
n1 = 18
n2 = 30
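# four random 4-index test tensors (simplified stand-ins for my real data)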
c1 = np.random.random((n2, n1, n2, n2))
c2 = np.random.random((n2, n2, n1, n2))
c3 = np.random.random((n1, n2, n1, n1))
c4 = np.random.random((n2, n1, n1, n1))
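# identity masks on the two index ranges, plus their complements (1 - identity)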
mask_bg = np.eye(n1)
mask_np = np.eye(n2)
mask_ag = np.eye(n1)
mask_mp = np.eye(n2)
delta_vir = 1 - np.eye(n2)
delta_occ = 1 - np.eye(n1)
deltas = np.einsum('nm,ab->nbma', delta_vir, delta_occ)
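# accumulate the 6-index intermediate s_2, then contract it back down to 4 indices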
s_2 = da.einsum('ag,nbmp->nbmapg', mask_ag, c1)
s_2 += da.einsum('bg,nmap->nbmapg', mask_bg, c2)
s_2 += da.einsum('np,bmag->nbmapg', mask_np, c3)
s_2 += da.einsum('mp,nbag->nbmapg', mask_mp, c4)
s_2 = da.einsum('nbma,nbmapg->nbma', deltas, s_2).compute()
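For scale: each of the 6-index nbmapg intermediates above holds (30*18)**3 ≈ 1.6e8 float64 values (about 1.26 GB), and it looks like several of them plus einsum temporaries end up in memory at the same time.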
I’ve also tried da.map_blocks(), but with no better results.
Is there a better way to reduce the RAM usage than just adjusting the chunk size? And is it possible to set a maximum amount of RAM to be used during the calculation?
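To make the second question concrete, this is roughly the kind of thing I have been experimenting with (the chunk sizes and the "8GB" limit below are just placeholder values, and I’m not sure a LocalCluster memory_limit is even the right knob for this):
import dask.array as da
from dask.distributed import Client, LocalCluster
# "adjusting the chunk size": wrap the inputs as chunked dask arrays
# and feed these to the da.einsum calls instead of the raw NumPy arrays
c1_d = da.from_array(c1, chunks=(n2, n1, n2, 10))  # chunked along the p axis
c2_d = da.from_array(c2, chunks=(n2, n2, n1, 10))
# the kind of hard memory cap I was hoping for: a local cluster whose
# single worker is limited to a fixed amount of RAM
client = Client(LocalCluster(n_workers=1, threads_per_worker=4, memory_limit="8GB"))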
I appreciate any guidance or advice you can offer.
Best regards
Daniel