Not able to compute svd_compressed for a bigger matrix

Hi, whenever I try to compute an SVD for a comparatively large matrix that does not fit on the GPU, I get a CUDA out-of-memory error on Kaggle's 2 x 15 GB GPUs.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
client = Client(cluster)

import cupy
import dask.array as da
import time

start = time.time()
rs = da.random.RandomState(RandomState=cupy.random.RandomState)

# Create the data and run the SVD as normal
x = rs.randint(0, 100, size=(1_000_000_000, 2_000),
               chunks=(10_000, 2_000), dtype="uint8")
xp = x.persist()

u, s, v = da.linalg.svd_compressed(xp, k=2000, seed=rs)
v = v.compute()
u = u.compute()
s = s.compute()
print("ended in", time.time() - start)

What I want here is a distributed streaming SVD computation. We already have a distributed SVD method that uses CuPy for GPU computation, but we do not have a streaming distributed SVD. By that I mean: even for a very large matrix, say billions of rows and 10k columns, we should be able to compute the SVD in a streaming fashion, chunk by chunk, and aggregate the final output on storage if it cannot fit in GPU memory.
Apart from that, I need incremental distributed streaming SVD computation. That is: I have a matrix of shape A x B, say (1_000_000_000, 2_000), and later I get another matrix of shape (A+C) x (B+D), say (3_000_000_000, 2_000), which already contains the original A x B matrix. Since the SVD of A x B has already been computed, we do not want to recompute it for that portion of the new, larger matrix; instead, the SVD should be updated using only the new data, again in a streaming distributed manner.
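
To make the request concrete, here is a rough CPU-only sketch of the kind of incremental update I have in mind, using scikit-learn's IncrementalPCA purely as an illustration (it maintains a truncated SVD of centered data that can be updated batch by batch). The sizes and k below are placeholders, not a proposal for an actual API:

import numpy as np
from sklearn.decomposition import IncrementalPCA

k = 50  # number of components to keep (placeholder value)
ipca = IncrementalPCA(n_components=k)

# First pass: the original A x B matrix, fed in row-chunks that fit in memory.
for _ in range(10):
    chunk = np.random.randint(0, 100, size=(10_000, 2_000)).astype("float32")
    ipca.partial_fit(chunk)

# Later: only the newly arrived rows are fed in, and the factorization is
# updated instead of being recomputed from scratch.
new_rows = np.random.randint(0, 100, size=(10_000, 2_000)).astype("float32")
ipca.partial_fit(new_rows)

print(ipca.singular_values_[:5])

What I am asking for is essentially this pattern, but distributed, GPU-backed, and with the factors aggregated to storage.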

Hi @localhost-server, welcome to the Dask community!

Here, you are trying to load all the data into memory with the xp = x.persist() line; is that intended? Did you try without this line?

Yes, I did try it without that line, and I still get the same out-of-memory error.


It doesn't matter what I do: if the shape is very large, I get an out-of-memory error.

You can check it yourself, too.

While trying your example, I see that you are trying to compute an array u weighing 14.5 TiB.

I don’t think you have enough memory to get this result on your machine.
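
For reference, a quick back-of-the-envelope check (assuming the result is stored as float64): u has shape (1_000_000_000, 2_000), so 10^9 × 2×10^3 × 8 bytes ≈ 16 TB, which is roughly 14.5 TiB.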

Yes, you are absolutely right here. But with a streaming distributed SVD computation it should be possible, even with 8 GPUs of 24 GB each and a data size of 250 TB. Kindly guide me on how this problem can be solved.

u.compute() computes all chunks of u and gathers the result into a single in-memory array.
If you want to stream the computation, you should write the result to disk (a big disk), using an appropriate array format (Zarr?).
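
As a rough sketch (the path is just an example, and the cupy.asnumpy conversion assumes your chunks are CuPy arrays as in your snippet, since the on-disk store needs CPU buffers):

import cupy

# Instead of u.compute(), stream each chunk of the lazy result to disk.
# Zarr stores the array chunk by chunk, so the full 14.5 TiB never has to
# fit in memory at once.
u_cpu = u.map_blocks(cupy.asnumpy)        # move each chunk from GPU to host
u_cpu.to_zarr("u.zarr", overwrite=True)   # example path on a big disk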

Even if I keep the size down to around 30 GB, it throws an error.

I see in your code that you are using CuPy, thus the GPU. How much memory do you have on this device?

I am using the Kaggle free tier. Two 15 GB GPUs are available.

The compute() call tries to gather all the resulting data into one memory space, so 15 GiB maximum in your case.
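
As a quick illustration, you can inspect the size of the lazy result before calling compute() (u here is the array from your snippet):

print(u.shape, u.dtype)
print(u.nbytes / 2**40, "TiB")  # far more than a single 15 GiB GPU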

OK, can this be improved, along with all the other features I enquired about? By the way, I didn't get your designation for Dask. Please do share.

Well, you won't be able to convert a larger-than-memory array to NumPy; that will never be possible. Instead, you should try to stream the data to disk using to_zarr or any other output format.
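
For instance (a minimal sketch; "u.zarr" is just an example path), once the factors have been written you can reopen them lazily without loading anything into memory:

import dask.array as da

u_disk = da.from_zarr("u.zarr")  # lazy view over the on-disk chunks
print(u_disk.chunks)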

I'm not sure I see the other features you asked about, nor do I understand what you mean by designation.