Hi, whenever I try to compute the SVD of a comparatively large matrix that does not fit on the GPU, I get a CUDA out-of-memory error on Kaggle's 2 x 15 GB GPUs.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

import cupy
import dask
import dask.array as da
import time

cluster = LocalCUDACluster()
client = Client(cluster)

start = time.time()

# CuPy-backed random state so the chunks are generated directly on the GPU
rs = da.random.RandomState(RandomState=cupy.random.RandomState)

# Create the data and run the SVD as normal
x = rs.randint(0, 100, size=(1_000_000_000, 2_000),
               chunks=(10_000, 2_000), dtype="uint8")

# This tries to materialize the whole ~2 TB array in GPU memory
xp = x.persist()

u, s, v = da.linalg.svd_compressed(xp, k=2000, seed=rs)

# Compute all three factors in one pass instead of three separate graphs
u, s, v = dask.compute(u, s, v)

print("ended in", time.time() - start)
What I want here is a distributed streaming SVD. We already have a distributed SVD that runs on the GPU through CuPy, but we don't have a streaming variant: even for a very large matrix, say billions of rows and 10k columns, we should be able to compute the SVD chunk by chunk and aggregate the final output on storage when it can't be held in GPU memory. The closest I can get with the current API is sketched right after this paragraph.
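Here is a minimal sketch, assuming zarr is installed and the same LocalCUDACluster/Client setup as above, of that workaround: skip the .persist() so each chunk is generated, consumed and released on the fly, and stream the tall u factor to disk instead of collecting it, since u has as many rows as the input. The rank k=100 and the path "svd_u.zarr" are placeholders, and blocks are copied to host with cupy.asnumpy before writing because zarr stores NumPy data.

import cupy
import dask
import dask.array as da

rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.randint(0, 100, size=(1_000_000_000, 2_000),
               chunks=(10_000, 2_000), dtype="uint8")

# No .persist(): workers only ever hold a handful of (10_000, 2_000) blocks at a time
u, s, v = da.linalg.svd_compressed(x, k=100, seed=rs)   # k=100 is an illustrative rank

# u has a billion rows, so write it straight to storage instead of calling u.compute();
# s and v are small (k values and a (k, 2_000) matrix) and can stay in memory.
write_u = da.to_zarr(u.map_blocks(cupy.asnumpy), "svd_u.zarr", compute=False)
_, s, v = dask.compute(write_u, s, v)

This keeps per-worker memory bounded by not persisting the full array and by storing the tall factor as it is produced, but it is not a true single-pass streaming SVD, which is what this feature request is about.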
Apart from that, I need incremental distributed streaming SVD. Suppose I have a matrix of shape A x B, say (1_000_000_000, 2_000), and later a larger matrix of shape (A+C) x B, say (3_000_000_000, 2_000), that contains the original A x B matrix as its top block. Since the SVD of A x B has already been computed, I don't want to recompute it for that portion of the larger matrix; I only want the factorization updated with the new rows, again in a streaming, distributed manner. A rough sketch of such a row-append update follows.
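As far as I know there is nothing built into dask.array for this (cuML's IncrementalPCA is the closest related tool), but the standard row-append SVD update only ever touches the small factors: given A ≈ U diag(s) Vt and a new block of rows C (p x B), the SVD of the stacked matrix [A; C] follows from the SVD of the small (k + p) x B matrix [diag(s) Vt; C], and the already-stored U only needs a k x k rotation. Below is a single-GPU CuPy sketch; the function name is mine and not an existing API.

import cupy as cp

def append_rows_update(s, Vt, C, k):
    """Fold a new block of rows C (p x n) into an existing rank-k SVD
    A ~= U @ diag(s) @ Vt without re-reading A.

    Returns the (k, k) rotation to apply to the stored U, the (p, k) rows of U
    for the new block C, and the updated singular values / right vectors."""
    # [A; C] = blockdiag(U, I_p) @ [diag(s) @ Vt; C], so updating the
    # factorization only requires the SVD of the small stacked matrix K.
    K = cp.vstack([s[:, None] * Vt, C])               # shape (k + p, n)
    Uk, s_new, Vt_new = cp.linalg.svd(K, full_matrices=False)
    rotation = Uk[:k, :k]                             # update for the existing rows of U
    U_rows_for_C = Uk[k:, :k]                         # rows of U for the new block
    return rotation, U_rows_for_C, s_new[:k], Vt_new[:k]

New rows would be consumed block by block (through some hypothetical iterator over (p, 2_000) chunks), each U_rows_for_C appended to storage, and the k x k rotations applied to the already-stored blocks of U in a later streaming pass (each stored block only needs the product of the rotations produced after it was written), so neither the old matrix nor its U factor ever has to fit in GPU memory.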