Hi, whenever I try to compute the SVD of a comparatively large matrix that does not fit on the GPU, I get a CUDA out-of-memory error on Kaggle's 2 x 15 GB GPUs.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import cupy
import dask.array as da
import time

# Start one worker per GPU
cluster = LocalCUDACluster()
client = Client(cluster)

start = time.time()

# CuPy-backed random state so the chunks are created on the GPU
rs = da.random.RandomState(RandomState=cupy.random.RandomState)

# Create the data and run the SVD as normal
x = rs.randint(0, 100, size=(1_000_000_000, 2_000),
               chunks=(10_000, 2_000), dtype="uint8")
xp = x.persist()

u, s, v = da.linalg.svd_compressed(xp, k=2_000, seed=rs)
v = v.compute()
u = u.compute()
s = s.compute()

print("ended in ", time.time() - start)
What I want here is a distributed streaming SVD computation. We already have a distributed SVD that uses CuPy for GPU computation, but we don't have a streaming distributed variant: even if I have a very large matrix, say billions of rows and 10k columns, I should be able to compute the SVD in a streaming fashion, chunk by chunk, and aggregate the final output on storage if it can't be loaded into GPU memory. Roughly, I imagine something like the sketch below.
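Instead of creating the array with random data and persisting the whole thing on the GPUs, the input could be read lazily from chunked on-disk storage and chunks moved to the GPU only as tasks need them. This is just a rough sketch of what I mean, not working code; the zarr path and chunk sizes are placeholders.

import cupy
import dask.array as da

# Hypothetical chunked input on disk; each chunk is read only when a task needs it
x = da.from_zarr("big_matrix.zarr")            # e.g. shape (1_000_000_000, 2_000)
x = x.rechunk((10_000, 2_000))
x = x.map_blocks(cupy.asarray)                 # move chunks to the GPU lazily

rs = da.random.RandomState(RandomState=cupy.random.RandomState)
u, s, v = da.linalg.svd_compressed(x, k=2_000, seed=rs)
# ... then write u, s, v back to storage chunk by chunk instead of calling .compute()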
Apart from that I require incremental distributed streaming svd computation which means , i have a matrix of shape AxB which is actually (1000000_000, 2_000) and after that i have another matrix A+C x B+D of shape (3000000_000, 2_000) where we already have matrix AxB in this new matrix . Now cause we have already computed SVD for matrix AxB we don’t want to recompute svd on this portion of matrix for this new larger matrix which we have already have computed , instead we want svd computed for only this new data only in streaming distributed manner.
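What I have in mind is the standard "append new rows" update for a truncated SVD: if the old block's rank-k SVD is already known, only a small (k + new_rows) x n matrix has to be decomposed, so the old data never has to be touched again. Below is a minimal single-machine NumPy sketch of that idea on toy data; it is not distributed or out-of-core, and mapping it onto Dask/CuPy chunks is exactly the part I am asking about.

import numpy as np

def append_rows_svd(U1, s1, V1t, C, k):
    """Update a truncated SVD A_old ~= U1 @ diag(s1) @ V1t when new rows C are
    appended below A_old. Exact if A_old has rank <= k, otherwise approximate."""
    # Small stacked matrix of shape (k + c, n); cheap to decompose
    M = np.vstack([np.diag(s1) @ V1t, C])
    U2, s2, V2t = np.linalg.svd(M, full_matrices=False)
    top = U1 @ U2[:len(s1), :]      # rotate the old left singular vectors
    bottom = U2[len(s1):, :]        # rows corresponding to the new data
    U_new = np.vstack([top, bottom])
    return U_new[:, :k], s2[:k], V2t[:k, :]

# Toy check against the SVD of the full stacked matrix
rng = np.random.default_rng(0)
A_old = rng.standard_normal((100, 20))
C = rng.standard_normal((30, 20))
k = 20
U1, s1, V1t = np.linalg.svd(A_old, full_matrices=False)
U, s, Vt = append_rows_svd(U1[:, :k], s1[:k], V1t[:k, :], C, k)
print(np.allclose(s, np.linalg.svd(np.vstack([A_old, C]), compute_uv=False)[:k]))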
Yes, you are absolutely right here. But with streaming distributed SVD computation it should still be possible, even if we have 8 GPUs of 24 GB each and the data size is 250 TB. Kindly guide me on how this problem can be solved.
u.compute() computes all the chunks of u and returns them as a standard NumPy array.
If you want to stream the computation, then you should write the result to disk (a big disk), using an appropriate array format (Zarr?).
Well, you won't be able to convert a larger-than-memory array to NumPy; that will never be possible. Instead, you should stream the data to disk using to_zarr or any other chunked output format.
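For example, something along these lines (an untested sketch; since the chunks here are CuPy arrays, they presumably need to be moved back to host memory before writing, and the output path is just a placeholder):

import cupy
import dask.array as da

# Instead of u.compute(): convert GPU chunks to NumPy and stream them to disk
u_host = u.map_blocks(cupy.asnumpy)
da.to_zarr(u_host, "u.zarr")

s_local = s.compute()   # s has only k entries, so it easily fits in memory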
I'm not sure I understand the other features you asked about, nor what you mean by designation?
Oh, by designation I mean: do you play any specific official role on this Dask forum? I shared my contact details to make the discussion faster; could you please reach out to me there? I get notified by email, so I was late to respond.