Hello,
I am currently trying to use Dask with a GPU. The functionality I want is to create a larger-than-memory matrix of arbitrary size on the host, send chunks of 3-4 GB to the device, run the computation on the GPU, and send the results back to the host, so that the heavy lifting happens on the GPU rather than the CPU. Currently, I get an out-of-memory error with ~4 GB workloads on a TU116 [GeForce GTX 1660 SUPER] with 6 GB of VRAM. The following snippet reproduces the error:
import cupy as cp
import numpy as np
import dask.array as da
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
if __name__ == '__main__':
    cluster = LocalCUDACluster('0', n_workers=1)
    client = Client(cluster)
    print(client)

    shape = (1024, 1024, 1000)
    chunks = (256, 256, 1000)
    huge_array_gpu = da.ones_like(cp.array(()), shape=shape, chunks=chunks)
    array_sum = da.multiply(huge_array_gpu, 17).compute()
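For reference, here is a minimal sketch of the host-to-device-to-host streaming pattern I am after, written without Dask so the data flow is explicit. The chunk size, the `process_in_chunks` helper, and the CPU fallback are my own illustration, not anything from the Dask or dask-cuda API; on a machine without CuPy the "device" transfers degrade to no-ops so the sketch still runs:

```python
import numpy as np

try:
    import cupy as cp            # GPU path: real host<->device copies
    to_device, to_host = cp.asarray, cp.asnumpy
except ImportError:              # CPU fallback so the sketch runs anywhere
    to_device, to_host = np.asarray, np.asarray

def process_in_chunks(host_array, chunk_rows, fn):
    """Stream row-chunks of a host array through the device.

    Only one chunk lives on the device at a time, so peak device
    memory is bounded by the chunk size, not the full array.
    """
    out = np.empty_like(host_array)
    n_rows = host_array.shape[0]
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        device_chunk = to_device(host_array[start:stop])  # host -> device
        out[start:stop] = to_host(fn(device_chunk))       # device -> host
    return out

host = np.ones((1024, 256), dtype=np.float32)
result = process_in_chunks(host, 256, lambda x: x * 17)
```

This is essentially what I hoped `LocalCUDACluster` would do for me per chunk: in the snippet above, the full `(1024, 1024, 1000)` float64 array is about 8 GB, which already exceeds the 6 GB of VRAM even before the multiply materializes its output.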