Out-of-memory error creating larger-than-memory matrices

Hello,
I am currently trying to use Dask with a GPU. What I want is to create a larger-than-memory matrix of arbitrary size on the host, send chunks of 3-4 GB to the device, run the computation there, and transfer the results back to the host, so that the heavy computation happens on the GPU rather than the CPU. Currently I get an out-of-memory error with a ~4 GB workload on a TU116 [GeForce GTX 1660 SUPER] with 6 GB of VRAM. The following snippet reproduces the error:

import cupy as cp
import dask.array as da
from dask_cuda import LocalCUDACluster
from dask.distributed import Client



if __name__ == '__main__':
    # One worker, pinned to GPU 0 via CUDA_VISIBLE_DEVICES.
    cluster = LocalCUDACluster('0', n_workers=1)
    client = Client(cluster)

    print(client)

    shape = (1024, 1024, 1000)
    chunks = (256, 256, 1000)

    # The empty cp.array(()) is only a meta template that tells Dask
    # to back each chunk with a CuPy array on the GPU.
    huge_array_gpu = da.ones_like(cp.array(()), shape=shape, chunks=chunks)

    # compute() materializes the full 1024 x 1024 x 1000 float64 result
    # (~8.4 GB) as a single CuPy array, which exceeds the 6 GB of VRAM.
    result = da.multiply(huge_array_gpu, 17).compute()
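For reference, the host-resident pattern I am ultimately after would look something like the sketch below: keep the chunks as plain NumPy arrays in host RAM and move each chunk to the device only for the duration of its computation. This is just a sketch of my intent, not something I have working; the helper name multiply_on_gpu is mine, and the per-chunk host-to-device round trip via map_blocks with cp.asarray/cp.asnumpy is my assumption about how that transfer would be expressed.

import cupy as cp
import dask.array as da

# Host-resident array: the chunks are plain NumPy arrays, so the full
# ~8.4 GB (1024 * 1024 * 1000 float64) lives in host RAM, not in VRAM.
shape = (1024, 1024, 1000)
chunks = (256, 256, 1000)
huge_array_host = da.ones(shape, chunks=chunks)

def multiply_on_gpu(block):
    # Move one ~0.5 GB chunk to the device, compute, and copy the
    # result back, so only a single chunk occupies VRAM at a time.
    return cp.asnumpy(cp.asarray(block) * 17)

result = huge_array_host.map_blocks(multiply_on_gpu, dtype=huge_array_host.dtype)
result.compute()  # the final result is assembled in host memory

Note that compute() here still assembles the full ~8.4 GB product on the host, so for results that are genuinely larger than host memory I would presumably have to reduce the result or store it to disk instead of calling compute() on the whole array.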