I’m trying to process a large array with dask.array.map_blocks, but I don’t know how to access each block’s result as it finishes while the blocks are being processed in parallel.
import time
import dask.array

out = dask.array.map_blocks(torchit, arr, dtype="float32", chunks=chunks)  # arr: the source dask array
print("processing")
for err in out:
    start = time.time()
    val = err.compute()
    print(time.time() - start, "batch complete")
If I do it this way, each chunk gets computed one at a time, but I think they could be performed in parallel. I am not dead set on using map_blocks; maybe I am using the wrong method to begin with.
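To illustrate what I mean, here is a minimal, self-contained sketch (with a stand-in torchit that just doubles a block, and a small example array, since my real function and data are not shown): calling compute() once on the whole mapped array schedules every chunk together, and to_delayed() keeps per-block results while still computing them in one parallel pass.

```python
import time

import dask
import dask.array as da


def torchit(block):
    # Stand-in for the real per-block function (assumption: element-wise).
    return block * 2.0


# Small example array; the chunks along axis 0 are the "batches".
arr = da.ones((8, 4), chunks=(2, 4), dtype="float32")
out = arr.map_blocks(torchit, dtype="float32")

# Option 1: a single compute() runs all chunks in parallel.
start = time.time()
result = out.compute()
print(time.time() - start, "all batches complete")

# Option 2: keep per-block results but schedule every block in one call,
# instead of one compute() per block in a loop.
blocks = out.to_delayed().ravel()
vals = dask.compute(*blocks)  # tuple of numpy arrays, one per block
```

Is something like this the right direction, or is there a better API for consuming results as they complete?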