Computing chunks locally before sending to workers with map_blocks

Hi @astrofrog,

Could you add some details on your workflow? How are you reading the files in the first place? Do you have a code snippet?

I assume you mean reading each chunk in the main Python script on the Client side, and then sending the blocks to the Workers for further processing? That still means a lot of network IO. It’s generally much more efficient to read the data directly on the Workers, but I understand this is complicated in your case.
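For comparison, this is roughly what reading on the Workers looks like. It is only a sketch: `load_chunk` is a made-up placeholder for whatever reader you have, and the chunk shape and dtype are assumed to be known up front.

```python
import numpy as np
import dask
import dask.array as da


def load_chunk(i, shape=(4, 3)):
    # Hypothetical reader: replace with code that loads chunk `i` from storage.
    # Here it just fabricates an array filled with the chunk index.
    return np.full(shape, i, dtype="float64")


def build_lazy_array(n_chunks, shape=(4, 3)):
    # Each block is a delayed call, so the read only runs where the task
    # is executed (on a Worker when a distributed Client is active).
    blocks = [
        da.from_delayed(dask.delayed(load_chunk)(i, shape), shape=shape, dtype="float64")
        for i in range(n_chunks)
    ]
    return da.concatenate(blocks, axis=0)
```

Nothing is read when `build_lazy_array` returns; the reads happen once you call `compute()` or `persist()` on the result.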

It should be doable with a mix of Futures and building a Dask Array with from_delayed: you’ll have to scatter all the data manually from the Client side. You’ll need to do that carefully to avoid memory issues, though. Maybe there are more suitable solutions, I’m not sure.
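Something along these lines might work as a starting point. It is an untested-on-your-data sketch: `read_chunk_locally` is a hypothetical stand-in for your own client-side reading code, and I'm assuming chunks are stacked along the first axis.

```python
import numpy as np
import dask.array as da
from dask.distributed import Client


def read_chunk_locally(i, shape=(4, 3)):
    # Hypothetical client-side reader: replace with your own code.
    # Here it just fabricates an array filled with the chunk index.
    return np.full(shape, i, dtype="float64")


def build_scattered_array(client, n_chunks, shape=(4, 3)):
    blocks = []
    for i in range(n_chunks):
        chunk = read_chunk_locally(i, shape)
        # Ship this chunk to a Worker; we only keep a Future on the Client.
        future = client.scatter(chunk)
        blocks.append(da.from_delayed(future, shape=chunk.shape, dtype=chunk.dtype))
        # Drop the local copy so only one chunk is held client-side at a time.
        del chunk
    return da.concatenate(blocks, axis=0)
```

With a running `Client`, `arr = build_scattered_array(client, n_chunks)` gives a Dask Array whose blocks already live on the Workers, so a later `arr.map_blocks(...)` runs there. Keep in mind the Workers still need enough memory to hold everything you scatter.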