I have code similar to this:
```python
import dask.bag as bag
import pandas as pd

dfs = []
for task in tasks:
    subtasks = gen_subtasks(task)
    df = (
        bag.from_sequence(subtasks)
        .to_dataframe(...)
        .groupby(...)
        .compute()
    )
    dfs.append(df)

pd.concat(dfs).to_parquet(...)
```
My problem is that whenever an iteration of the loop finishes, the FargateCluster seems to retire all the workers, then start them all over again for the next iteration.
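In case it matters, this is roughly how the cluster is set up (a simplified sketch; the parameters are placeholders, not my real configuration):

```python
from dask_cloudprovider.aws import FargateCluster
from dask.distributed import Client

# Placeholder sizing; my real cluster options are more involved.
cluster = FargateCluster(n_workers=8)
client = Client(cluster)  # becomes the default client used by .compute()
```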
Is there a way to keep the cluster alive until the end of the program?
As an alternative, I also tried generating the subtasks as a Dask task, but I couldn't figure out how to do it. How could I integrate the whole process into Dask?
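This is roughly what I attempted (a sketch; `gen_subtasks` is my real function, the rest is simplified), but I don't know whether `from_delayed` is the right way to chain this into the `to_dataframe`/`groupby` pipeline above:

```python
import dask
import dask.bag as bag

# One delayed call per task; each should return a list of subtasks.
delayed_subtasks = [dask.delayed(gen_subtasks)(task) for task in tasks]

# My understanding is that from_delayed treats each delayed list as
# one partition of the bag, so the rest of the pipeline could follow.
b = bag.from_delayed(delayed_subtasks)
```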