I am trying to load two sets of CSV files, merge them, and save the result to a Parquet file. The full data set is larger than memory, so I am purposely not calling compute() (which, if my understanding is correct, would bring everything into memory). However, no data seems to be written.
Is there any way to do this without running out of memory?
The data will later be filtered, at which point it will fit in memory, but I am trying to avoid re-running the concatenation and merging steps (along with some other processing) every time; the intended read-back step is sketched after the code below.
Thanks
import dask.dataframe as dd

if __name__ == '__main__':
    # Lazily read both sets of CSV files as Dask DataFrames
    stops = dd.read_csv('stop*.csv', delimiter=";", header=0)
    types = dd.read_csv('type*.csv', delimiter=";", header=0)

    # Lazy left join on the shared 'date' column
    merged = dd.merge(stops, types, on='date', how='left')

    # merged = merged.compute()  # would pull everything into memory
    merged.to_parquet('myParquetFile.parquet')
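
For context, the later step I have in mind looks roughly like this; the filter column and cutoff value are made-up placeholders, not from my real data:

import dask.dataframe as dd

# Read the previously written Parquet file lazily
merged = dd.read_parquet('myParquetFile.parquet')

# Hypothetical filter; the real condition depends on the analysis
subset = merged[merged['date'] >= '2020-01-01']

# After filtering, the result should fit in memory, so compute() is safe here
result = subset.compute()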