How to fetch rows from another Dask dataframe by matching ID columns?


I have two Dask dataframes, A and B. Both contain huge amounts of data, so I can’t handle them in Pandas on a single machine.

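How can I build Dask dataframe C from Dask dataframes A and B? Each number in A's Values list is an ID in B, and C should hold, for each A.ID, the matching B values concatenated in that order.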

A:

ID Values
1 [123, 456, 789]
2 [234, 456, 777]

B:

ID values
123 [string1, string2]
456 [string3, string4]
789 [string5, string6]
234 [string7, string8]
777 [string9, string10]

C (expected result):

ID values
1 [string1, string2, string3, string4, string5, string6]
2 [string7, string8, string3, string4, string9, string10]
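For checking results on a small sample, the expected C can first be reproduced in plain pandas. This is a minimal sketch, assuming A has columns `ID` and `Values` (lists of B IDs) and B has columns `ID` and `values` (lists of strings), as in the tables above; `build_c` and the `pos` and `strings` columns are hypothetical names introduced here, and `pos` only records each item's position in its original list so the output keeps that order.

```python
import itertools

import pandas as pd


def build_c(a: pd.DataFrame, b: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: concatenate B's lists for each B ID listed in A."""
    # Step 1: one row per (A.ID, B ID) pair. explode repeats the row label,
    # so cumcount over the index records each item's position in its list.
    exploded = a.explode("Values")
    exploded["pos"] = exploded.groupby(level=0).cumcount().to_numpy()
    exploded = exploded.astype({"Values": "int64"})

    # Step 2: join the exploded IDs against B.ID.
    lookup = b.rename(columns={"ID": "Values", "values": "strings"})
    merged = exploded.merge(lookup, on="Values", how="inner")

    # Step 3: restore list order, then flatten the string lists per A.ID.
    merged = merged.sort_values(["ID", "pos"])
    return merged.groupby("ID")["strings"].agg(
        lambda lists: list(itertools.chain.from_iterable(lists))
    )


# Sample data from the tables above.
a = pd.DataFrame({"ID": [1, 2], "Values": [[123, 456, 789], [234, 456, 777]]})
b = pd.DataFrame(
    {
        "ID": [123, 456, 789, 234, 777],
        "values": [
            ["string1", "string2"],
            ["string3", "string4"],
            ["string5", "string6"],
            ["string7", "string8"],
            ["string9", "string10"],
        ],
    }
)
c = build_c(a, b)
print(c)
```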

Hi @Sam,

Do you have a working Pandas solution?

What I would try is:

  1. Explode the Values column of dataframe A.
  2. Join A and B on the exploded column and B.ID.
  3. Group the result by the A.ID column.

Hi @guillaumeeb,

Yes, I have done similar steps in Pandas via a map_partitions call:

  1. map_partitions over A with explode
  2. merge A and B on ID
  3. group by A.ID