Hi everyone. I am reading a very large file with dask.dataframe.read_parquet(). The problem I am having is with the dataset: executing the code below raises TypeError: unhashable type: 'numpy.ndarray'. Snippet:
train_path = ["/somfile.train.snappy.parquet"]
data = dd.read_parquet(train_path, engine="pyarrow", compression="snappy",
                       columns="X_jets", split_row_groups=10)
data = data.value_counts()
data = data.compute()
I understand that my data contains NumPy arrays. The question, then, is: how do I load it? Should I avoid calling compute()? Here is a link to the file that I am using: QCDToGGQQ_IMGjet_RH1all_jet0_run1_n47540.test.snappy.parquet - Google Drive
Any help would be highly appreciated. Thanks!