I want to train a text classification model, vectorizing the data with fastText.
This is the code so far:
import fasttext
import numpy as np
from dask.distributed import Client, LocalCluster
from dask_ml.wrappers import Incremental
from sklearn.linear_model import SGDClassifier

cluster = LocalCluster(n_workers=4, threads_per_worker=1)
client = Client(cluster)

# train unsupervised fastText embeddings on the raw text file
vectorizer = fasttext.train_unsupervised('tests/data/case_df_50k.csv')

def transform(texts):
    texts = [text.split() for text in texts]
    vectors = []
    for text in texts:
        word_vectors = []
        # get the vector for each word in the text and average them
        for word in text:
            word_vectors.append(vectorizer.get_word_vector(word))
        vectors.append(np.mean(word_vectors, axis=0))
    return np.array(vectors)

# ddf is a dask DataFrame with 'text' and 'label' columns (loaded elsewhere)
vectors = ddf.map_partitions(lambda df: transform(df.text))

model = SGDClassifier()
model = Incremental(model)
model.fit(vectors, ddf['label'], classes=ddf['label'].unique())
When I run this pipeline, I get the following error (the full traceback is quite long):

TypeError: cannot pickle 'fasttext_pybind.fasttext' object

What can I do to work around this?
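My guess is that the lambda I pass to map_partitions closes over vectorizer, so dask has to pickle the underlying fasttext_pybind object when it ships the task to the workers. Would something like the following be the right fix? It's only a minimal sketch, assuming the model round-trips through fastText's save_model/load_model and that the saved file (the 'fasttext_model.bin' path is just a placeholder) is visible to every worker:

import fasttext
import numpy as np

# persist the trained model once on the client; each worker reloads it
# from disk instead of receiving it through pickle
vectorizer.save_model('fasttext_model.bin')

_worker_model = None  # lazy per-process cache of the reloaded model

def transform(texts, model_path='fasttext_model.bin'):
    global _worker_model
    if _worker_model is None:
        _worker_model = fasttext.load_model(model_path)
    vectors = []
    for text in texts:
        word_vectors = [_worker_model.get_word_vector(w) for w in text.split()]
        vectors.append(np.mean(word_vectors, axis=0))
    return np.array(vectors)

vectors = ddf.map_partitions(lambda df: transform(df.text))

Is that the right direction, or is there a cleaner way to share the model across dask workers?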
Also, suggestions on how I can preprocess and vectorize my data memory-efficiently would be appreciated!
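For concreteness, here is the plain-pandas fallback I have in mind if the dask route stays awkward. Again just a sketch under assumptions: the CSV has 'text' and 'label' columns, chunks of ~10k rows fit comfortably in memory, and fastText's get_sentence_vector (which does the word-vector averaging internally) can replace my manual loop:

import fasttext
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

CSV_PATH = 'tests/data/case_df_50k.csv'

model = fasttext.load_model('fasttext_model.bin')  # placeholder path from above
clf = SGDClassifier()

# cheap first pass over the label column only, to collect the class set
# that the first partial_fit call requires
classes = pd.read_csv(CSV_PATH, usecols=['label'])['label'].unique()

# stream the file in chunks so only one chunk of vectors is ever in memory
for chunk in pd.read_csv(CSV_PATH, chunksize=10_000):
    X = np.vstack([model.get_sentence_vector(t) for t in chunk['text']])
    clf.partial_fit(X, chunk['label'], classes=classes)

Would that count as memory-efficient, or is there a better pattern? Thanks!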