How big is your input dataset? I agree it’s really weird that the score function is failing like that. Do you need the joblib context manager to compute it?
How much memory do you have on your machine?
You could also try using a LocalCluster and monitoring the progress of the call on the dashboard, but I’m still not sure whether the score function benefits from joblib.
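If it helps, here is a minimal sketch of the setup I have in mind; the cluster sizes and the `clf`/`X`/`y` names are placeholders, not taken from your code:

```python
import joblib
from dask.distributed import Client, LocalCluster
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Start a local cluster and open the printed dashboard URL in a
# browser to watch memory and task activity during the call.
cluster = LocalCluster(n_workers=4, memory_limit="4GB")
client = Client(cluster)
print(client.dashboard_link)

# Placeholder data; substitute your own arrays here.
X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
clf = RandomForestClassifier(n_jobs=-1, random_state=0)

# Route joblib's internal parallelism to the Dask workers.
with joblib.parallel_backend("dask"):
    clf.fit(X, y)
    print(clf.score(X, y))
```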
Thank you for your reply!
My variable xbb_drop_dup_onehot has shape (40750, 667) and all features have been one-hot encoded. My cluster has 800 GB of memory, and I found it was exhausted when I computed the score. When fitting the random forest, the dashboard shows very little memory being used, but when I compute the score, the workers’ memory stays idle while htop shows the machine’s memory being used up.
I think we need more details on your overall implementation. It would be really helpful to see your complete workflow, or better yet, to have a reproducer.
I’m really not sure that scikit-learn’s score method can be natively parallelized with joblib.
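In the meantime, if the memory blow-up happens inside score itself, evaluating in chunks might sidestep it. This is only a sketch of a possible workaround; `chunked_accuracy` is a hypothetical helper, not part of scikit-learn, and it assumes a fitted classifier `clf`:

```python
import numpy as np

def chunked_accuracy(clf, X, y, chunk_size=5000):
    """Hypothetical workaround: compute accuracy chunk by chunk
    to bound peak memory instead of predicting on X all at once."""
    correct = 0
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size
        # Predict on a slice rather than the full matrix.
        correct += int(np.sum(clf.predict(X[start:stop]) == y[start:stop]))
    return correct / X.shape[0]
```

If a plain `clf.score` on the full matrix exhausts memory but this does not, that would at least narrow the problem down to the prediction step.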