Hello all!
I was looking to use DaskXGBRegressor, since I have used DaskXGBClassifier with no issues whatsoever. However, when I predict with a trained DaskXGBRegressor model, ~80% of the predictions come back as NaN/null. When I compare against a similarly set-up XGBRegressor (the non-Dask version), it makes predictions with no NaN/null values (same hyperparameters, same saved booster, and same input data for training and predicting).
Looking at the documentation (dask_ml.xgboost.XGBRegressor — dask-ml 2022.5.28 documentation), nothing jumps out at me. Does anyone have any tips or directions? Thank you in advance, I really appreciate it!
Sample code:
Dask Version:
import pandas as pd
import xgboost
import dask.dataframe as dd
from distributed import LocalCluster, Client
cluster = LocalCluster()  # start a local Dask cluster
client = Client(cluster)
xgb_model_latest = xgboost.dask.DaskXGBRegressor()
xgb_model_latest.load_model('pretrained_model.json')
columns_used = pd.read_csv('DASK_features.csv')
columns_used = columns_used.iloc[:,1] # keep only the second column of the CSV
# X (loaded earlier) must be a Dask dataframe or array
X = X[columns_used.to_list()] # select the columns listed in the CSV, in that order
xgb_model_latest.client = client # set distributed client to the model
y_pred = xgb_model_latest.predict(X)
y_pred_regression = y_pred.to_frame(name='forecast')
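One thing that could be checked at this point is whether the columns selected from X line up with what the model expects, since column order can matter when features are matched by position. Below is a minimal sketch of such a check. `compare_columns` is a hypothetical helper (not part of xgboost or dask), and the column names are made up for illustration.

```python
def compare_columns(expected, actual):
    """Report how two ordered lists of column names differ."""
    expected, actual = list(expected), list(actual)
    missing = [c for c in expected if c not in actual]  # expected but absent
    extra = [c for c in actual if c not in expected]    # present but unexpected
    # Identical names in a different order still count as a mismatch
    same_order = expected == actual
    return {"missing": missing, "extra": extra, "same_order": same_order}


# Toy example with made-up column names
report = compare_columns(["a", "b", "c"], ["a", "c", "b"])
print(report)
```

With the real data this would be something like `compare_columns(columns_used.to_list(), X.columns)`. If the saved model carries feature names, `xgb_model_latest.get_booster().feature_names` might serve as the expected list instead.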
Non-Dask Version:
columns_used = pd.read_csv('DASK_features.csv')
columns_used = columns_used.iloc[:,1] # keep only the second column of the CSV
X = X[columns_used.to_list()].compute() # select the same columns, then materialize as pandas
hh_id = ddf.THD_HH_ID.compute() # get hh_ids to merge predictions with
xgb_model_latest = xgboost.XGBRegressor()
xgb_model_latest.load_model('pretrained_model.json') # ad hoc testing
y_pred = xgb_model_latest.predict(X)
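To put a number on the NaN share and confirm the two versions agree wherever both return a value, the two prediction vectors can be compared directly. This is only a rough sketch: `nan_summary` is a hypothetical helper, the arrays below are synthetic stand-ins, and it assumes both outputs are in the same row order.

```python
import numpy as np


def nan_summary(dask_pred, local_pred):
    """Compare two 1-D prediction arrays of equal length."""
    dask_pred = np.asarray(dask_pred, dtype=float)
    local_pred = np.asarray(local_pred, dtype=float)
    if dask_pred.shape != local_pred.shape:
        raise ValueError("prediction arrays must have the same shape")

    dask_nan = np.isnan(dask_pred)
    local_nan = np.isnan(local_pred)
    both_ok = ~dask_nan & ~local_nan
    # Largest disagreement on rows where both versions produced a number
    if both_ok.any():
        max_diff = float(np.max(np.abs(dask_pred[both_ok] - local_pred[both_ok])))
    else:
        max_diff = float("nan")
    return {
        "dask_nan_fraction": float(dask_nan.mean()),
        "local_nan_fraction": float(local_nan.mean()),
        "max_abs_diff": max_diff,
    }


# Synthetic stand-ins for the two prediction vectors
summary = nan_summary([1.0, np.nan, np.nan, 4.0], [1.0, 2.0, 3.0, 4.0])
print(summary)
```

With the real outputs, the first argument would be the Dask result after calling `.compute()` on it, and the second the array returned by the non-Dask `predict`.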