Error "bytes object is too large" when training LightGBM on a large dataset on a high-performance single host

Hi team, recently we have been trying to train a LightGBM model on a dataset of about 100 GiB on a high-performance machine with 100 cores and 400 GiB of RAM.
I used a local cluster and ran the code following this example (a sketch of the setup is below).
The versions of dask and lightgbm are:
dask 2023.5.0
lightgbm 3.3.5
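A minimal sketch of that cluster setup, assuming a LocalCluster sized for this machine (the worker count, thread count, and memory limit below are illustrative, not the exact values used):

```python
# Hypothetical LocalCluster setup on the single 100-core / 400 GiB host.
# Worker count, threads, and memory limit are illustrative values only.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=10,            # e.g. 10 workers x 10 threads on a 100-core machine
    threads_per_worker=10,
    memory_limit="40GiB",    # per worker, roughly 400 GiB in total
)
client = Client(cluster)
```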

And I got the error below. Any suggestions? Thank you very much.


/usr/local/lib/python3.8/site-packages/lightgbm/dask.py:526: UserWarning: Parameter n_jobs will be ignored.
  _log_warning(f"Parameter {param_alias} will be ignored.")
/usr/local/lib/python3.8/site-packages/lightgbm/dask.py:526: UserWarning: Parameter nthread will be ignored.
  _log_warning(f"Parameter {param_alias} will be ignored.")
/usr/local/lib/python3.8/site-packages/distributed/client.py:3108: UserWarning: Sending large graph of size 76.51 GiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
2023-12-07 23:19:08,103 - distributed.protocol.core - CRITICAL - Failed to Serialize
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/distributed/protocol/core.py", line 109, in dumps
    frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
  File "/usr/local/lib/python3.8/site-packages/msgpack/__init__.py", line 36, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 294, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 300, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 297, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 202, in msgpack._cmsgpack.Packer._pack
ValueError: bytes object is too large
2023-12-07 23:19:08,107 - distributed.comm.utils - ERROR - bytes object is too large

Hi @tongxin.wen, welcome to the Dask community,

How are you reading your input dataset? There’s a hint in the error message:

/usr/local/lib/python3.8/site-packages/distributed/client.py:3108: UserWarning: Sending large graph of size 76.51 GiB.
This may cause some slowdown.

I think you first read your data locally before trying to feed it into Dask or your model; instead, you should read your data in chunks directly on the Workers. msgpack cannot serialize objects larger than 4 GiB.
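Here is a rough sketch of the difference, assuming the input is a set of Parquet files and the lightgbm.dask estimators (the paths and column names are just examples):

```python
import dask.dataframe as dd
from lightgbm import DaskLGBMRegressor  # DaskLGBMClassifier works the same way

# Anti-pattern: reading the whole dataset in the client process first.
# The pandas data then gets embedded in the task graph, which has to be
# serialized (msgpack) and shipped to the scheduler/Workers; hence the
# "Sending large graph of size 76.51 GiB" warning and the msgpack failure.
#
#   import pandas as pd
#   pdf = pd.read_parquet("train_data/")          # ~100 GiB in client memory
#   ddf = dd.from_pandas(pdf, npartitions=100)    # data ends up in the graph
#
# Instead, describe the read lazily so each Worker loads its own chunks:
# the graph then only contains small read tasks, not the data itself.
ddf = dd.read_parquet("train_data/*.parquet")
X = ddf.drop(columns=["label"])
y = ddf["label"]

model = DaskLGBMRegressor(n_estimators=100)
model.fit(X, y)
```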

Thanks @guillaumeeb,
I also noticed this message.
At first my plan was to run it on the local host and then extend the job to run on a cluster.
But now it seems I have to run it on a cluster.

I’m not sure what you mean. The important point is to read your input data through the Workers directly; that applies whether you use a LocalCluster on a single host or a distributed cluster.