AutoML XGBoost model: abnormal memory usage as the dataset grows from 100K to 10M rows #7716

Open
@lalalapotter

I tested the AutoML XGBoost Classifier example on the Almaren YARN cluster (cluster mode) with script-generated sparse datasets ranging from 100,000 rows (0.7 GB) to 10 million rows (72 GB). Memory usage scales up abnormally as the dataset grows. The corresponding test results are as follows:
[image: test results]

Otherwise, the application reports the following error:

(ImplicitFunc pid=18420, ip=172.16.0.135) 2023-02-28 20:16:58,965	ERROR function_runner.py:268 -- Runner Thread raised error.
(ImplicitFunc pid=18420, ip=172.16.0.135) Traceback (most recent call last):
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 262, in run
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._entrypoint()
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 331, in entrypoint
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._status_reporter.get_checkpoint())
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 451, in _resume_span
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 597, in _trainable_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 352, in train_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/bigdl/orca/automl/xgboost/XGBoost.py", line 158, in fit_eval
(ImplicitFunc pid=18420, ip=172.16.0.135)     self.model.fit(x, y, eval_set=eval_set, eval_metric=metric_name)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1397, in fit
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=self.enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 457, in _wrap_evaluation_matrices
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1396, in <lambda>
(ImplicitFunc pid=18420, ip=172.16.0.135)     create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 692, in __init__
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 881, in dispatch_data_backend
(ImplicitFunc pid=18420, ip=172.16.0.135)     feature_types)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 187, in _from_numpy_array
(ImplicitFunc pid=18420, ip=172.16.0.135)     ctypes.byref(handle),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 246, in _check_call
(ImplicitFunc pid=18420, ip=172.16.0.135)     raise XGBoostError(py_str(_LIB.XGBGetLastError()))
(ImplicitFunc pid=18420, ip=172.16.0.135) xgboost.core.XGBoostError: std::bad_alloc
(ImplicitFunc pid=18420, ip=172.16.0.135) Exception in thread Thread-2:
(ImplicitFunc pid=18420, ip=172.16.0.135) Traceback (most recent call last):
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/threading.py", line 926, in _bootstrap_inner
(ImplicitFunc pid=18420, ip=172.16.0.135)     self.run()
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 281, in run
(ImplicitFunc pid=18420, ip=172.16.0.135)     raise e
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 262, in run
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._entrypoint()
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 331, in entrypoint
(ImplicitFunc pid=18420, ip=172.16.0.135)     self._status_reporter.get_checkpoint())
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/util/tracing/tracing_helper.py", line 451, in _resume_span
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/ray/tune/function_runner.py", line 597, in _trainable_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk0/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000001/environment/lib/python3.7/site-packages/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 352, in train_func
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/bigdl/orca/automl/xgboost/XGBoost.py", line 158, in fit_eval
(ImplicitFunc pid=18420, ip=172.16.0.135)     self.model.fit(x, y, eval_set=eval_set, eval_metric=metric_name)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1397, in fit
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=self.enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 457, in _wrap_evaluation_matrices
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/sklearn.py", line 1396, in <lambda>
(ImplicitFunc pid=18420, ip=172.16.0.135)     create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 575, in inner_f
(ImplicitFunc pid=18420, ip=172.16.0.135)     return f(**kwargs)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 692, in __init__
(ImplicitFunc pid=18420, ip=172.16.0.135)     enable_categorical=enable_categorical,
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 881, in dispatch_data_backend
(ImplicitFunc pid=18420, ip=172.16.0.135)     feature_types)
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/data.py", line 187, in _from_numpy_array
(ImplicitFunc pid=18420, ip=172.16.0.135)     ctypes.byref(handle),
(ImplicitFunc pid=18420, ip=172.16.0.135)   File "/disk2/yarn/nm/usercache/kai/appcache/application_1668477395550_1326/container_1668477395550_1326_01_000004/environment/lib/python3.7/site-packages/xgboost/core.py", line 246, in _check_call
(ImplicitFunc pid=18420, ip=172.16.0.135)     raise XGBoostError(py_str(_LIB.XGBGetLastError()))
(ImplicitFunc pid=18420, ip=172.16.0.135) xgboost.core.XGBoostError: std::bad_alloc
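
The traceback ends in `_from_numpy_array` with `std::bad_alloc`, which suggests the data reaches XGBoost as a dense NumPy array even though the datasets are sparse. To get a feel for how much that could matter, here is a rough back-of-the-envelope sketch comparing a dense layout with a CSR layout. The feature count and density below are hypothetical placeholders (neither is taken from the test), and the 4-byte value and index sizes are assumptions, so the output is only illustrative, not a measurement.

```python
def dense_bytes(n_rows, n_cols, itemsize=4):
    """Bytes needed for a dense n_rows x n_cols array (4-byte values assumed)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, n_cols, density, itemsize=4, index_size=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    nnz = round(n_rows * n_cols * density)  # number of stored (non-zero) entries
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


# Hypothetical values, NOT from the test above: adjust to the real dataset.
N_COLS = 1000
DENSITY = 0.01

for n_rows in (100_000, 1_000_000, 10_000_000):
    dense_gib = dense_bytes(n_rows, N_COLS) / 2**30
    csr_gib = csr_bytes(n_rows, N_COLS, DENSITY) / 2**30
    print(f"{n_rows:>10,} rows: dense ~{dense_gib:.1f} GiB, CSR ~{csr_gib:.2f} GiB")
```

Under these assumptions both layouts grow linearly with the row count, but the dense one is larger by roughly the inverse of the density, and every concurrently running trial that builds its own `DMatrix` would add another copy on top.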
