performance bug with engine = "lightgbm"
#94
Not sure if related, but I've noticed unexpectedly long tuning times with lightgbm as well, even with numeric variables only: about 10x longer than xgboost for the reprex below.

library(tidyverse)
library(tidymodels)
library(bonsai)
library(future)
options(tidymodels.dark = TRUE) #lighter colors.
moddata <-
  concrete |>
  summarise(compressive_strength = mean(compressive_strength), .by = c(cement:age))
#initial split
set.seed(432)
split <- initial_split(moddata)
#recipe
rec <-
  recipe(compressive_strength ~ ., data = training(split)) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)
#model specs
lgb_spec <-
  boost_tree(tree_depth = tune(), learn_rate = tune(), loss_reduction = tune(),
             mtry = tune(), min_n = tune(), sample_size = tune(), trees = tune(), stop_iter = tune()) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")
xgb_spec <-
  boost_tree(tree_depth = tune(), learn_rate = tune(), loss_reduction = tune(),
             mtry = tune(), min_n = tune(), sample_size = tune(), trees = tune(), stop_iter = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("regression")
#workflowset
wfset <-
  workflow_set(
    preproc = list(rec = rec),
    models = list(
      lGBM = lgb_spec,
      xgb = xgb_spec
    )
  )
#resamples
set.seed(1265)
folds <- vfold_cv(training(split), v = 5, repeats = 1)
#set parallel
plan(multisession, workers = 5)
#ctrl
grid_ctrl <-
  control_grid(
    save_pred = FALSE,
    parallel_over = "resamples",
    pkgs = NULL,
    save_workflow = FALSE
  )
#fit
fit_results <-
  wfset |>
  workflow_map("tune_grid",
               seed = 1563,
               resamples = folds,
               grid = 2,
               control = grid_ctrl,
               verbose = TRUE)
#> i 1 of 2 tuning: rec_lGBM
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> v 1 of 2 tuning: rec_lGBM (32.6s)
#> i 2 of 2 tuning: rec_xgb
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> v 2 of 2 tuning: rec_xgb (3.4s)

Created on 2024-12-07 with reprex v2.1.1
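To separate engine fit time from tuning overhead, one follow-up worth trying (a rough sketch, not part of the original reprex; the fixed hyperparameter values below are arbitrary) is timing a single fit per engine on the same preprocessed data:

single_fit_time <- function(engine) {
  spec <- boost_tree(trees = 500, tree_depth = 6, learn_rate = 0.1) |>
    set_engine(engine) |>
    set_mode("regression")
  # reuse the recipe and training split from the reprex above
  wf <- workflow(rec, spec)
  system.time(fit(wf, data = training(split)))
}
single_fit_time("xgboost")
single_fit_time("lightgbm")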
FWIW, I've noticed that during parallel tuning of lightgbm, each of the 5 R session processes uses ~7-9% of total CPU capacity, as shown in the Task Manager of a 20-core/40-thread Windows workstation. This is unusual: they otherwise max out at about 3% for other heavy computations, including tuning the xgb and other models.
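One guess (not confirmed here) is thread oversubscription: LightGBM starts its own OpenMP threads inside each of the 5 future workers. Pinning the engine to a single thread per worker might be worth a try; num_threads is a standard LightGBM parameter, and extra engine arguments should be passed through to lgb.train()'s parameter list:

lgb_spec_1thread <-
  boost_tree(tree_depth = tune(), learn_rate = tune(), trees = tune()) %>%
  # one LightGBM thread per future worker (engine argument, assumed to be
  # forwarded to lgb.train()'s params)
  set_engine("lightgbm", num_threads = 1) %>%
  set_mode("regression")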
Noticed while working on emlwr the other day that bonsai::train_lightgbm() is quite a bit slower than lightgbm::lgb.train(), probably due to the handling of categorical variables / conversion to lgb.Dataset. Observed with emlwr:::simulate_classification(). @EmilHvitfeldt also noted a slowdown from a user with a similar-appearing dataset last week.
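A minimal sketch of how that gap could be isolated, assuming the lgb.Dataset conversion is the suspect. It reuses moddata from the reprex above; the lgb.train() arguments are the standard ones from the lightgbm docs, and the two calls are only roughly comparable since they don't share identical settings:

library(bonsai)
library(lightgbm)

# via parsnip/bonsai: data checks + lgb.Dataset construction happen inside fit()
spec <- boost_tree(trees = 100) |>
  set_engine("lightgbm") |>
  set_mode("regression")
system.time(fit(spec, compressive_strength ~ ., data = moddata))

# via lightgbm directly, with the Dataset built ahead of time
x <- as.matrix(moddata[setdiff(names(moddata), "compressive_strength")])
dtrain <- lgb.Dataset(data = x, label = moddata$compressive_strength)
system.time(
  lgb.train(
    params = list(objective = "regression", verbosity = -1),
    data = dtrain,
    nrounds = 100
  )
)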