[R] Make remaining parameters formal arguments to xgboost() #11109

Open: wants to merge 7 commits into base: master

david-cortes
Contributor

ref #9810

This PR adds the remaining parameters that can be passed to xgboost() as function arguments.

It selectively omits parameters that are not applicable to xgboost(), such as those related to learning-to-rank objectives, which this function does not support. I may have missed some, though.

The docs are auto-copied from xgb.params, with some small modifications. For example, the docs for aliased parameters are rewritten here, because aliases are not supported (just like in the scikit-learn interface for Python).

I wasn't entirely sure what the best way to add the parameters here would be, so I went with the following:

  • I left only the more descriptive aliases of parameters (e.g. "learning_rate" is accepted, but not "eta").
  • I moved the parameters that I most often change from their defaults near the top of the list (after nrounds, before the verbosity and monitoring settings). I also placed a small subset that is likely to be changed right after the verbosity-related settings, before the rest of the parameters.
  • I added all of the accepted parameters, leaving out `...`.

This still leaves a function signature with 50+ parameters, though. I'm not sure whether it should omit the less frequently used parameters altogether and offer a `...` argument instead, or simply follow the same parameter order as the .rst docs. It would be good to hear opinions from @mayer79 and @trivialfis here.

@mayer79
Contributor

mayer79 commented Dec 16, 2024

Awesome! I like the order of the arguments. Having dozens of arguments should be okay; H2O also lists them all: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/xgboost.html

@trivialfis
Member

Thank you for working on the UX, is it possible to use some techniques from roxygen https://cran.r-project.org/web/packages/roxygen2/vignettes/reuse.html to reduce the amount of duplication?

@david-cortes
Contributor Author

> Thank you for working on the UX, is it possible to use some techniques from roxygen https://cran.r-project.org/web/packages/roxygen2/vignettes/reuse.html to reduce the amount of duplication?

It is reusing most of the docs from xgb.params. The ones that are copied have slight modifications: objective has additional docs and a different list of supported values, and the parameters with aliases mention only the alias that xgboost() accepts.

@david-cortes changed the title from "[RFC] [R] Make remaining parameters formal arguments to xgboost()" to "[R] Make remaining parameters formal arguments to xgboost()" on Dec 17, 2024
#' can only be used with classification objectives and vice-versa.
#'
#' Note that not all possible `objective` values supported by the core XGBoost library are allowed
#' here - for example, objectives which are a variation of another but with a different default
Member


Suggested change
#' here - for example, objectives which are a variation of another but with a different default
#' by the [xgboost()] function - for example, objectives which are a variation of another but with a different default

Maybe mention xgb.train()?

Contributor Author


It is mentioned at the beginning. I don't think [xgboost()] would be helpful here, because these are the docs for that same function.

Comment on lines +925 to +926
#' - `"survival:aft"`: Accelerated failure time model for censored survival time data.
#' See [Survival Analysis with Accelerated Failure Time](https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html) for details.
Member


Is aft supported? It requires lower and upper bounds for the labels due to censored data.

Contributor Author


Yes, they are supported. The data needs to be passed as a Surv object, which is what most R packages use for survival regression.

Comment on lines +913 to +917
#' - `"reg:squarederror"`: regression with squared loss.
#' - `"reg:squaredlogerror"`: regression with squared log loss \eqn{\frac{1}{2}[log(pred + 1) - log(label + 1)]^2}. All input labels are required to be greater than -1. Also, see metric `rmsle` for possible issue with this objective.
#' - `"reg:pseudohubererror"`: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
#' - `"reg:absoluteerror"`: Regression with L1 error. When tree model is used, leaf value is refreshed after tree construction. If used in distributed training, the leaf value is calculated as the mean value from all workers, which is not guaranteed to be optimal.
#' - `"reg:quantileerror"`: Quantile loss, also known as "pinball loss". See later sections for its parameter and [Quantile Regression](https://xgboost.readthedocs.io/en/latest/python/examples/quantile_regression.html#sphx-glr-python-examples-quantile-regression-py) for a worked example.
Member


Can we refer to [xgb.params] after the note on what is NOT supported? It may add an extra click for the user, but from my perspective, managing and updating this kind of duplicated documentation is quite challenging. As you have encountered, sooner or later it rots.

Contributor Author


Added another reference to xgb.params and explicitly listed the ones that are not supported, to make it easier to update in the future.

@trivialfis
Member

@david-cortes Could you please help fix the R linter errors: https://github.com/dmlc/xgboost/actions/runs/12418591191/job/34672101709?pr=11109 ?

@david-cortes
Contributor Author

Updated.

3 participants