Update - xgboost to handle missing values #480

wrigleyDan · 2023-12-14T13:07:39Z

This PR is an updated version of #452

Work was done by @lechipatrick, improvement spotted by @nathancday, improvement reviewed by @styrmis

With this change, XGBoost models can handle missing feature values in NaiveAdditiveDecisionTree.
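As a rough illustration of what handling missing feature values means for a tree model, here is a minimal Python sketch (not the plugin's Java code). It assumes the node layout of XGBoost's JSON model dump (`split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`). The tree itself, the feature name `f0`, and the helper names `is_missing` and `score_tree` are hypothetical.

```python
import math

# Hypothetical one-split tree, shaped like a node from XGBoost's JSON dump.
# "missing" names the child to follow when the feature has no value.
TREE = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 2,
    "children": [
        {"nodeid": 1, "leaf": -1.0},
        {"nodeid": 2, "leaf": 2.0},
    ],
}


def is_missing(value):
    """Treat an absent key (None) or NaN as a missing feature value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def score_tree(node, features):
    """Walk from the root to a leaf, honouring the 'missing' direction."""
    while "leaf" not in node:
        value = features.get(node["split"])
        if is_missing(value):
            next_id = node["missing"]      # default direction for missing values
        elif value < node["split_condition"]:
            next_id = node["yes"]          # value < threshold
        else:
            next_id = node["no"]           # value >= threshold
        node = next(c for c in node["children"] if c["nodeid"] == next_id)
    return node["leaf"]


score_present = score_tree(TREE, {"f0": 0.0})   # follows the "yes" branch
score_missing = score_tree(TREE, {})            # follows the "missing" branch
```

In this toy tree a feature value of 0 and a missing feature land in different leaves, so the choice of default direction affects the score.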

Score missing

nits

styrmis · 2024-01-11T10:04:31Z

@wrigleyDan Thanks again for getting this merged. Do we know when it might appear in a release? It looks like the last release was a few days before this was merged.

wrigleyDan · 2024-01-17T10:51:58Z

@styrmis I wanted to release this together with upgrading the plugin to ES 8.11.x.
The latter is proving less straightforward than other upgrades, but we'll hopefully resolve that soon. If not, we can cut a release for the latest improvements so that they don't depend too heavily on the upgrade.

styrmis · 2024-01-22T11:12:34Z

Thanks @wrigleyDan. We have been looking again at inference parity between the plugin and XGBoost and have discovered that, because NaiveAdditiveDecisionTree is a DenseLtrRanker, it fills in missing values with 0. As a result, the only way we can currently achieve inference parity between training and Elasticsearch/production is to fill in missing values in the training data with 0 as well.

I've opened an issue for this.
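For anyone wanting to apply that workaround, a minimal sketch of zero-filling the training features before they are handed to XGBoost might look like the following. It uses plain Python lists so it stays self-contained; the helper name `fill_missing_with_zero` and the sample rows are hypothetical, and the actual training call is left out.

```python
import math


def fill_missing_with_zero(rows):
    """Return a copy of the feature rows with None/NaN replaced by 0.0.

    This mirrors the dense ranker behaviour described above, where a
    feature with no value is scored as 0.
    """
    def fill(value):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return 0.0
        return float(value)

    return [[fill(value) for value in row] for row in rows]


# Hypothetical training rows with gaps in different columns.
training_rows = [[1.5, None], [float("nan"), 3.0]]
filled = fill_missing_with_zero(training_rows)
```

Training on `filled` rather than on the raw rows means the model never sees a missing value, so the tree's default direction for missing values does not come into play at inference time.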

lechipatrick and others added 10 commits January 3, 2023 11:05

debug

cd5ed69

debug

66a1b96

fix simple_tree tests

c6b758a

fix testRamSize

50f32b6

Merge pull request #1 from lechipatrick/score_missing

b174d77

Score missing

nits

d0eaad8

Merge pull request #2 from lechipatrick/nits

86ab097

nits

Remove rightNodeId as per PR comment #452 (comment)

181f718

Merge branch 'main' into xgboost_missing_values

0d4e0b6

Merge branch 'main' into xgboost_missing_values

bbd6482

wrigleyDan merged commit 39024f6 into main Dec 14, 2023
1 check passed

wrigleyDan mentioned this pull request Dec 14, 2023

xgboost to handle missing values #452

Closed

styrmis mentioned this pull request Jan 22, 2024

Implement support for missing values with XGBoost #481

Open
