[fix] skip empty bins when calculating cnt_in_bin in BinMapper::FindBin (fix #4301) #4325

shiyu1994 · 2021-05-26T09:45:32Z

As described in #4301 (comment), this fixes #4301 by skipping empty bins when calculating cnt_in_bin in BinMapper::FindBin, so that most_freq_bin_ is determined correctly.
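To make the idea concrete, here is a minimal Python sketch of per-bin counting that advances past empty bins. It is an illustration only, not LightGBM's C++ code: the function names `count_in_bins` and `most_frequent_bin`, the toy data, and the assumption that the failure mode is a one-step bin advance are all hypothetical.

```python
def count_in_bins(distinct_values, counts, bin_upper_bounds):
    """Count samples per bin.

    Assumes distinct_values is sorted ascending, counts[i] is the number of
    samples equal to distinct_values[i], and bin i covers values up to and
    including bin_upper_bounds[i].
    """
    cnt_in_bin = [0] * len(bin_upper_bounds)
    last_bin = len(bin_upper_bounds) - 1
    i_bin = 0
    for value, count in zip(distinct_values, counts):
        # Advance with a loop, not a single step, so that bins containing
        # no values are skipped rather than receiving the wrong counts.
        while i_bin < last_bin and value > bin_upper_bounds[i_bin]:
            i_bin += 1
        cnt_in_bin[i_bin] += count
    return cnt_in_bin


def most_frequent_bin(cnt_in_bin):
    """Return the index of the bin holding the most samples."""
    return max(range(len(cnt_in_bin)), key=cnt_in_bin.__getitem__)


# Toy data (hypothetical): bin 1, covering (1.0, 2.0], contains no values.
bounds = [1.0, 2.0, 3.0, float("inf")]
values = [0.5, 2.5, 3.5]
counts = [2, 7, 1]

cnt = count_in_bins(values, counts, bounds)
print(cnt)                     # [2, 0, 7, 1]
print(most_frequent_bin(cnt))  # 2
```

In this toy case, advancing by only one bin per distinct value would credit the 7 samples at 2.5 to the empty bin 1, and the most frequent bin would be reported as 1 instead of 2.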

jameslamb · 2021-05-27T16:38:09Z

@shiyu1994 sorry for the inconvenience, could you please merge master into this? That should get you the fix from #4326, which I think should avoid timeout issues from QEMU builds.

jameslamb

thank you very much!

I built this branch locally and ran a modified version of the reproducible example from #4301 (comment) 10 times tonight.

import zipfile
from io import BytesIO

import lightgbm
import pandas as pd
import numpy as np
import requests

from sklearn.metrics import mean_absolute_error

data_url = "https://github.com/microsoft/LightGBM/files/6508547/weird.zip"

zipdata = BytesIO()
zipdata.write(requests.get(data_url, headers={"Accept": "application/octet-stream"}).content)
zip_contents = zipfile.ZipFile(zipdata)
data_file = zip_contents.open("weird.pkl")

test = pd.read_pickle(data_file)

bst = lightgbm.LGBMRegressor(verbose=1, num_iterations=5000).fit(test.drop(columns=["y"]), test["y"])

# scikit-learn's signature is mean_absolute_error(y_true, y_pred)
mae = mean_absolute_error(test["y"], bst.predict(test.drop(columns=["y"])))

print(f"target mean: {np.mean(test['y'])}")
print(f"MAE: {mae}")

Training succeeded every time, and increasing num_iterations led to a better fit to the training data. Given that and the fact that all CI jobs are passing, I'm confident in this fix.

github-actions · 2023-08-23T20:50:52Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

Commit 300d041: skip empty bin when calculating cnt_in_bin in BinMapper::FindBin (fix microsoft#4301)

shiyu1994 requested review from btrotta, chivee and guolinke as code owners May 26, 2021 09:45

shiyu1994 requested review from jameslamb, StrikerRUS and wxchan May 26, 2021 09:46

jameslamb added the fix label May 27, 2021

Commit ea9cea6: Merge branch 'master' of https://github.com/microsoft/LightGBM into fix-4301

jameslamb approved these changes Jun 3, 2021

jameslamb merged commit 3dd4a3f into microsoft:master Jun 3, 2021

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
