Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] skip empty bins when calculating cnt_in_bin in BinMapper::FindBin (fix #4301) #4325

Merged
merged 2 commits into from
Jun 3, 2021

Conversation

shiyu1994
Copy link
Collaborator

As described in #4301 (comment), this is to fix #4301 by skipping the empty bins when calculating cnt_in_bin in BinMapper::FindBin. So that we can correctly decides the most_freq_bin_.

@jameslamb
Copy link
Collaborator

@shiyu1994 sorry for the inconvenience, could you please merge master into this? That should get you the fix from #4326, which I think should avoid timeout issues from QEMU builds.

@jameslamb jameslamb added the fix label May 27, 2021
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you very much!

I built this branch locally and ran a modified version of the reproducible example from #4301 (comment) 10 times tonight.

import zipfile
from io import BytesIO

import lightgbm
import pandas as pd
import numpy as np
import requests

from sklearn.metrics import mean_absolute_error

data_url = "https://github.com/microsoft/LightGBM/files/6508547/weird.zip"

zipdata = BytesIO()
zipdata.write(requests.get(data_url, headers={"Accept": "application/octet-stream"}).content)
zip_contents = zipfile.ZipFile(zipdata)
data_file = zip_contents.open("weird.pkl")

test = pd.read_pickle(data_file)

bst = lightgbm.LGBMRegressor(verbose=1, num_iterations=5000).fit(test.drop(columns=["y"]), test["y"])

mae = mean_absolute_error(bst.predict(test.drop(columns=["y"])), test["y"])

print(f"target mean: {np.mean(test['y'])}")
print(f"MAE: {mae}")

Training succeeded every time, and increasing num_iterations led to a better fit to the training data. Given that and the fact that all CI jobs are passing, I'm confident in this fix.

@jameslamb jameslamb merged commit 3dd4a3f into microsoft:master Jun 3, 2021
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Running fit method on LGBMRegressor kills Jupyter Kernel
2 participants