XGBoost parameter error (colsample_bytree=1) #449
Hmm, the warning message was indeed reproduced in my environment. But it is very weird: I also checked the first 26 pipelines in the TPOT optimization process, and only the two pipelines below used XGBClassifier. I tested both of them and both worked without the warning message. Very weird.
I should have been a bit more explicit for anyone else following along. In addition to the message in the console, I also receive the Windows message that "python.exe has stopped working", and then the program crashes. Also, I'm not sure if "pipeline" is one-to-one with "generation" (I haven't dug into too much of the source yet), but it did require me to set the `generations` parameter of the … I might be misunderstanding the relationship between pipelines and generations, though. In the script above, for example, if I set the … Note: the tpot object was able to fit with no errors when … Going to try and investigate this more tonight.
Thank you for the detailed information on this issue. I don't think the issue was related to …
Absolutely! Thanks for your response! That reminds me, it's also probably worth mentioning that I ran into a few … Thank you!
I found the reason for the issue. It is due to pipeline 32 in generation 0 (see below) when using the demo from this issue. The first step is feature selection, but sadly no feature passed the threshold in that step, so no feature is available for … To solve this, I will submit a PR to catch this error and prevent TPOT from crashing.
To reproduce the error message without running TPOT, please try the code below:
Somehow, the error message still showed up even when I used the code above to catch XGBoostError. But the program did not crash after running this bad pipeline 2000 times.
From this part of the source code in xgboost, it seems that the error message is printed out by …
Ah, that makes sense, good catch! Would it be acceptable to just suppress any pipelines that don't meet certain conditions (in this case, passing in no features because none met the feature-selection threshold) so they don't get scored or crossed over with other pipelines? I see what you're saying, though: it would probably be best to use XGBoost's built-in error checking from a maintainability perspective, right?
Thank you for these good ideas. It is hard to tell whether the feature-selection step will remove all features before running the pipeline, and it also depends on the data. We will refine the parameters in selectors (#423) to prevent this issue. In my code above, I tried XGBoost's built-in error …
Do we know why this issue occurs? It would be helpful to know why the "colsample_bytree=1 is too small that no …" message appears.
The reason is that the feature-selection step in a pipeline can exclude all features before xgboost runs. We need better control over the number of features within a pipeline.
I have the same issue.
I've seen some traffic on these issues regarding potentially getting rid of xgboost altogether due to dependency troubles, so if that is the case then this isn't relevant.
I am receiving the following error message:
I know that the colsample_bytree parameter should be the proportion of the features each tree is allowed to randomly sample from in order to build itself. So colsample_bytree=1 should tell each tree to sample from 100% of the columns/features when building itself. (Please correct me if I'm wrong on that!)
xgboost docs: `colsample_bytree` = subsample ratio of columns when constructing each tree.
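That understanding is consistent with the docs. As a toy sketch (the helper function is mine for illustration, not part of xgboost's API), the ratio only turns into a usable column count when at least one feature reaches the booster; with zero input features, even a ratio of 1.0 yields zero sampled columns, which is what the error is complaining about:

```python
def sampled_columns(n_features: int, colsample_bytree: float) -> int:
    """Illustrative only: roughly how many columns each tree may sample."""
    return int(n_features * colsample_bytree)

print(sampled_columns(10, 1.0))  # 10: every column is available
print(sampled_columns(10, 0.5))  # 5: half the columns per tree
print(sampled_columns(0, 1.0))   # 0: no columns at all, so even
                                 # colsample_bytree=1 is "too small"
```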
This has also been previously raised as an issue on xgboost's github repo, but that issue was closed without really any explanation of what the user was doing wrong.
My guess is that this would be an error with what parameters are being passed into XGBoost and not necessarily an xgboost issue.
Context of the issue
My environment:
Process to reproduce the issue
This is my simple script to reproduce the error in my environment with random data. The error doesn't tend to occur when my `generations` and `population_size` are low (around 10-15 each), but I have experienced it with generations/population_size as low as 32 (with this same script below). Hopefully this short script is sufficiently reproducible!

I couldn't find any prior issues that address this specific error, but I apologize if I missed one.