Error during LightGBM run #2

Open
pjsgsy opened this issue May 25, 2024 · 3 comments

pjsgsy commented May 25, 2024

Hi,

I know this is an old project and perhaps gone away, but it was exactly what I was looking for! I downloaded the source and built it OK, added it to my project and attempted to train. I added a List<List<double>> as the training data. At the point of .Train(trainingdata), an exception is thrown stating that I don't have enough rows, yet when I look at the list of List<double>, they are there: a List containing 30k Lists, each 39 doubles in length.

Exception: Cannot construct Dataset since there are not useful features. It should be at least two unique rows. If the num_row (num_data) is small, you can set min_data=1 and min_data_in_bin=1 to fix this. Otherwise please make sure you are using the right dataset.

The List is definitely 30k items, and each item is a List of 39 doubles.

Am I being a complete luddite? I could not find a simple code example of usage...

Any clues?

Thanks!

pjsgsy commented May 25, 2024

OK, I figured this out. I changed my temp folder to one I could monitor more easily and saw that the .csv was in fact being written, but with every row the same record. A bug on my side! So that issue is resolved. Thank you! For anyone else who is even more of a newbie than me, here is roughly what the working code looks like:

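```csharp
using System.Collections.Generic;
using System.Linq;          // for Any()
using LightGbmDotNet;

// Field holding the training rows: column 0 is the label, the rest are features
private List<double[]> lgbmFeatures = new List<double[]>();

// b:\temp is the temp folder the training .csv is written to
lightGbmFF = new LightGbm(false, @"b:\temp");

// Per bar: allocate a NEW array so each row in the list is a separate object
double[] lgbmFeature = new double[numFeatures + 1];
lgbmFeature[0] = (double)dir;                        // label (target)
for (uint k = 0; k < numFeatures; k++)
    lgbmFeature[k + 1] = FEATURE[historicalBar][k];  // data (features)
lgbmFeatures.Add(lgbmFeature);

// After all rows are collected: train once, then release the data
if (lgbmFeatures.Any())
{
    Print("Training LightGBM with " + lgbmFeatures.Count + " rows");
    lightGbmFF.Train(lgbmFeatures);
    lgbmFeatures = null;
}
lightGbmFF.Dispose();
```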
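The exception in the first post is LightGBM refusing a dataset with fewer than two unique rows, which is exactly what an all-identical .csv produces. A cheap sanity check before calling Train could catch this without digging through the temp folder. Below is a minimal sketch, assuming the rows are held in a List<double[]> as above; `TrainingDataCheck` and `CountDistinctRows` are hypothetical helpers, not part of LightGbmDotNet. One common way to end up with identical rows in C# (not necessarily the bug in this case) is adding the same array instance to the list repeatedly while overwriting its contents, because the list stores references, not copies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT part of LightGbmDotNet: counts distinct training
// rows so an all-identical dataset is caught before Train() is called.
public static class TrainingDataCheck
{
    // Treats two rows as equal when they hold the same values in the same order.
    private sealed class RowComparer : IEqualityComparer<double[]>
    {
        public bool Equals(double[] a, double[] b)
        {
            if (ReferenceEquals(a, b)) return true;
            if (a == null || b == null) return false;
            return a.SequenceEqual(b);
        }

        public int GetHashCode(double[] row)
        {
            if (row == null) return 0;
            unchecked
            {
                // Combine the per-value hashes so equal rows hash equally
                int hash = 17;
                foreach (double v in row)
                    hash = hash * 31 + v.GetHashCode();
                return hash;
            }
        }
    }

    // Number of rows that differ in at least one value.
    public static int CountDistinctRows(IEnumerable<double[]> rows)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));
        return rows.Distinct(new RowComparer()).Count();
    }
}
```

For example, checking `TrainingDataCheck.CountDistinctRows(lgbmFeatures) < 2` just before `lightGbmFF.Train(lgbmFeatures)` and printing a warning would likely have flagged the problem directly.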

Once again, thanks for sharing this. Even now, after all these years, I'm not sure there is anything else for .NET 4.8 that allows LightGBM usage.

pjsgsy closed this as completed May 25, 2024

pjsgsy commented May 25, 2024

Further, in case it helps anyone else, given that I could not find any code examples for usage (though I guess the code is well enough documented):

For multiclass classification, you will need to pass some parameters to .Train(), like this:

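```csharp
// Start from the library's multiclass defaults and set the number of classes
// ("num_classes" is an alias of LightGBM's num_class parameter)
Parameters param = Parameters.DefaultForMulticlassClassification.Clone();
param.AddOrReplace(new LightGbmDotNet.Parameter("num_classes", "3"));
lightGbmFF.Train(lgbmFeatures, param);
```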

The full list of parameters is here:

https://lightgbm.readthedocs.io/en/latest/Parameters.html

After a quick look at the code, I think the same defaults are returned for all model types, so you will need to set them yourself. I am sure all this is clear to the author, but for me, stumbling on this, it was not...

Unfortunately for me, having got this far, predictions now fail with the error 'Input string was not in a correct format.', even though the prediction code is the same as it was for binary classification, where it worked OK.

In the LightGBM_predict_result.txt file, the values for the 3 classes are correctly there, for example:

0.54171384829490787 0.23963322102351683 0.21865293068157535

So I'm not sure why the parsing code inside LightGbm.cs fails. If I find a fix, I will share it.

pjsgsy reopened this May 25, 2024

pjsgsy commented May 25, 2024

OK, got it. It seems the code here might not actually support multiclass classification! It tries to parse the returned (correct) result, but does so with

double.Parse(l, englishCulture)

Yet the string it is trying to parse is a multiclass result, with one tab-separated value per class:

"0.20059427917564951\t0.76653742731652563\t0.032868293507824817"

So it would seem this needs fixing, which I can probably do. If this is not a dead project, please let me know and I will share the code if it is of interest. If not, perhaps I will fork this for my own use.
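One way to handle both cases is to split each line on tabs before parsing, so a binary model's single value and a multiclass model's per-class values both come back as a double[]. This is a minimal sketch, assuming one prediction per line in the format quoted above; `PredictionLineParser`, `ParseLine` and `ArgMax` are hypothetical names, not LightGbmDotNet's API. `CultureInfo.InvariantCulture` is assumed to behave like the library's `englishCulture` for these strings ('.' as the decimal separator), and spaces are accepted as separators too, in case the tabs get converted somewhere along the way.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers, NOT the LightGbmDotNet API: turn one line of
// LightGBM_predict_result.txt into per-class values.
public static class PredictionLineParser
{
    private static readonly char[] Separators = { '\t', ' ' };

    // Binary: one value per line. Multiclass: one value per class, tab-separated.
    // Splitting first means both cases parse instead of throwing FormatException.
    public static double[] ParseLine(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        string[] parts = line.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        var values = new double[parts.Length];
        for (int i = 0; i < parts.Length; i++)
            values[i] = double.Parse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture);
        return values;
    }

    // Index of the highest value, i.e. the most likely class for a multiclass row.
    public static int ArgMax(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        int best = 0;
        for (int i = 1; i < values.Length; i++)
            if (values[i] > values[best]) best = i;
        return best;
    }
}
```

For the tab-separated line quoted above, `ArgMax(ParseLine(line))` returns 1, since 0.7665... is the largest of the three values.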

Thanks!
