
Please provide legacy support for FeatureDatum input Tuples #10

Open
Schmed opened this issue Nov 13, 2015 · 4 comments

@Schmed
Member

Schmed commented Nov 13, 2015

There's legacy code out there that expects to call the `classify()` method with `FeatureDatum` Tuples, but this method now supports only `TermsDatum` Tuples. Why not have a legacy `getFeatures()` method that takes a term map, used by a legacy `classify()` method that takes a `TermsDatum`, etc.?

@kkrugler
Member

So this `getFeatures()` method would be part of what class?

@Schmed
Member Author

Schmed commented Nov 13, 2015

I'm referring to the following method within `com.scaleunlimited.classify.model.HashedFeaturesLibLinearModel.java`:

```java
private Feature[] getFeatures(Map<String, Integer> terms)
```

I'm suggesting that a second method with the old signature be written in the same class:

```java
private Feature[] getOldFeatures(Map<String, Double> terms)
```
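Sketched out, the two signatures might coexist like this. This is only an illustration: `FeatureNode` is a minimal stand-in for liblinear's `Feature` type, and `indexOf()` is a placeholder for the model's real hashed term-to-index mapping, neither taken from the actual codebase:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LegacyFeaturesSketch {

    // Minimal stand-in for liblinear's Feature/FeatureNode.
    static class FeatureNode {
        final int index;
        final double value;
        FeatureNode(int index, double value) { this.index = index; this.value = value; }
    }

    // Placeholder for the real class's hashed term -> feature index mapping.
    static int indexOf(String term) {
        return (term.hashCode() & Integer.MAX_VALUE) % 1000 + 1;
    }

    // Current signature: integer term counts.
    static FeatureNode[] getFeatures(Map<String, Integer> terms) {
        List<FeatureNode> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : terms.entrySet()) {
            result.add(new FeatureNode(indexOf(e.getKey()), e.getValue()));
        }
        return result.toArray(new FeatureNode[0]);
    }

    // Proposed legacy signature: double feature weights, as in the old API.
    static FeatureNode[] getOldFeatures(Map<String, Double> terms) {
        List<FeatureNode> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : terms.entrySet()) {
            result.add(new FeatureNode(indexOf(e.getKey()), e.getValue()));
        }
        return result.toArray(new FeatureNode[0]);
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new TreeMap<>();
        weights.put("foo", 0.5);
        System.out.println(getOldFeatures(weights)[0].value);
    }
}
```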

@kkrugler
Member

`HashedFeaturesLibLinearModel` extends `BaseLibLinearModel`, which extends `BaseModel<TermsDatum>`. So this means the API for `addTrainingTerms(TermsDatum)` and `classify(TermsDatum)` has to stay as-is. I think you're proposing adding alternative versions of those two methods in `BaseLibLinearModel` that take a `FeatureDatum` argument.

The tricky part is that `BaseLibLinearModel` maintains the list of `TermsDatum` documents for training. So I'd either need to have a parallel array (and verify you were only using one or the other), or I could convert the `FeatureDatum` to a `TermsDatum` by, say, multiplying each feature weight by some large fixed constant. That would impose a constraint on the max weight you could provide, but that's probably OK. Thoughts?
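A minimal sketch of that weight-to-count conversion, assuming a hypothetical fixed `SCALE` constant (the class and constant names are illustrative, not from the actual codebase):

```java
import java.util.HashMap;
import java.util.Map;

public class WeightConversionSketch {

    // Fixed scale factor. Weights larger than Integer.MAX_VALUE / SCALE would
    // overflow the int count, which is the "constraint on the max weight"
    // mentioned in the comment above.
    private static final int SCALE = 1_000_000;

    // Convert double feature weights into the Map<String, Integer> term-count
    // shape that a TermsDatum carries.
    public static Map<String, Integer> toTermCounts(Map<String, Double> features) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            counts.put(e.getKey(), (int) Math.round(e.getValue() * SCALE));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new HashMap<>();
        features.put("word", 0.25);
        System.out.println(toTermCounts(features).get("word"));
    }
}
```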

@kkrugler
Member

Or I guess I could convert `TermsDatum` to `FeatureDatum` in `addTrainingTerms`, store that in the list, and bail on ever supporting normalization that works across all training documents. Which is probably OK, as I already pulled out the `TfIdfNormalizer`.
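That approach might look roughly like the following. Here `TermsDatum` and `FeatureDatum` are reduced to bare stand-in classes, and the per-document frequency normalization shown is just one possible choice; since conversion happens at add time, any normalization that needs the whole corpus (e.g. TF-IDF) is off the table, which is the trade-off described above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrainingConversionSketch {

    // Bare stand-ins for the real Tuple-backed datum types.
    static class TermsDatum { Map<String, Integer> counts = new HashMap<>(); }
    static class FeatureDatum { Map<String, Double> weights = new HashMap<>(); }

    // Single list of training documents, all stored as FeatureDatum.
    final List<FeatureDatum> trainingDocs = new ArrayList<>();

    // Convert term counts to per-document weights immediately, then store.
    void addTrainingTerms(TermsDatum datum) {
        int total = 0;
        for (int count : datum.counts.values()) {
            total += count;
        }
        FeatureDatum converted = new FeatureDatum();
        for (Map.Entry<String, Integer> e : datum.counts.entrySet()) {
            converted.weights.put(e.getKey(), e.getValue() / (double) total);
        }
        trainingDocs.add(converted);
    }

    // FeatureDatum documents go straight into the same list.
    void addTrainingFeatures(FeatureDatum datum) {
        trainingDocs.add(datum);
    }

    public static void main(String[] args) {
        TrainingConversionSketch model = new TrainingConversionSketch();
        TermsDatum d = new TermsDatum();
        d.counts.put("a", 3);
        d.counts.put("b", 1);
        model.addTrainingTerms(d);
        System.out.println(model.trainingDocs.get(0).weights.get("a"));
    }
}
```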
