Remove the need to define a split when running predict #138

Open
drewoldag opened this issue Dec 12, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@drewoldag
Collaborator

The current code expects that a split will be defined in the [predict] table in the config file. The user can then set the specified split to be 1.0 (100%) of the data in the data directory.

But it feels odd to require the user to update the [data_set] split definitions just to run inference.

I would advocate for an approach that doesn't require the user to specify a split in the [predict] table, and just runs all the data found in the data directory through the trained model.

@drewoldag drewoldag added the enhancement New feature or request label Dec 12, 2024
@aritraghsh09
Collaborator

This will be the expected default user behavior in unsupervised learning scenarios.

In supervised scenarios, users might want to run predict only on the "test" set (i.e., data - train - validation).

I don't think there is a "right" answer here.

@drewoldag
Collaborator Author

Proposing a solution - what if the default value for split in [predict] is false, which would indicate "no split, just use all the data in the data directory"? If the value is not false, then we would expect it to be one of ["train", "validate", "test"].

I think that would support both use cases relatively easily:

  1. I just want to use an existing trained model with a bunch of data - no mods required to the config file.
  2. I want to test my trained model on a subset of my input data - the user would expect to have to define the subset in the config.
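
A rough sketch of how that lookup could work, just to make the proposal concrete. The function name `resolve_predict_split`, the `VALID_SPLITS` constant, and the plain-dict representation of the [predict] table are all assumptions for illustration, not the project's actual code:

```python
from typing import Optional

# Split names taken from the proposal above.
VALID_SPLITS = ("train", "validate", "test")


def resolve_predict_split(predict_cfg: dict) -> Optional[str]:
    """Return the split name to run predict on, or None for "use all data".

    `predict_cfg` is a hypothetical stand-in for the parsed [predict] table.
    """
    # A missing key falls back to the proposed default of false.
    split = predict_cfg.get("split", False)

    # false means "no split, just use all the data in the data directory".
    if split is False:
        return None

    # Anything else must be one of the known split names.
    if isinstance(split, str) and split in VALID_SPLITS:
        return split

    raise ValueError(
        f"[predict] split must be false or one of {list(VALID_SPLITS)}, got {split!r}"
    )


# Use case 1: no changes to the config, run everything through the model.
print(resolve_predict_split({}))

# Use case 2: the user explicitly asks for a subset.
print(resolve_predict_split({"split": "test"}))
```

Failing loudly on anything other than false or a known split name (including `true`) seems safer than silently falling back to all the data, but that is a design choice open to discussion.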
