Add parameter to config and gui to set the min precursors required for update #460
base: main
Conversation
…rs required to activate logreg clf
logger.info("=== Starting training of TwoStepClassifier ===")

df = self._preprocess_data(df, x_cols)
best_result = None
df_train = df[df["rank"] < self._train_on_top_n]
Just from the name, I would assume `_train_on_top_n` is a boolean, but then this line would make no sense. Maybe find a better name?
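A minimal sketch of what such a rename could look like (the class and attribute names here are purely illustrative, not the PR's actual code):

```python
import pandas as pd


# Hypothetical rename: "_train_on_top_n_ranks" makes it explicit that the
# attribute is an integer rank cutoff, not a boolean switch.
class TwoStepClassifierSketch:
    def __init__(self, train_on_top_n_ranks: int = 2):
        self._train_on_top_n_ranks = train_on_top_n_ranks

    def select_training_rows(self, df: pd.DataFrame) -> pd.DataFrame:
        # Keep only precursors whose rank is below the cutoff.
        return df[df["rank"] < self._train_on_top_n_ranks]
```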
df_train = df[df["rank"] < self._train_on_top_n]
df_predict = df
# train and apply NN classifier
self.second_classifier.epochs = 10
deliberately hardcoded?
Yes, but the value was chosen somewhat arbitrarily. The reason I set it to 10 was to avoid the error where `BinaryClassifierLegacyNewBatching.fit()` crashes in `model_selection.train_test_split(x, y, test_size=self.test_size)` when there aren't enough samples in x for splitting. 10 worked for me, but I agree it's not ideal and, as I said, chosen a bit at random. Do you have a suggestion on what to do instead?
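For reference, one way the crash in the split itself could be guarded against (a sketch only; the threshold and the empty-test-set fallback are assumptions, not what the PR does):

```python
import numpy as np
from sklearn.model_selection import train_test_split


def safe_train_test_split(x: np.ndarray, y: np.ndarray, test_size: float = 0.2):
    """Split into train/test, falling back to an empty test set when there
    are too few samples for train_test_split to produce a valid split."""
    # The threshold of 10 is arbitrary and mirrors the hardcoded value
    # discussed above; it is an assumption, not the PR's behaviour.
    if len(x) < 10 or int(len(x) * test_size) < 1:
        return x, x[:0], y, y[:0]
    return train_test_split(x, y, test_size=test_size)
```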
At least make it either a module-wide constant, or create a new method parameter and set it as the default (then it is more obvious that there is a knob to tune).
If it makes sense to have the user tune it -> config.
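The two variants could look roughly like this (the constant and function names are made up for illustration):

```python
# Hypothetical module-wide constant; name and placement are illustrative.
SECOND_CLASSIFIER_EPOCHS = 10


def configure_second_classifier(classifier, epochs: int = SECOND_CLASSIFIER_EPOCHS):
    """Expose the previously hardcoded epoch count as a parameter with a
    default, so the knob is visible to callers."""
    classifier.epochs = epochs
    return classifier
```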
Oops, sorry, I mixed something up here and was talking about something else. Yes, this one is deliberately set to 10, but I will add a `# TODO` to line 126 where we set it to 50.
if scale_by_target_decoy_ratio:
    n_targets = (df["decoy"] == 0).sum()
    n_decoys = (df["decoy"] == 1).sum()
    scaling_factor = round(n_targets / n_decoys, 3)
Please avoid raising `ZeroDivisionError` here, either by catching it or by adding a small epsilon to the denominator (not sure how the latter will affect the `isfinite` check though).
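A minimal sketch of the catch/guard variant (the fallback value of 1.0 is an assumption; the PR does not specify what to return when there are no decoys):

```python
import pandas as pd


def target_decoy_scaling_factor(df: pd.DataFrame) -> float:
    """Ratio of targets to decoys without raising ZeroDivisionError."""
    n_targets = (df["decoy"] == 0).sum()
    n_decoys = (df["decoy"] == 1).sum()
    if n_decoys == 0:
        # Assumed fallback; not specified in the PR.
        return 1.0
    return round(float(n_targets) / n_decoys, 3)
```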
alphadia/workflow/manager.py
torch.load(os.path.join(path, file), weights_only=False)
)
self.classifier_store[classifier_hash].append(classifier)
for file in os.listdir(path):
https://www.stuartellis.name/articles/python-modern-practices/#use-osscandir-instead-of-oslistdir
Would not have expected reading that article yesterday would come in handy so quickly ;-)
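Roughly what the `os.scandir` variant from the linked article would look like here (a simplified sketch; the function name and the file filtering are assumptions, not the actual manager.py code):

```python
import os

import torch


def load_classifiers(path: str) -> list:
    """Load serialized classifiers from a directory using os.scandir,
    which yields DirEntry objects and avoids extra stat calls per file."""
    classifiers = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                # weights_only=False mirrors the torch.load call in the diff.
                classifiers.append(torch.load(entry.path, weights_only=False))
    return classifiers
```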
Adding the `two_step_classifier_min_precursors_for_update` parameter to config and gui (experimental).
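A rough sketch of how the new option might be read from the config and passed along (the nesting under an "fdr" section and the default of 50, taken from the line-126 remark above, are assumptions rather than the actual alphadia config schema):

```python
# Assumed config layout; the real schema may nest the key differently.
DEFAULT_CONFIG = {
    "fdr": {
        "two_step_classifier_min_precursors_for_update": 50,
    }
}


def get_min_precursors_for_update(config: dict) -> int:
    """Minimum number of precursors required before the first classifier
    is updated (the knob this PR exposes in config and GUI)."""
    fdr_section = config.get("fdr", {})
    return fdr_section.get(
        "two_step_classifier_min_precursors_for_update",
        DEFAULT_CONFIG["fdr"]["two_step_classifier_min_precursors_for_update"],
    )
```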