
Add parameter to config and gui to set the min precursors required for update #460

Open
wants to merge 26 commits into main

Conversation

anna-charlotte (Contributor)

Adding the two_step_classifier_min_precursors_for_update parameter to config and gui (experimental)

@anna-charlotte anna-charlotte changed the base branch from main to fix-update-of-first-classifier January 29, 2025 14:56
@anna-charlotte anna-charlotte marked this pull request as draft January 29, 2025 14:58
Base automatically changed from fix-update-of-first-classifier to main January 30, 2025 21:45
@anna-charlotte anna-charlotte marked this pull request as ready for review January 31, 2025 07:55
@anna-charlotte anna-charlotte marked this pull request as draft February 5, 2025 14:38
logger.info("=== Starting training of TwoStepClassifier ===")

df = self._preprocess_data(df, x_cols)
best_result = None
df_train = df[df["rank"] < self._train_on_top_n]
Collaborator

Just from the name, I would assume _train_on_top_n is a boolean, but then this line would make no sense... maybe find a better name?
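To make the integer semantics concrete, a toy illustration (the rename is only a suggestion, not from the PR):

```python
import pandas as pd

df = pd.DataFrame({"rank": [0, 1, 2, 3], "score": [0.9, 0.7, 0.5, 0.1]})

# Hypothetical rename: a count-style name such as `train_on_top_n_ranks`
# makes it obvious at the comparison site that this is an integer cutoff.
train_on_top_n_ranks = 2
df_train = df[df["rank"] < train_on_top_n_ranks]  # keeps ranks 0 and 1
print(df_train)
```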

df_train = df[df["rank"] < self._train_on_top_n]
df_predict = df
# train and apply NN classifier
self.second_classifier.epochs = 10
Collaborator

deliberately hardcoded?

Contributor Author

Yes, but the value was chosen somewhat arbitrarily. The reason I set it to 10 was to avoid the error where BinaryClassifierLegacyNewBatching.fit() crashes in model_selection.train_test_split(x, y, test_size=self.test_size) when there aren’t enough samples in x for splitting. 10 worked for me, but I agree it’s not ideal and, as I said, it was chosen a bit at random. Do you have a suggestion on what to do instead?
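For context, a minimal reproduction of that kind of failure (illustrative values, not the PR's actual data):

```python
from sklearn import model_selection

# With a single sample, no test_size can yield non-empty train and test
# sets, so train_test_split raises a ValueError.
x, y = [[0.0]], [0]
try:
    model_selection.train_test_split(x, y, test_size=0.5)
except ValueError as e:
    print(e)  # "... the resulting train set will be empty ..."
```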

Collaborator

At least make it either a module-wide constant, or create a new method parameter and set it as the default (then it is more obvious that there is a knob to tune).
If it makes sense to have the user tune it -> config
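A minimal sketch of that suggestion (the constant, class, and method names here are illustrative, not the PR's actual code):

```python
# Assumed constant name; the value 10 is the one discussed above.
SECOND_CLASSIFIER_EPOCHS = 10


class TwoStepClassifier:
    def __init__(self, second_classifier):
        self.second_classifier = second_classifier

    def fit(self, df, second_classifier_epochs: int = SECOND_CLASSIFIER_EPOCHS):
        # The knob is now visible in the signature instead of being
        # buried as a hardcoded assignment inside the method body.
        self.second_classifier.epochs = second_classifier_epochs
        self.second_classifier.fit(df)
```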

Contributor Author

Oops, sorry, I mixed something up here and was talking about something else. Yes, this one is deliberately set to 10, but I will add a # TODO to line 126, where we set it to 50.

if scale_by_target_decoy_ratio:
n_targets = (df["decoy"] == 0).sum()
n_decoys = (df["decoy"] == 1).sum()
scaling_factor = round(n_targets / n_decoys, 3)
Collaborator

Please avoid raising ZeroDivisionError here, either by catching it or by adding a small epsilon to the denominator (not sure how the latter will affect the isfinite check, though).
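One way to guard the division, sketched under the assumption that falling back to a scaling factor of 1.0 is acceptable (that fallback is not from the PR):

```python
import pandas as pd

# Toy frame with no decoys, to exercise the guard (illustrative only).
df = pd.DataFrame({"decoy": [0, 0, 0]})

n_targets = (df["decoy"] == 0).sum()
n_decoys = (df["decoy"] == 1).sum()

# Skip the ratio when there are no decoys instead of dividing by zero;
# this also keeps any downstream isfinite check trivially satisfied.
scaling_factor = round(n_targets / n_decoys, 3) if n_decoys > 0 else 1.0
print(scaling_factor)
```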

torch.load(os.path.join(path, file), weights_only=False)
)
self.classifier_store[classifier_hash].append(classifier)
for file in os.listdir(path):
Collaborator

https://www.stuartellis.name/articles/python-modern-practices/#use-osscandir-instead-of-oslistdir
I would not have expected that reading that article yesterday would come in handy so quickly ;-)
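A minimal sketch of the os.scandir() variant, assuming the loop only needs each file's path (the function name and surrounding logic are paraphrased, not the PR's code):

```python
import os

import torch


def load_states(path: str) -> list:
    """Load serialized classifier states from a directory."""
    states = []
    # os.scandir() yields DirEntry objects that carry the joined path
    # and cached file-type info, avoiding extra stat() calls and the
    # manual os.path.join() that os.listdir() requires.
    for entry in os.scandir(path):
        if entry.is_file():
            states.append(torch.load(entry.path, weights_only=False))
    return states
```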

@anna-charlotte anna-charlotte marked this pull request as ready for review February 28, 2025 07:37