
add dynamic hyperparameter scaling #435

Merged: 1 commit merged into main on Jan 17, 2025
Conversation

@GeorgWa (Collaborator) commented on Jan 16, 2025:

Dynamic scaling of batch size and learning rate in the NN classifier.

  • Prevents FDR collapse for small libraries
  • Improves speed for large libraries
  • Overrides the static config of batch size and learning rate
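
For reference, a minimal standalone sketch of the scaling rule this PR adds (a hedged illustration, not the library code itself): batch size grows linearly with the number of samples up to a cap, and the learning rate is scaled by the square root of the batch-size ratio. The constants mirror the defaults of get_scaled_training_params in the diff below.

import numpy as np

base_lr, max_batch, min_batch = 0.001, 1024, 64

for n_samples in (1_000, 250_000, 1_000_000):
    # Linear batch-size scaling, clipped to [min_batch, max_batch]
    batch_size = int(np.clip(max_batch * n_samples / 1_000_000, min_batch, max_batch))
    # Square-root learning-rate scaling relative to the maximum batch size
    learning_rate = base_lr * np.sqrt(batch_size / max_batch)
    print(f"{n_samples:>9,} samples -> batch_size={batch_size:4d}, lr={learning_rate:.2e}")

# Output:
#     1,000 samples -> batch_size=  64, lr=2.50e-04
#   250,000 samples -> batch_size= 256, lr=5.00e-04
# 1,000,000 samples -> batch_size=1024, lr=1.00e-03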

@anna-charlotte (Contributor) left a comment:


lgtm!

@@ -905,6 +908,42 @@ def predict_proba(self, x: np.ndarray):
return self.network(torch.Tensor(x)).detach().numpy()


def get_scaled_training_params(df, base_lr=0.001, max_batch=1024, min_batch=64):

Contributor:

Is it intentional that the max batch_size is not closer to 5000 anymore, as we used it previously?

Collaborator (Author):

This kinda sneaked in here :D
I saw improved performance going down from 5000 to 1000

@anna-charlotte mentioned this pull request on Jan 17, 2025
Base automatically changed from lib-fix to main on January 17, 2025 at 12:28
@GeorgWa merged commit 5f24fa3 into main on Jan 17, 2025 (5 checks passed)
@GeorgWa deleted the dynamic-hyperparameter branch on January 17, 2025 at 12:33
@@ -905,6 +908,42 @@ def predict_proba(self, x: np.ndarray):
return self.network(torch.Tensor(x)).detach().numpy()


def get_scaled_training_params(df, base_lr=0.001, max_batch=1024, min_batch=64):


Simplify the batch size calculation by combining min/max logic into a single line using max() and min() functions instead of np.clip(). This makes the logic more explicit and easier to follow.

def get_scaled_training_params(df, base_lr=0.001, max_batch=1024, min_batch=64):

    n_samples = len(df)

    # For >= 1M samples, use max batch size
    if n_samples >= 1_000_000:
        return max_batch, base_lr

    # Calculate scaled batch size (linear scaling between min and max)
    batch_size = int(max(min(max_batch * n_samples / 1_000_000, max_batch), min_batch))

    # Scale learning rate using square root relationship
    learning_rate = base_lr * np.sqrt(batch_size / max_batch)

    return batch_size, learning_rate
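
As a quick sanity check of the suggested helper (a standalone sketch, not part of the diff): only len(df) is used, so the DataFrame contents are arbitrary, and the outputs line up with the parametrized test cases added further down.

import numpy as np
import pandas as pd

# Dummy single-column frames; only their length matters.
df_small = pd.DataFrame({"dummy": np.zeros(50_000)})
df_medium = pd.DataFrame({"dummy": np.zeros(500_000)})

print(get_scaled_training_params(df_small))   # batch_size 64,  lr 2.5e-04
print(get_scaled_training_params(df_medium))  # batch_size 512, lr ~7.1e-04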

@@ -1066,6 +1110,14 @@ def fit(self, x: np.ndarray, y: np.ndarray):
Target values of shape (n_samples,) or (n_samples, n_classes).

"""
if self.experimental_hyperparameter_tuning:


Use debug-level logging instead of info for the hyperparameter tuning details, since this is diagnostic information. Also simplify the log message formatting.

        if self.experimental_hyperparameter_tuning:
            self.batch_size, self.learning_rate = get_scaled_training_params(x)
            logger.debug(
                f"Using scaled hyperparameters - samples: {len(x):,}, batch_size: {self.batch_size:,}, lr: {self.learning_rate:.2e}"
            )
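
For scale, with the 250,000-sample example from the sketch above this would emit a line roughly like "Using scaled hyperparameters - samples: 250,000, batch_size: 256, lr: 5.00e-04"; because it is logged at DEBUG level, it only appears when verbose logging is enabled.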

@@ -95,7 +95,11 @@
]


Remove explicit batch_size and learning_rate since they will be set automatically by experimental_hyperparameter_tuning. Add clarifying comment.

classifier_base = fdrx.BinaryClassifierLegacyNewBatching(
    test_size=0.001,
    epochs=10,
    experimental_hyperparameter_tuning=True, # Batch size and learning rate will be scaled automatically
)
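
A hedged usage sketch of this configuration (the feature matrix and labels are hypothetical; only the fit(x, y) signature shown in the diff above is assumed):

import numpy as np

# Hypothetical training data: with experimental_hyperparameter_tuning=True,
# fit() derives batch_size and learning_rate from len(x) via
# get_scaled_training_params() instead of using static config values.
x = np.random.rand(250_000, 10).astype(np.float32)   # hypothetical feature matrix
y = np.random.randint(0, 2, size=250_000)             # hypothetical target/decoy labels

classifier_base.fit(x, y)   # scales internally to batch_size=256, lr ~5.0e-04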

@@ -45,3 +47,31 @@ def test_target_decoy_fdr(mock_show):
assert all([col in df.columns for col in ["decoy_proba", "qval", "pep"]])
assert np.all(df[["decoy_proba", "qval", "pep"]].values >= 0)
assert np.all(df[["decoy_proba", "qval", "pep"]].values <= 1)


@pytest.mark.parametrize(


Improve test case clarity by: 1) using the power operator (**) instead of np.sqrt() for more readable learning rate calculations, 2) adding a descriptive comment for each test case, 3) aligning values for better readability.

@pytest.mark.parametrize(
    "n_samples,expected_batch,expected_lr",
    [
        (1_000_000, 1024, 0.001),          # Base case
        (2_000_000, 1024, 0.001),          # Above max samples
        (500_000, 512, 0.001 * (512/1024)**0.5),  # 50% scaling
        (250_000, 256, 0.001 * (256/1024)**0.5),  # 25% scaling
        (50_000, 64, 0.001 * (64/1024)**0.5),     # Min batch size
        (1_000, 64, 0.001 * (64/1024)**0.5),     # Below min batch size
    ],
)
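
The test body itself is collapsed in this view; a plausible completion, assuming module-level imports of numpy, pandas, and pytest and that only len(df) is used (the merged test may differ):

def test_get_scaled_training_params(n_samples, expected_batch, expected_lr):
    # Assumed body: a dummy single-column DataFrame suffices since only its length is used.
    df = pd.DataFrame({"dummy": np.zeros(n_samples)})

    batch_size, learning_rate = get_scaled_training_params(df)

    assert batch_size == expected_batch
    assert learning_rate == pytest.approx(expected_lr)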


@vbrennsteiner (Collaborator) left a comment:


LGTM!

@@ -905,6 +908,42 @@ def predict_proba(self, x: np.ndarray):
return self.network(torch.Tensor(x)).detach().numpy()


def get_scaled_training_params(df, base_lr=0.001, max_batch=1024, min_batch=64):

Collaborator:

Might be a small point, but could we consider renaming 'max_batch' to 'max_batch_size' and 'min_batch' to 'min_batch_size'?

@@ -95,7 +95,11 @@
]

classifier_base = fdrx.BinaryClassifierLegacyNewBatching(
test_size=0.001, batch_size=5000, learning_rate=0.001, epochs=10
test_size=0.001,
batch_size=5000,

Collaborator:

You mention in a previous comment that a batch size closer to 1000 is actually optimal; would it make sense to change this default?

(1_000, 64, 0.001 * np.sqrt(64 / 1024)), # Should hit min batch size
],
)
def test_get_scaled_training_params(n_samples, expected_batch, expected_lr):

Collaborator:

Same again: I would recommend 'expected_batch_size' for clarity.
