start: Delay pull secret on disk check to end #4776
base: main
Conversation
Reviewer's Guide (Sourcery)
Defers the time-consuming pull secret disk check by moving its invocation from immediately after the SSH key update to the end of the Start workflow, allowing other initialization steps to run first and reducing overall startup blocking time.

Sequence diagram for the updated Start process with delayed pull secret check:

```mermaid
sequenceDiagram
    actor User
    participant client as client.Start()
    participant cluster as Cluster Operations
    participant sshRunner as SSH Runner
    User->>client: Initiate Start(startConfig)
    activate client
    client->>cluster: UpdateHostMCDToken(...)
    activate cluster
    cluster-->>client: Token Updated
    deactivate cluster
    client->>cluster: AddSSHKeyToMachine(...)
    activate cluster
    cluster-->>client: SSH Key Added
    deactivate cluster
    client->>cluster: UpdateUserPasswords(...)
    activate cluster
    cluster-->>client: Passwords Updated
    deactivate cluster
    client->>cluster: EnsurePersistentVolume(...)
    activate cluster
    cluster-->>client: Volume Ensured
    deactivate cluster
    client->>cluster: WaitForProxyConfig(...)
    activate cluster
    cluster-->>client: Proxy Config Ready
    deactivate cluster
    client->>cluster: WaitForClusterToBeReachable(...)
    activate cluster
    cluster-->>client: Cluster Reachable
    deactivate cluster
    client->>cluster: WaitForPullSecretPresentOnInstanceDisk(ctx, sshRunner)
    activate cluster
    cluster->>sshRunner: Check pull secret on disk
    activate sshRunner
    sshRunner-->>cluster: Disk check status
    deactivate sshRunner
    cluster-->>client: Pull Secret Present
    deactivate cluster
    client->>client: waitForProxyPropagation(...)
    client-->>User: Start Process Complete
    deactivate client
```
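The reordered flow in the diagram boils down to an ordered list of phases with the pull secret check moved last. A minimal sketch under assumptions — the names below are simplified stand-ins for the real cluster calls, which take contexts, configs, and runners:

```go
package main

import "fmt"

// startSequence returns the order of the (simplified) Start() phases after
// this PR: the pull secret disk check has moved from right after the SSH key
// update to the very end. Names are illustrative, not the project's API.
func startSequence() []string {
	return []string{
		"UpdateHostMCDToken",
		"AddSSHKeyToMachine",
		"UpdateUserPasswords",
		"EnsurePersistentVolume",
		"WaitForProxyConfig",
		"WaitForClusterToBeReachable",
		"WaitForPullSecretPresentOnInstanceDisk", // moved here by this PR
	}
}

func main() {
	for i, name := range startSequence() {
		fmt.Printf("%d. %s\n", i+1, name)
	}
}
```

While the other phases run, the Machine Config Operator has time to propagate the pull secret, so the final check usually succeeds immediately instead of blocking for ~1 minute up front.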
Class diagram for the modified client type:

```mermaid
classDiagram
class client {
  +Start(Context, StartConfig)
}
```
/retest
Hey @praveenkumar - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Documentation: all looks good
```go
if err := cluster.WaitForPullSecretPresentOnInstanceDisk(ctx, sshRunner); err != nil {
	return nil, errors.Wrap(err, "Failed to update pull secret on the disk")
}

if err := cluster.UpdateUserPasswords(ctx, ocConfig, startConfig.KubeAdminPassword, startConfig.DeveloperPassword); err != nil {
	return nil, errors.Wrap(err, "Failed to update kubeadmin user password")
}
```
suggestion: Refine the error message to reflect the waiting operation. The error message should indicate a failure in checking for the pull secret's presence, not updating it. Suggested wording: `Failed initial pull secret presence check`.
Force-pushed from 0423b15 to 1e97ca1 (Compare)
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: anjannath. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing a comment.
```diff
@@ -614,6 +610,10 @@ func (client *client) Start(ctx context.Context, startConfig types.StartConfig)
 		logging.Warnf("Cluster is not ready: %v", err)
 	}
 
+	if err := cluster.WaitForPullSecretPresentOnInstanceDisk(ctx, sshRunner); err != nil {
```
Can `cluster.StartMonitoring()` and `cluster.WaitForClusterStable` succeed if the pull secret is not yet written to disk?
We don't cache monitoring-related images in the bundle, so when a user enables the monitoring operator they have to wait for the images, which are downloaded using the pull secret. In our case we don't dictate when the secret becomes available on disk, since the MCO is responsible for that. If the monitoring operator starts before the pull secret is on disk, its image pulls will fail and be retried, but they eventually succeed once the secret lands and the images are pulled.
@cfergeau any thoughts on that?
If there is some code which needs the pull secret to be on disk to work, and we have a "wait until it works" function for this code, then I'd prefer if `cluster.WaitForPullSecretPresentOnInstanceDisk` was called before it. Otherwise the "wait until it works" function will also act as a hidden `cluster.WaitForPullSecretPresentOnInstanceDisk`, and `cluster.WaitForPullSecretPresentOnInstanceDisk` will almost be a no-op.
@cfergeau We don't have any code that directly depends on the pull secret being present on disk. Instead, we're executing a set of `oc` commands, and some actions (like monitoring or marketplace) involve pulling container images. These operations require the pull secret, and if it's not available they can result in image pull errors. However, these errors are non-blocking and are retried, so they don't cause a complete failure.

Our current goal in this PR is to optimize startup time by checking for the presence of the pull secret on disk only at the end of the process. In fact, we could go a step further and consider removing this check entirely: once the pull secret is patched, it's the responsibility of the Machine Config Operator (MCO) to propagate it to the disk, so our check is effectively a redundant safeguard as of now.
In the current scenario the MCO is patched with the user-provided pull secret, and immediately afterwards we check whether the pull secret is on disk, which takes ~1 min while the MCD makes that change and reports back to the MCO. In this PR we delay this pull secret check to the end: instead of blocking for ~1 min, we execute the other parts first and only at the end check that the pull secret is on the disk.

With this PR (crc start time, 6 runs):

```
real 4m9.247s    user 0m0.557s   sys 0m0.165s
real 4m0.455s    user 0m0.619s   sys 0m0.168s
real 4m5.962s    user 0m0.445s   sys 0m0.154s
real 3m59.594s   user 0m0.661s   sys 0m0.179s
real 4m3.958s    user 0m0.563s   sys 0m0.177s
real 4m28.806s   user 0m0.460s   sys 0m0.171s
```

Without this PR:

```
real 5m7.235s    user 0m0.797s   sys 0m0.181s
real 4m28.741s   user 0m0.891s   sys 0m0.195s
real 6m6.815s    user 0m0.747s   sys 0m0.194s
real 5m1.733s    user 0m0.395s   sys 0m0.199s
real 4m30.551s   user 0m0.466s   sys 0m0.173s
real 4m31.067s   user 0m0.673s   sys 0m0.183s
```
Force-pushed from 1e97ca1 to 24605d0 (Compare)
New changes are detected. LGTM label has been removed.
During CI failures it was sometimes observed that the following error occurs when checking that the pull secret is present on the disk. This PR makes sure SSH connectivity is established before checking for the pull secret.

```
DEBU SSH command results: err: ssh: unexpected packet in response to channel open: <nil>, output:
DEBU error: Temporary error: ssh command error:
```
/retest
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough: A connectivity check was added to the process of waiting for the pull secret file on the instance disk, ensuring SSH access before each retry. Additionally, the sequence of operations was adjusted so that waiting for the pull secret occurs after the cluster stabilization step rather than before it.
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 0

🧹 Nitpick comments (1)

pkg/crc/cluster/cluster.go (lines 450-452): Consider making the connectivity check retryable. The connectivity check is a good defensive measure, but returning immediately on connectivity failure may be too strict. If SSH connectivity is temporarily lost (network hiccup, VM resource contention), the entire function fails without retry. Consider wrapping the connectivity check in a `RetriableError` to allow the retry mechanism to handle temporary connectivity issues:

```diff
 if err := sshRunner.WaitForConnectivity(ctx, 30*time.Second); err != nil {
-	return err
+	return &errors.RetriableError{Err: err}
 }
```
📜 Review details

Configuration used: CodeRabbit UI. Review profile: CHILL. Plan: Pro.

📒 Files selected for processing (2):
- pkg/crc/cluster/cluster.go (1 hunks)
- pkg/crc/machine/start.go (1 hunks)
🔇 Additional comments (1)

pkg/crc/machine/start.go (lines 613-616): LGTM! Sequence optimization improves start time. Moving the pull secret disk check to after cluster stabilization is a good optimization: it allows other startup operations to proceed while the Machine Config Operator propagates the pull secret, reducing overall blocking time as described in the PR objectives.