Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: cli: return distinct error codes depending on copy failure #1062

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

r4f4
Copy link
Contributor

@r4f4 r4f4 commented Feb 3, 2025

Description

The oc-mirror return code now reflects the category of the copy failure:

  • 1: generic error
  • 2: release image error
  • 4: operator image error
  • 8: helm image error
  • 16: additional image error

So a return code of 12 means there were operator and helm image copy errors (4 + 8).

Because release image errors are fatal, when one happens the return code will always be 2.

Github / Jira issue:

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Code Improvements (Refactoring, Performance, CI upgrades, etc)
  • Internal repo assets (diagrams / docs on github repo)
  • This change requires a documentation update on openshift docs

How Has This Been Tested?

  1. Release image copy error:
[...]
 ✗   (0s) Collecting catalog registry.redhat.io/redhat/redhat-operator-index:v4.16
2025/02/03 22:43:14  [WARN]   : [OperatorImageCollector] catalog unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication : SKIPPING
2025/02/03 22:43:14  [INFO]   : 🔍 collecting additional images...
2025/02/03 22:43:14  [INFO]   : 🔍 collecting helm images...
2025/02/03 22:43:14  [INFO]   : 🔂 rebuilding catalogs
2025/02/03 22:43:14  [INFO]   : 🚀 Start copying the images...
0 / 193 (2s) [------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 0 %
    () ocp-v4.0-art-dev@sha256:0a9d409693d25a9fbaf70a6c3f75abfcb54e3c9070e608b84da519aa7d874440 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:0800280100d6e3376a4c4afe98d724a032e7a05aff5b49dbaef0651c51d3f357 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:072942d3febcbe1f3035991b8df66eea15197084211e2f638dac043e490962e8 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:0395b06b99a78cf783f05d32fd0cf2d9f6d968c4fdf8253eeee008acacb6985d ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:0319c91da24478e97006d34dde4e45d68ec8b0c0022d2032557c96cde5295f96 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:021aa92489672492a14b30690c735c9e9597e7e79a2c0e78362d32543ee3d370 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:054d439588812645b13d719ac27304db00802b6ef3b692bd53d91fd88422338c ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:0328838b74e5b6e521ff7873d846829ce9a06dc2b4c4acf5957fb169c2c3a494 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:0003f19dcff2060a3d3692753772fe1ebaac9e4bc7bd8c78bb9870ab82a5bd64 ➡  cache
 ✗   (2s) ocp-v4.0-art-dev@sha256:06c93951661e3bdad3964a8485291b97e93f6675c25acfc8399c35d3b59b95fa ➡  cache
2025/02/03 22:43:17  [INFO]   : === Results ===
2025/02/03 22:43:17  [INFO]   :  ✗  0 / 192 release images mirrored: Some release images failed to be mirrored - please check the logs
2025/02/03 22:43:17  [INFO]   :  ✗  0 / 1 additional images mirrored: Some additional images failed to be mirrored - please check the logs
2025/02/03 22:43:17  [ERROR]  : [Worker] error mirroring image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:021aa92489672492a14b30690c735c9e9597e7e79a2c0e78362d32543ee3d370 error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:021aa92489672492a14b30690c735c9e9597e7e79a2c0e78362d32543ee3d370: reading manifest sha256:021aa92489672492a14b30690c735c9e9597e7e79a2c0e78362d32543ee3d370 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
2025/02/03 22:43:17  [INFO]   : 📦 Preparing the tarball archive...
2025/02/03 22:43:19  [INFO]   : mirror time     : 22.086540093s
2025/02/03 22:43:19  [INFO]   : 👋 Goodbye, thank you for using oc-mirror
2025/02/03 22:43:19  [ERROR]  : [Worker] some errors occurred during the mirroring.
         Please review ~/Downloads/oc-mirror-issue-ret/working-dir/logs/mirroring_errors_20250203_224317.txt for a list of mirroring errors.
         You may consider:
         * removing images or operators that cause the error from the image set config, and retrying
         * keeping the image set config (images are mandatory for you), and retrying
         * mirroring the failing images manually, if retries also fail.
$ echo $?
2

Expected Outcome

Exit codes other than 1.

The `oc-mirror` return code now reflects the category of the copy
failure:
*  1: generic error
*  2: release image error
*  4: operator image error
*  8: helm image error
* 16: additional image error

So a return code of `12` means there were operator and helm image copy
errors (4 + 8).

Because release image errors are fatal, when one happens the return code
will always be `2`.
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 3, 2025
Copy link

openshift-ci bot commented Feb 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: r4f4
Once this PR has been reviewed and has the lgtm label, please assign aguidirh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

openshift-ci bot commented Feb 4, 2025

@r4f4: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn fa284f6 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e fa284f6 link true /test e2e
ci/prow/sanity fa284f6 link true /test sanity
ci/prow/unit fa284f6 link true /test unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -20,3 +20,16 @@ func (e *NormalStorageInterruptError) Is(err error) bool {
_, ok := err.(*NormalStorageInterruptError)
return ok
}

type ExecutorSchemaError struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r4f4 - Nice 👍 for wrapping the exitCode in a struct

@lmzuccarelli
Copy link
Contributor

lmzuccarelli commented Feb 4, 2025

@r4f4 - Could you create an OCPBUGS issue so that we can backport to 4.18 ?

@kasturinarra Could you look creating an automated test matrix to cover the various error exit scenarios's, also could you include Nidan ?

Comment on lines +34 to +38
genericErrorCode = 1 << 0
releaseImageErrorCode = 1 << 1
operatorErrorCode = 1 << 2
helmImageErrorCode = 1 << 3
additionalImageErrorCode = 1 << 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the Bitwise operation.

This approach will allows oc-mirror to combine multiple error codes.

Big win here.


defer o.closeAll()
return nil
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 here, since error zero value is nil :)

if !o.Opts.IsDryRun {
err = o.RebuildCatalogs(cmd.Context(), collectorSchema)
if err != nil {
return err
}
var copiedSchema v2alpha1.CollectorSchema
// call the batch worker
if cs, err := o.Batch.Worker(cmd.Context(), collectorSchema, *o.Opts); err != nil {
if copiedSchema, errorredSchema, err = o.Batch.Worker(cmd.Context(), collectorSchema, *o.Opts); err != nil {
if _, ok := err.(batch.UnsafeError); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the way oc-mirror is going to handle errors is different now, maybe this batch.UnsafeError/batch.SafeError is not needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll see if I can simplify this part. I tried to keep the existing error because of the possible errors when creating the errors file.

@@ -188,7 +188,7 @@ func (o DeleteImages) DeleteRegistryImages(deleteImageList v2alpha1.DeleteImageL

o.Opts.Stdout = io.Discard
if !o.Opts.Global.DeleteGenerate && len(o.Opts.Global.DeleteDestination) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the executorErrorFromBatchError needed to be called here? Or is it going to be called by the executor.go also for the delete command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to differentiate errors for the delete command? Or is delete just a simple "either we deleted all or we failed"?

Copy link
Contributor

@lmzuccarelli lmzuccarelli Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r4f4 - IMHO delete is a sub-command and if it fails we know delete fails, so it's a simple pass/fail and no need to differentiate errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So based on @lmzuccarelli answer only having a genericErrorCode is enough for the delete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants