Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show failed command after atmos workflow failure #767

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

Cerebrovinny
Copy link
Collaborator

@Cerebrovinny Cerebrovinny commented Nov 7, 2024

What

  • Added improved workflow failure handling with clear error messages
  • Added --from-step flag to resume workflows from specific steps
  • Added documentation and help text for workflow failure handling
  • Added example workflow demonstrating failure scenarios

Why

  • Makes it easier to debug and fix workflow failures by showing:
  • Which step failed
  • The exact command that failed
  • How to resume from the failed step
  • Saves time by allowing users to resume workflows from failed steps instead of restarting from beginning
  • Improves user experience with clear guidance on how to recover from failures

Example workflow demonstrating the feature:

workflows:

  test-1:
    description: "Test workflow with intentionally failing steps"
    steps:
      - name: "step-1"
        type: shell
        command: "echo 'This step will succeed' && exit 0"
      
      - name: "step-2"
        type: shell
        command: "echo 'This step will fail' && exit 1"
      
      - name: "step-3"
        type: shell
        command: "echo 'This step should not execute'"

Tests

Last test after requested changes @osterman @aknysh
When step-2 fails, users will see:
CleanShot 2024-11-08 at 20 00 10

references

DEV-2717

Summary by CodeRabbit

  • New Features

    • Enhanced error handling during workflow execution, providing detailed error messages and instructions for resuming from failed steps.
    • Updated command documentation to include failure handling and resume functionality with examples.
  • Documentation

    • Added a new section on "Workflow Failure Handling and Resume" to clarify user experience during workflow failures.
    • Clarified the "Stack Precedence" section to improve understanding of stack definitions in workflows.

Copy link
Contributor

coderabbitai bot commented Nov 7, 2024

📝 Walkthrough
📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes in this pull request enhance error handling in the workflow execution process. The ExecuteWorkflow function in internal/exec/workflow_utils.go has been modified to improve error messaging, providing detailed information when a command fails. Additionally, documentation has been updated to reflect these changes, including a new section on workflow failure handling and resume functionality, as well as clarifications on stack precedence in workflows.

Changes

File Path Change Summary
internal/exec/workflow_utils.go Enhanced error handling in ExecuteWorkflow, added detailed error messages, and introduced filepath usage.
website/docs/core-concepts/workflows/workflows.mdx Added section on "Workflow Failure Handling and Resume" and clarified "Stack Precedence" section.
cmd/workflow.go Updated workflowCmd description to include details on failure handling and resume functionality.

Assessment against linked issues

Objective Addressed Explanation
Show failed command after atmos workflow failure (DEV-2717)

Possibly related PRs

Suggested labels

minor, no-release

Suggested reviewers

  • osterman
  • aknysh
  • johncblandii
  • gberenice

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary or @auto-summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @auto-title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
internal/exec/workflow_utils.go (1)

109-124: Consider extracting error message formatting to a dedicated function.

While the implementation is solid, the error message construction logic could be moved to a separate function for better maintainability and reusability.

Consider this approach:

+func formatWorkflowError(step schema.WorkflowStep, command, workflowPath, workflow string) string {
+    workflowFileName := filepath.Base(workflowPath)
+    workflowFileName = strings.TrimSuffix(workflowFileName, filepath.Ext(workflowFileName))
+
+    failedMsg := color.New(color.FgRed).Sprintf("\nStep '%s' failed!", step.Name)
+    cmdMsg := color.New(color.FgYellow).Sprintf("Failed command: %s", command)
+    resumeMsg := color.New(color.FgGreen).Sprintf(
+        "\nTo resume the workflow from this step, run:\ncd %s && atmos workflow %s -f %s --from-step %s",
+        filepath.Dir(workflowPath),
+        workflow,
+        workflowFileName,
+        step.Name,
+    )
+
+    return fmt.Sprintf("%s\n%s\n%s", failedMsg, cmdMsg, resumeMsg)
+}

 if err != nil {
-    workflowFileName := filepath.Base(workflowPath)
-    workflowFileName = strings.TrimSuffix(workflowFileName, filepath.Ext(workflowFileName))
-
-    failedMsg := color.New(color.FgRed).Sprintf("\nStep '%s' failed!", step.Name)
-    cmdMsg := color.New(color.FgYellow).Sprintf("Failed command: %s", command)
-    resumeMsg := color.New(color.FgGreen).Sprintf(
-        "\nTo resume the workflow from this step, run:\ncd %s && atmos workflow %s -f %s --from-step %s",
-        filepath.Dir(workflowPath),
-        workflow,
-        workflowFileName,
-        step.Name,
-    )
-
-    return fmt.Errorf("%s\n%s\n%s\nError: %v", failedMsg, cmdMsg, resumeMsg, err)
+    errMsg := formatWorkflowError(step, command, workflowPath, workflow)
+    return fmt.Errorf("%s\nError: %v", errMsg, err)
 }
internal/exec/help.go (1)

107-118: Enhance the help message clarity and specificity.

The help message effectively communicates the workflow failure handling, but could be more helpful with these improvements:

  1. Use a more descriptive workflow name in the example (e.g., "production-deploy" instead of "workflow1")
  2. Provide a more realistic error message example that includes specific details
  3. Consider adding a note about any timeout or retry behavior

Here's a suggested improvement:

 u.PrintMessage("\nAtmos workflow commands support failure handling and resume functionality:\n\n" +
   "When a workflow step fails:\n" +
   " - The failed step name and command will be displayed\n" +
   " - A resume command will be provided to restart from the failed step\n\n" +
   "Example:\n" +
   "Step 'deploy-vpc' failed!\n" +
   "Error: Error applying plan:\n" +
-  "1 error occurred: AWS API call failed\n\n" +
+  "1 error occurred: AWS API call failed - InvalidVPCID.NotFound: The vpc-id 'vpc-12345' does not exist\n\n" +
   "Failed command: terraform apply vpc -auto-approve\n\n" +
   "To resume the workflow from this step, run:\n" +
-  "atmos workflow deploy-infra -f workflow1 --from-step deploy-vpc\n\n" +
+  "atmos workflow deploy-infra -f production-deploy --from-step deploy-vpc\n\n" +
+  "Note: Failed steps will timeout after 30 minutes. No automatic retries are performed.\n\n" +
   "For more details refer to https://atmos.tools/cli/commands/workflow/resume\n")
website/docs/core-concepts/workflows/workflows.mdx (3)

374-380: Consider adding context about the --from-step flag in the introduction.

The section effectively explains what happens during a failure, but it would be helpful to mention upfront that this feature works in conjunction with the --from-step flag to enable workflow resumption. This addition would better align with the PR objectives of highlighting the new resumption capability.

 ## Workflow Failure Handling and Resume

 When a workflow step fails, Atmos will:
 1. Display which step failed
 2. Show the exact command that failed 
 3. Provide a ready-to-use command to resume the workflow from the failed step
+
+This functionality works in conjunction with the `--from-step` flag, allowing users to resume workflows from any specific step after addressing the failure.

381-404: Enhance the example with a successful resume scenario.

The example effectively demonstrates the failure case, but it would be even more helpful to show what happens when a user successfully resumes the workflow. This would complete the user's mental model of the feature.

 To resume the workflow from this step, run:
 atmos workflow test-1 -f workflow1 --from-step step-2
+
+After fixing the issue and running the resume command, you'll see:
+```console
+Resuming workflow 'test-1' from step 'step-2'...
+
+Step 'step-2' completed successfully
+Step 'step-3' completed successfully
+
+Workflow completed successfully
+```

374-405: Add a note about error handling behavior.

The section would benefit from clarifying whether Atmos preserves error information between sessions. This would help users understand if they need to copy the resume command immediately or if they can retrieve it later.

 ## Workflow Failure Handling and Resume

 When a workflow step fails, Atmos will:
 1. Display which step failed
 2. Show the exact command that failed 
 3. Provide a ready-to-use command to resume the workflow from the failed step
+
+:::note
+The error information and resume command are displayed only when the failure occurs. Make sure to note down the resume command if you plan to fix the issue and continue later, as this information is not persisted between Atmos sessions.
+:::
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4617630 and 4f4bccc.

📒 Files selected for processing (3)
  • internal/exec/help.go (1 hunks)
  • internal/exec/workflow_utils.go (3 hunks)
  • website/docs/core-concepts/workflows/workflows.mdx (1 hunks)
🔇 Additional comments (3)
internal/exec/workflow_utils.go (2)

7-7: LGTM! Clean import addition.

The filepath package is appropriately added and utilized for path manipulations in the error message formatting.


109-124: Excellent error handling improvements! 💪

The enhanced error messages now provide:

  • Clear indication of the failed step
  • The exact command that failed
  • Color-coded output for better readability
  • Precise instructions for resuming the workflow

This aligns perfectly with the PR objectives and significantly improves the user experience.

internal/exec/help.go (1)

106-106: LGTM! Clean conditional structure.

The new condition for workflow help follows the established pattern used for other component types, maintaining code consistency.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 7, 2024
@osterman
Copy link
Member

osterman commented Nov 7, 2024

Can you show a screenshot so I can see what the coloring looks like?

internal/exec/help.go Outdated Show resolved Hide resolved
internal/exec/help.go Outdated Show resolved Hide resolved
Copy link
Member

@aknysh aknysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cerebrovinny thanks, the PR looks good, please address the comments

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
internal/exec/help.go (1)

106-118: STRONG WORK, WARRIOR! Let's refine this battle plan with a few adjustments.

  1. The URL should point to the specific resume documentation page:
-For more details refer to https://atmos.tools/cli/commands/workflow/
+For more details refer to https://atmos.tools/cli/commands/workflow/resume
  1. The -f flag in the example command needs explanation. Consider adding:
 To resume the workflow from this step, run:
-atmos workflow deploy-infra -f workflow1 --from-step deploy-vpc
+atmos workflow deploy-infra -f workflow1 --from-step deploy-vpc
+
+Where:
+ -f, --file        Workflow file to execute (e.g., workflow1)
+ --from-step       Resume workflow execution from the specified step
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4f4bccc and 65f6c95.

📒 Files selected for processing (3)
  • internal/exec/help.go (1 hunks)
  • internal/exec/workflow_utils.go (3 hunks)
  • website/docs/core-concepts/workflows/workflows.mdx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/exec/workflow_utils.go
  • website/docs/core-concepts/workflows/workflows.mdx

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 8, 2024
@osterman
Copy link
Member

osterman commented Nov 8, 2024

@Cerebrovinny please make the following changes:
image

"Failed command: ..." to should sent to Debug (and renamed to "Command failed:")
"Error: exit status 1" should be sent to Debug

Also, "Error: exit status 1" should appear before the tip, as it realtes to the "Failed command"

This is to reduce the amount of technical information on the screen.

internal/exec/help.go Outdated Show resolved Hide resolved
@osterman
Copy link
Member

osterman commented Nov 9, 2024

@Cerebrovinny please resolve conversations that are addressed

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 10, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
cmd/workflow.go (1)

Line range hint 52-52: Enhance the --from-step flag description.

Consider providing a more detailed description that explains the flag's purpose in workflow recovery.

-workflowCmd.PersistentFlags().String("from-step", "", "atmos workflow <name> -f <file> --from-step <step-name>")
+workflowCmd.PersistentFlags().String("from-step", "", "Resume workflow execution from a specific step (e.g., after a failure)")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c2e327b and 4990c98.

📒 Files selected for processing (1)
  • cmd/workflow.go (1 hunks)
🔇 Additional comments (2)
cmd/workflow.go (2)

17-22: Strong documentation of failure handling capabilities!

The new documentation clearly explains what users can expect when a workflow fails. This directly addresses the feature request and improves user experience.


32-33: Documentation link is well-placed and formatted correctly!

The reference to detailed documentation provides users with a clear path to more information.

cmd/workflow.go Outdated Show resolved Hide resolved
@osterman
Copy link
Member

Please share updated screenshots

@osterman osterman added the minor New features that do not break anything label Nov 11, 2024
@Cerebrovinny
Copy link
Collaborator Author

Please share updated screenshots
Screenshot 2024-11-11 at 20 52 16

Screenshot 2024-11-11 at 21 03 09

@osterman
Copy link
Member

@Cerebrovinny I really don't understand this screenshot.

image

Why are we printing all this information?

image

At first, I thought I was looking at the output atmos workflow --help, then I realized I was looking at the output of atmos workflow test-1...

We need to get clear that output of a command should:

  1. Output the successful result
  2. Output the error message and possible suggestions or usage.

Output of any command other than --help should not output the help screen.

Usage != Help.

@Cerebrovinny
Copy link
Collaborator Author

@Cerebrovinny I really don't understand this screenshot.

image

Why are we printing all this information?

image At first, I thought I was looking at the output `atmos workflow --help`, then I realized I was looking at the output of `atmos workflow test-1`...

We need to get clear that output of a command should:

  1. Output the successful result
  2. Output the error message and possible suggestions or usage.

Output of any command other than --help should not output the help screen.

Usage != Help.

@aknysh this message could be a regression of #781

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor New features that do not break anything
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants