[DOCS-1700] Add documentation for viewing automation execution history

mdlinville · mdlinville · commit be89d34f025d · 2025-08-27T17:07:10.000-07:00
diff --git a/content/en/guides/core/automations/_index.md b/content/en/guides/core/automations/_index.md
@@ -38,13 +38,14 @@ To [create an automation]({{< relref "create-automations/" >}}), you:
 1. If required, configure [secrets]({{< relref "/guides/core/secrets.md" >}}) for sensitive strings the automation requires, such as access tokens, passwords, or sensitive configuration details. Secrets are defined in your **Team Settings**. Secrets are most commonly used in webhook automations to securely pass credentials or tokens to the webhook's external service without exposing it in plain text or hard-coding it in the webhook's payload.
 1. Configure the webhook or Slack notification to authorize W&B to post to Slack or run the webhook on your behalf. A single automation action (webhook or Slack notification) can be used by multiple automations. These actions are defined in your **Team Settings**.
 1. In the project or registry, create the automation:
-    1. Define the [event]({{< relref "#automation-events" >}}) to watch for, such as when a new artifact version is added. 
+    1. Define the [event]({{< relref "#automation-events" >}}) to watch for, such as when a new artifact version is added.
     1. Define the action to take when the event occurs (posting to a Slack channel or running a webhook). For a webhook, specify a secret to use for the access token and/or a secret to send with the payload, if required.
 
 ## Limitations
 [Run metric automations]({{< relref "automation-events.md#run-metrics-events">}}) are currently supported only in [W&B Multi-tenant Cloud]({{< relref "/guides/hosting/#wb-multi-tenant-cloud" >}}).
 
 ## Next steps
 - [Create an automation]({{< relref "create-automations/" >}}).
+- [View an automation's history]({{< relref "view-automation-history.md" >}}) to track executions and debug issues.
 - Learn about [Automation events and scopes]({{< relref "automation-events.md" >}}).
 - [Create a secret]({{< relref "/guides/core/secrets.md" >}}).
diff --git a/content/en/guides/core/automations/create-automations/_index.md b/content/en/guides/core/automations/create-automations/_index.md
@@ -13,7 +13,7 @@ weight: 1
 This page gives an overview of creating and managing W&B [automations]({{< relref "/guides/core/automations/">}}). For more detailed instructions, refer to [Create a Slack automation]({{< relref "/guides/core/automations/create-automations/slack.md" >}}) or [Create a webhook automation]({{< relref "/guides/core/automations/create-automations/webhook.md" >}}).
 
 {{% alert %}}
-Looking for companion tutorials for automations? 
+Looking for companion tutorials for automations?
 - [Learn to automatically triggers a Github Action for model evaluation and deployment](https://wandb.ai/wandb/wandb-model-cicd/reports/Model-CI-CD-with-W-B--Vmlldzo0OTcwNDQw).
 - [Watch a video demonstrating automatically deploying a model to a Sagemaker endpoint](https://www.youtube.com/watch?v=s5CMj_w3DaQ).
 - [Watch a video series introducing automations](https://youtube.com/playlist?list=PLD80i8An1OEGECFPgY-HPCNjXgGu-qGO6&feature=shared).
@@ -28,7 +28,7 @@ Looking for companion tutorials for automations?
 Create an automation from the project or registry's **Automations** tab. At a high level, to create an automation, follow these steps:
 
 1. If necessary, [create a W&B secret]({{< relref "/guides/core/secrets.md" >}}) for each sensitive string required by the automation, such as an access token, password, or SSH key. Secrets are defined in your **Team Settings**. Secrets are most commonly used in webhook automations.
-1. Configure the webhook or Slack integration to authorize W&B to post to Slack or run the webhook on your behalf. A single webhook or Slack integration can be used by multiple automations. These actions are defined in your **Team Settings**. 
+1. Configure the webhook or Slack integration to authorize W&B to post to Slack or run the webhook on your behalf. A single webhook or Slack integration can be used by multiple automations. These actions are defined in your **Team Settings**.
 1. In the project or registry, create the automation, which specifies the event to watch for and the action to take (such as posting to Slack or running a webhook). When you create a webhook automation, you configure the payload it sends.
 
 Or, from a line plot in the workspace, you can quickly create a [run metric automation]({{< relref "/guides/core/automations/automation-events.md#run-events" >}}) for the metric it shows:
@@ -47,6 +47,7 @@ For details, refer to:
 View and manage automations from a project or registry's **Automations** tab.
 
 - To view an automation's details, click its name.
+- To view an automation's execution history, click its name and select the **History** tab. See [View an automation's history]({{< relref "/guides/core/automations/view-automation-history.md" >}}) for details.
 - To edit an automation, click its action `...` menu, then click **Edit automation**.
 - To delete an automation, click its action `...` menu, then click **Delete automation**.
 
diff --git a/content/en/guides/core/automations/view-automation-history.md b/content/en/guides/core/automations/view-automation-history.md
@@ -0,0 +1,303 @@
+---
+menu:
+  default:
+    identifier: view-automation-history
+    parent: automations
+title: View an automation's history
+weight: 3
+---
+{{% pageinfo color="info" %}}
+{{< readfile file="/_includes/enterprise-cloud-only.md" >}}
+{{% /pageinfo %}}
+
+This page describes how to view and interpret the execution history of your W&B [automations]({{< relref "/guides/core/automations/">}}), so that you can track what triggered an automation, what actions it took, and whether they succeeded or failed.
+
+## Overview
+
+Each execution of an automation generates a comprehensive log entry that includes:
+- **Execution timestamp**: When the automation was triggered.
+- **Trigger event**: The specific event that caused the automation to run.
+- **Status**: The execution's status. See [Execution status](#execution-status).
+- **Action details**: Information about what action was performed, such as notifying a Slack channel or running a webhook.
+- **Error messages**: Detailed error information for a failed automation.
+
+## View automation history
+
+You can view automation history from multiple locations in the W&B interface:
+
+### From the Automations tab
+
+{{< tabpane text=true >}}
+{{% tab "Registry" %}}
+1. Navigate to your registry by clicking on **Model Registry** in the left sidebar.
+1. Select your registry from the list.
+1. Click the **Automations** tab to view the registry's automations. In each row, the **Last execution** column shows when the automation last executed.
+1. In the **Automations history** tab, view all executions of the registry's automations in reverse chronological order, starting with the most recent execution. Each execution's metadata displays, including the event, action, and status.
+
+{{% /tab %}}
+{{% tab "Project" %}}
+1. Navigate to your project from the W&B home page or by using the project selector.
+1. Click the **Automations** tab in the project navigation bar (located alongside Overview, Workspace, Runs, etc.). The project's automations display.
+
+    - Find the automation you want to investigate. You can use the search bar to filter by automation name, and you can sort by the last triggered date to find recently executed automations.
+
+    - Click an automation name to open its details page.
+1. In the **History** tab, view all executions of the project's automations in reverse chronological order, starting with the most recent execution. Each execution's metadata displays, including the event, action, and status.
+
+{{% /tab %}}
+{{< /tabpane >}}
+
+### From the automation details page
+
+When viewing an individual automation:
+
+1. Navigate to the automation details page by clicking an automation name from the Automations tab.
+1. Click the **History** tab to view a chronological list of all executions.
+1. Each entry shows:
+   - Execution date and time.
+   - Triggering event details.
+   - Status indicator (success, failure, or in progress).
+   - Duration of execution.
+
+## Understanding execution details
+
+Click any automation execution entry to view detailed information. The details shown depend on the execution's status. See [Execution status](#execution-status) and the following sections.
+
+### Execution status
+
+| Status | Icon | Description |
+|--------|------|-------------|
+| **Success** | ✅ | A green checkmark indicates that the automation completed successfully and the action was performed. |
+| **Failed** | ❌ | A red X indicates that the automation encountered an error and could not complete. |
+| **In Progress** | 🔄 Spinning icon | A spinning arrow icon indicates that the automation is running. |
+| **Cancelled** | ⏹️ Gray square icon | A gray square icon indicates that the automation was manually stopped before completion. |
+| **Skipped** | ⏭️ Gray forward arrow icon | A gray forward arrow icon indicates that the automation was triggered but subsequently skipped because its conditions were not met. |
+
+#### Successful executions
+A successful execution shows:
+- **Trigger information**:
+  - Event type (e.g., "Artifact alias added")
+  - Source details (artifact name, version, user who triggered)
+  - Exact timestamp with timezone
+- **Payload sent**:
+  - For Slack: The formatted message content
+  - For webhooks: The complete JSON payload (with sensitive values masked)
+- **Delivery confirmation**:
+  - HTTP status code (e.g., "200 OK")
+  - Response time in milliseconds
+  - For Slack: Channel and thread information
+- **Response data** (webhook automations only):
+  - Response headers
+  - Response body (truncated if large)
+  - Any returned job IDs or reference numbers
+
+#### Failed executions
+A failed execution shows:
+- **Error summary**: High-level description (e.g., "Connection timeout", "Authentication failed")
+- **Detailed error message**:
+  ```text
+  Error: Failed to connect to webhook endpoint
+  URL: https://api.example.com/webhook
+  Status: 502 Bad Gateway
+  Response: "upstream server temporarily unavailable"
+  ```
+- **Failure stage**: Where in the process it failed:
+  - "Pre-validation" - Failed before sending
+  - "Connection" - Network or DNS issues
+  - "Authentication" - Invalid credentials or tokens
+  - "Processing" - Remote server rejected the request
+- **Debugging information**:
+  - Request headers sent
+  - Curl command equivalent for testing
+  - Suggested fixes based on error type
+- **Retry options**:
+  - "Retry Now" button (if automation is still valid)
+  - "Edit and Retry" to modify payload before retrying
+
+#### Skipped or cancelled executions
+A skipped execution shows why it was skipped. A cancelled execution shows who cancelled it.
+
+## Filter and search automation history
+
+This section shows various ways to filter and search for automation executions.
+
+### Status filter dropdown
+Click the **Status** dropdown to filter executions:
+- **All statuses** (default): Shows every execution
+- **Successful**: Shows only executions with green checkmarks
+- **Failed**: Shows only executions with red X marks
+- **In Progress**: Shows currently running executions
+- **Cancelled**: Shows manually stopped executions
+
+The filter updates the list in real time, and the count badge shows the number of matching executions.
+
+### Date range picker
+Click the calendar icon to open the date range selector:
+- **Quick ranges** (buttons at the top):
+  - Last 24 hours
+  - Last 7 days
+  - Last 30 days
+  - Last 90 days
+- **Custom range**:
+  - Select start and end dates from the calendar
+  - Time selection available for precision
+  - Timezone selector (defaults to browser timezone)
+
+### Search bar
+The search bar supports both basic text search across all execution data and advanced search using operators. For example:
+
+- `status:failed`: Find failed executions.
+- `status:failed error:401`: Find failed executions with authentication errors.
+- `trigger:"artifact alias"`: Find executions that match a trigger.
+- `trigger:"run metric" metric:loss`: Find executions triggered by a given run metric's value.
+- `webhook:https://api.example.com`: Find executions that called a specific webhook endpoint.
+- `duration:>10s`: Find executions that took longer than 10 seconds.
+- `error:timeout`: Find matching error messages.
+- `artifact:model-v2`: Find executions that relate to a specific artifact.
+- `artifact:"production-model" last 7 days`: Find recent executions that relate to a specific artifact.
+- `user:jane@company.com`: Find executions triggered by specific users.
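To reason about how the operators above combine, it can help to see a query broken into its parts. The following is a minimal sketch, not W&B's actual parser: it assumes each `key:value` token is a separate filter, that quoted phrases stay together, and that anything without an operator is treated as free text. The function name `parse_query` is hypothetical.

```python
import shlex


def parse_query(query):
    """Split a search string into operator filters and free-text terms.

    Illustrative only; the real search bar may tokenize differently.
    """
    filters = {}
    free_text = []
    # shlex keeps a quoted phrase such as "run metric" attached to its key.
    for token in shlex.split(query):
        # Split on the first colon only, so URL values keep their own colons.
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters[key] = value
        else:
            free_text.append(token)
    return filters, free_text


if __name__ == "__main__":
    for example in (
        "status:failed error:401",
        'trigger:"run metric" metric:loss',
        'artifact:"production-model" last 7 days',
    ):
        print(example, "->", parse_query(example))
```

Under these assumptions, `status:failed error:401` yields two filters that must both match, while `last 7 days` in the final example falls through as three free-text terms.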
+
+
+## Common use cases
+
+### Debug failed automations
+1. Filter the history to show only failed executions using the status dropdown.
+1. Click a failed execution to open the error details panel.
+1. Review the error information to identify the issue:
+
+   **Common webhook endpoint issues**:
+   - **404 Not Found**: Verify the webhook URL is correct.
+   - **500 Internal Server Error**: Check with the webhook service provider.
+   - **SSL Certificate Error**: Ensure valid HTTPS certificates.
+
+   **Authentication problems**:
+   - **401 Unauthorized**:
+     - Navigate to Team Settings > Secrets.
+     - Update the secret value used by the automation.
+     - Test with the **Test webhook** button.
+   - **403 Forbidden**: Check API permissions and scope.
+
+   **Network connectivity**:
+   - **Connection timeout**:
+     - Verify the endpoint is accessible.
+     - Check firewall rules if using private endpoints.
+     - Consider increasing timeout in webhook configuration (Edit automation > Advanced settings > Request timeout).
+
+   **Payload formatting**:
+   - **400 Bad Request**:
+     - Review the JSON syntax in the payload template.
+     - Ensure all required fields are included.
+     - Check data types match the endpoint's expectations.
+
+1. After fixing the issue:
+   - Click **Retry Now** to test the fix immediately.
+   - Monitor the next scheduled execution.
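For the **400 Bad Request** checks above, a quick local validation of the payload can narrow down the problem before you retry. This is a minimal sketch under stated assumptions: it checks a payload after any template variables have been filled in, it assumes the endpoint expects a JSON object, and the field names in the usage example (`event_type`, `artifact_version`) are hypothetical placeholders for whatever your endpoint requires.

```python
import json


def check_payload(raw, required_fields):
    """Return a list of problems found in a webhook payload string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as err:
        # Report where parsing stopped so the syntax error is easy to find.
        return [f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    # Assumption: the endpoint expects a JSON object at the top level.
    if not isinstance(payload, dict):
        return ["Payload is valid JSON but not a JSON object"]
    return [
        f"Missing required field: {field}"
        for field in required_fields
        if field not in payload
    ]


if __name__ == "__main__":
    # A trailing comma is a common cause of invalid JSON.
    print(check_payload('{"event_type": "test",}', ["event_type"]))
    print(check_payload('{"event_type": "test"}', ["event_type", "artifact_version"]))
```

An empty list means the payload parsed and contains every field you listed. It does not confirm that the value types match what the endpoint expects.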
+
+### Verify automation triggers
+1. Check the history to confirm an automation was triggered by a specific event.
+1. Verify the timing and frequency of executions.
+1. If necessary, adjust automations that are triggering too frequently or missing expected events or conditions.
+
+### Audit automation activity
+1. Export automation history for compliance or reporting.
+1. Track which user and action triggered a given automation.
+1. Monitor the overall health and reliability of your automation workflows.
+
+## Retention policy
+- **Standard retention**: 90 days of execution history.
+- **Extended retention**: Up to 365 days of execution history for Enterprise plans. Contact [support](mailto:support@wandb.ai) or your account team to express interest.
+
+During the retention period for an organization, the following details are kept:
+- **Failed execution details**: Full error logs and request/response data.
+- **Successful execution summaries**: Essential details. Payload details may be truncated after 30 days.
+
+## Export automation data
+TODO: Verify. I can't find this UI anywhere.
+
+This section shows how to export automation history for compliance or analysis.
+
+1. Click the **Export** button (download icon) at the top of the history list.
+1. Select export format:
+   - **CSV**: Tabular format with key fields.
+   - **JSON**: Complete execution details including payloads.
+   - **PDF**: Formatted report for documentation.
+1. Choose the date range to export.
+1. Click **Generate Export**.
+1. The export will be downloaded to your browser's default download location.
+
+**CSV export includes**:
+- Execution ID (e.g., `exec_1234567890`)
+- Timestamp (UTC) (e.g., `2024-01-15T14:30:00Z`)
+- Status (e.g., `Success`, `Failed`, `Cancelled`)
+- Trigger type and details (e.g., `artifact_alias_added: model-v2`)
+- Duration (e.g., `2.3s`)
+- Error message (if applicable) (e.g., `Connection timeout after 30s`)
+- User who triggered (for manual triggers) (e.g., `user@example.com`)
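If a CSV export with the fields above is available, a short script can summarize it. The sketch below is illustrative only: the column names (`status`, `error_message`, and so on) and the sample rows are assumptions based on the field list, not a confirmed export schema, so adjust them to match the header row of a real file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data shaped like the field list above.
SAMPLE_EXPORT = """execution_id,timestamp,status,trigger,duration,error_message
exec_1,2024-01-15T14:30:00Z,Success,artifact_alias_added: model-v2,2.3s,
exec_2,2024-01-15T15:00:00Z,Failed,artifact_alias_added: model-v2,30.0s,Connection timeout after 30s
exec_3,2024-01-15T16:00:00Z,Failed,artifact_alias_added: model-v2,30.0s,Connection timeout after 30s
"""


def summarize(csv_text):
    """Count executions per status and failures per error message."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    status_counts = Counter(row["status"] for row in rows)
    error_counts = Counter(
        row["error_message"] for row in rows if row["status"] == "Failed"
    )
    return status_counts, error_counts


if __name__ == "__main__":
    statuses, errors = summarize(SAMPLE_EXPORT)
    print(dict(statuses))
    print(errors.most_common(3))
```

To run this against a downloaded file, read the file's text and pass it to `summarize` in place of `SAMPLE_EXPORT`.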
+
+## Troubleshooting
+
+### Automation not appearing in history
+If an expected automation execution doesn't appear:
+
+1. **Verify the trigger event occurred**:
+   - For artifact events: Check the artifact's version history.
+   - For run metrics: Confirm the run logged the expected metric values.
+   - For aliases/tags: Verify they were actually applied.
+
+1. **Check automation status**:
+   - Look for a **Disabled** badge on the automation list.
+   - Click the automation's name to open its configuration.
+   - Turn the automation back on using the toggle.
+
+1. **Review filter criteria**:
+   - Click the automation's name to open its configuration.
+   - Check the **Filters** section for:
+     - Artifact name patterns (regex).
+     - Collection restrictions.
+     - User filters.
+   - Test your event against the filter using the **Test filters** tool.
+
+1. **Inspect conditional logic**:
+   - Advanced automations may have "Only if" conditions. For example, "Only trigger if artifact size > 100MB".
+   - Check if your event met all conditions.
+
+1. **Timing considerations**:
+   - History may have a 1-2 minute delay to update after an automation runs.
+   - Refresh the page after a few minutes, then check the "Last checked" timestamp at the top of the history.
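When reviewing an artifact name pattern from the filter criteria step above, you can check the regular expression locally. This is a minimal sketch using Python's `re` module. It assumes the pattern must match the whole artifact name, which may differ from how W&B applies the pattern, and the helper name `matches_filter` and the sample names are hypothetical.

```python
import re


def matches_filter(pattern, artifact_name):
    """Return True if the pattern matches the entire artifact name."""
    # Assumption: whole-name matching. Use re.search instead if the
    # filter is expected to match anywhere in the name.
    return re.fullmatch(pattern, artifact_name) is not None


if __name__ == "__main__":
    pattern = r"model-v\d+"
    for name in ("model-v2", "model-v10", "production-model"):
        print(name, matches_filter(pattern, name))
```

If an event you expected to trigger the automation returns `False` here, the pattern is a likely reason the execution is missing from the history.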
+
+### Missing execution details
+Some execution details may be limited if:
+- The automation was created before history tracking was turned on.
+- Network issues prevented complete logging.
+- The automation was deleted and recreated with the same name.
+
+## Recommendations
+
+1. **Monitor automations**:
+   - Set a regular reminder to review automation histories.
+   - Focus on automations critical to your workflow, and look for patterns in execution times and success rates.
+
+1. **Set up alerts**:
+   - Configure email notifications for automation failures in your team settings.
+   - Send automation alerts to a dedicated Slack channel.
+   - Use webhook automations to trigger PagerDuty for critical failures.
+
+1. **Document patterns**:
+   - Keep a runbook of common errors and their solutions.
+   - Document which external services each webhook depends on.
+   - Note any time-based patterns to expect, such as transient failures during maintenance.
+
+1. **Test automations**:
+   - Use test artifacts or events while developing an automation and before turning it on in production.
+   - Verify the first few executions for a new automation.
+   - Test webhook endpoints independently using tools or scripts outside W&B.
+
+1. **Performance optimization**:
+   - Monitor execution duration trends.
+   - Investigate automations that unexpectedly take longer than 30 seconds.
+   - To improve performance, consider breaking complex automations into smaller, focused ones.
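To test a webhook endpoint independently of W&B, as recommended above, a small script that sends a sample request is often enough. The following sketch uses only the Python standard library. The bearer-token header is an assumption about how your endpoint authenticates, and the function names, URL, and payload are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_test_request(url, payload, token=None):
    """Build a JSON POST request without sending it."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the endpoint accepts a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_test_request(request, timeout_seconds=10.0):
    """Send the request and return the HTTP status code.

    Connection failures raise urllib.error.URLError, which is left
    uncaught so that the underlying network error stays visible.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; report their code.
        return err.code


# Example usage (replace the URL with your own endpoint before running):
# request = build_test_request("https://api.example.com/webhook", {"event": "test"})
# print(send_test_request(request))
```

Keeping the request construction separate from the send step lets you inspect the headers and body first, which helps when comparing against the request details shown for a failed execution.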
+
+## Next steps
+- Learn about [automation events]({{< relref "/guides/core/automations/automation-events.md" >}}) that can trigger automations.
+- [Create a Slack automation]({{< relref "/guides/core/automations/create-automations/slack.md" >}}).
+- [Create a webhook automation]({{< relref "/guides/core/automations/create-automations/webhook.md" >}}).