Commit be89d34

[DOCS-1700] Add documentation for viewing automation execution history
1 parent 0d1f28e commit be89d34

3 files changed

+308
-3
lines changed

content/en/guides/core/automations/_index.md

Lines changed: 2 additions & 1 deletion
@@ -38,13 +38,14 @@ To [create an automation]({{< relref "create-automations/" >}}), you:
3838
1. If required, configure [secrets]({{< relref "/guides/core/secrets.md" >}}) for sensitive strings the automation requires, such as access tokens, passwords, or sensitive configuration details. Secrets are defined in your **Team Settings**. Secrets are most commonly used in webhook automations to securely pass credentials or tokens to the webhook's external service without exposing it in plain text or hard-coding it in the webhook's payload.
3939
1. Configure the webhook or Slack notification to authorize W&B to post to Slack or run the webhook on your behalf. A single automation action (webhook or Slack notification) can be used by multiple automations. These actions are defined in your **Team Settings**.
4040
1. In the project or registry, create the automation:
41-
1. Define the [event]({{< relref "#automation-events" >}}) to watch for, such as when a new artifact version is added.
41+
1. Define the [event]({{< relref "#automation-events" >}}) to watch for, such as when a new artifact version is added.
4242
1. Define the action to take when the event occurs (posting to a Slack channel or running a webhook). For a webhook, specify a secret to use for the access token and/or a secret to send with the payload, if required.
4343

4444
## Limitations
4545
[Run metric automations]({{< relref "automation-events.md#run-metrics-events">}}) are currently supported only in [W&B Multi-tenant Cloud]({{< relref "/guides/hosting/#wb-multi-tenant-cloud" >}}).
4646

4747
## Next steps
4848
- [Create an automation]({{< relref "create-automations/" >}}).
49+
- [View an automation's history]({{< relref "view-automation-history.md" >}}) to track executions and debug issues.
4950
- Learn about [Automation events and scopes]({{< relref "automation-events.md" >}}).
5051
- [Create a secret]({{< relref "/guides/core/secrets.md" >}}).

content/en/guides/core/automations/create-automations/_index.md

Lines changed: 3 additions & 2 deletions
@@ -13,7 +13,7 @@ weight: 1
1313
This page gives an overview of creating and managing W&B [automations]({{< relref "/guides/core/automations/">}}). For more detailed instructions, refer to [Create a Slack automation]({{< relref "/guides/core/automations/create-automations/slack.md" >}}) or [Create a webhook automation]({{< relref "/guides/core/automations/create-automations/webhook.md" >}}).
1414

1515
{{% alert %}}
16-
Looking for companion tutorials for automations?
16+
Looking for companion tutorials for automations?
1717
- [Learn to automatically trigger a GitHub Action for model evaluation and deployment](https://wandb.ai/wandb/wandb-model-cicd/reports/Model-CI-CD-with-W-B--Vmlldzo0OTcwNDQw).
1818
- [Watch a video demonstrating automatically deploying a model to a Sagemaker endpoint](https://www.youtube.com/watch?v=s5CMj_w3DaQ).
1919
- [Watch a video series introducing automations](https://youtube.com/playlist?list=PLD80i8An1OEGECFPgY-HPCNjXgGu-qGO6&feature=shared).
@@ -28,7 +28,7 @@ Looking for companion tutorials for automations?
2828
Create an automation from the project or registry's **Automations** tab. At a high level, to create an automation, follow these steps:
2929

3030
1. If necessary, [create a W&B secret]({{< relref "/guides/core/secrets.md" >}}) for each sensitive string required by the automation, such as an access token, password, or SSH key. Secrets are defined in your **Team Settings**. Secrets are most commonly used in webhook automations.
31-
1. Configure the webhook or Slack integration to authorize W&B to post to Slack or run the webhook on your behalf. A single webhook or Slack integration can be used by multiple automations. These actions are defined in your **Team Settings**.
31+
1. Configure the webhook or Slack integration to authorize W&B to post to Slack or run the webhook on your behalf. A single webhook or Slack integration can be used by multiple automations. These actions are defined in your **Team Settings**.
3232
1. In the project or registry, create the automation, which specifies the event to watch for and the action to take (such as posting to Slack or running a webhook). When you create a webhook automation, you configure the payload it sends.
3333

3434
Or, from a line plot in the workspace, you can quickly create a [run metric automation]({{< relref "/guides/core/automations/automation-events.md#run-events" >}}) for the metric it shows:
@@ -47,6 +47,7 @@ For details, refer to:
4747
View and manage automations from a project or registry's **Automations** tab.
4848

4949
- To view an automation's details, click its name.
50+
- To view an automation's execution history, click its name and select the **History** tab. See [View an automation's history]({{< relref "/guides/core/automations/view-automation-history.md" >}}) for details.
5051
- To edit an automation, click its action `...` menu, then click **Edit automation**.
5152
- To delete an automation, click its action `...` menu, then click **Delete automation**.
5253

Lines changed: 303 additions & 0 deletions
@@ -0,0 +1,303 @@
1+
---
2+
menu:
3+
default:
4+
identifier: view-automation-history
5+
parent: automations
6+
title: View an automation's history
7+
weight: 3
8+
---
9+
{{% pageinfo color="info" %}}
10+
{{< readfile file="/_includes/enterprise-cloud-only.md" >}}
11+
{{% /pageinfo %}}
12+
13+
This page describes how to view the execution history of your W&B [automations]({{< relref "/guides/core/automations/">}}) so that you can track what triggered an automation, what actions it took, and whether they succeeded or failed.
14+
15+
## Overview
16+
17+
Each execution of an automation generates a comprehensive log entry that includes:
18+
- **Execution timestamp**: When the automation was triggered.
19+
- **Trigger event**: The specific event that caused the automation to run.
20+
- **Status**: The execution's status. See [Execution status](#execution-status).
21+
- **Action details**: Information about what action was performed, such as notifying a Slack channel or running a webhook.
22+
- **Error messages**: Detailed error information for a failed automation.
23+
24+
## View automation history
25+
26+
You can view automation history from multiple locations in the W&B interface:
27+
28+
### From the Automations tab
29+
30+
{{< tabpane text=true >}}
31+
{{% tab "Registry" %}}
32+
1. Navigate to your registry by clicking on **Model Registry** in the left sidebar.
33+
1. Select your registry from the list.
34+
1. Click the **Automations** tab to view the registry's automations. In each row, the **Last execution** column shows when the automation last executed.
35+
1. In the **Automations history** tab, view all executions of the registry's automations in reverse chronological order, starting with the most recent execution. Each execution's metadata displays, including the event, action, and status.
36+
37+
{{% /tab %}}
38+
{{% tab "Project" %}}
39+
1. Navigate to your project from the W&B home page or by using the project selector.
40+
1. Click the **Automations** tab in the project navigation bar (located alongside Overview, Workspace, Runs, etc.). The project's automations display.
41+
42+
- Find the automation you want to investigate. You can use the search bar to filter by automation name, and you can sort by the last triggered date to find recently executed automations.
43+
44+
- Click an automation name to open its details page.
45+
1. In the **History** tab, view all executions of the project's automations in reverse chronological order, starting with the most recent execution. Each execution's metadata displays, including the event, action, and status.
46+
47+
{{% /tab %}}
48+
{{< /tabpane >}}
49+
50+
### From the automation details page
51+
52+
When viewing an individual automation:
53+
54+
1. Navigate to the automation details page by clicking an automation name from the Automations tab.
55+
1. Click the **History** tab to view a chronological list of all executions.
56+
1. Each entry shows:
57+
- Execution date and time.
58+
- Triggering event details.
59+
- Status indicator (success, failure, or in progress).
60+
- Duration of execution.
61+
62+
## Understanding execution details
63+
64+
Click any automation execution entry to view detailed information. The details shown depend on the execution's status. See [Execution status](#execution-status) and the following sections.
65+
66+
### Execution status
67+
68+
| Status | Icon | Description |
69+
|--------|------|-------------|
70+
| **Success** | Green checkmark icon | A green checkmark indicates that the automation completed successfully and the action was performed. |
71+
| **Failed** | Red X icon | A red X indicates that the automation encountered an error and could not complete. |
72+
| **In Progress** | 🔄 Spinning icon | A spinning arrow icon indicates that the automation is running. |
73+
| **Cancelled** | ⏹️ Gray square icon | A gray square icon indicates that the automation was manually stopped before completion. |
74+
| **Skipped** | ⏭️ Gray forward arrow icon | A gray forward arrow icon indicates that the automation was triggered but subsequently skipped because its conditions were not met. |
75+
76+
#### Successful executions
77+
A successful execution shows:
78+
- **Trigger information**:
79+
- Event type (e.g., "Artifact alias added")
80+
- Source details (artifact name, version, user who triggered)
81+
- Exact timestamp with timezone
82+
- **Payload sent**:
83+
- For Slack: The formatted message content
84+
- For webhooks: The complete JSON payload (with sensitive values masked)
85+
- **Delivery confirmation**:
86+
- HTTP status code (e.g., "200 OK")
87+
- Response time in milliseconds
88+
- For Slack: Channel and thread information
89+
- **Response data** (webhook automations only):
90+
- Response headers
91+
- Response body (truncated if large)
92+
- Any returned job IDs or reference numbers
93+
94+
#### Failed executions
95+
A failed execution shows:
96+
- **Error summary**: High-level description (e.g., "Connection timeout", "Authentication failed")
97+
- **Detailed error message**:
98+
```text
99+
Error: Failed to connect to webhook endpoint
100+
URL: https://api.example.com/webhook
101+
Status: 502 Bad Gateway
102+
Response: "upstream server temporarily unavailable"
103+
```
104+
- **Failure stage**: Where in the process it failed:
105+
- "Pre-validation" - Failed before sending
106+
- "Connection" - Network or DNS issues
107+
- "Authentication" - Invalid credentials or tokens
108+
- "Processing" - Remote server rejected the request
109+
- **Debugging information**:
110+
- Request headers sent
111+
- Curl command equivalent for testing
112+
- Suggested fixes based on error type
113+
- **Retry options**:
114+
- "Retry Now" button (if automation is still valid)
115+
- "Edit and Retry" to modify payload before retrying
116+
117+
#### Skipped or cancelled executions
118+
A skipped or cancelled execution shows why it was skipped or who cancelled it.
119+
120+
## Filter and search automation history
121+
122+
This section shows various ways to filter and search for automation executions.
123+
124+
### Status filter dropdown
125+
Click the **Status** dropdown to filter executions:
126+
- **All statuses** (default): Shows every execution
127+
- **Successful**: Shows only executions with green checkmarks
128+
- **Failed**: Shows only executions with red X marks
129+
- **In Progress**: Shows currently running executions
130+
- **Cancelled**: Shows manually stopped executions
131+
132+
The filter updates the list in real time, and the count badge shows the number of matching executions.
133+
134+
### Date range picker
135+
Click the calendar icon to open the date range selector:
136+
- **Quick ranges** (buttons at the top):
137+
- Last 24 hours
138+
- Last 7 days
139+
- Last 30 days
140+
- Last 90 days
141+
- **Custom range**:
142+
- Select start and end dates from the calendar
143+
- Time selection available for precision
144+
- Timezone selector (defaults to browser timezone)
145+
146+
### Search bar
147+
The search bar supports both basic text search across all execution data and advanced search using operators. For example:
148+
149+
- `status:failed`: Find failed executions.
150+
- `status:failed error:401`: Find failed executions with authentication errors.
151+
- `trigger:"artifact alias"`: Find executions that match a trigger.
152+
- `trigger:"run metric" metric:loss`: Find automations triggered by a given run metric's value.
153+
- `webhook:https://api.example.com`: Find executions that called a specific webhook endpoint.
154+
- `duration:>10s`: Find executions that took longer than 10 seconds.
155+
- `error:timeout`: Find matching error messages.
156+
- `artifact:model-v2`: Find executions that relate to a specific artifact.
157+
- `artifact:"production-model" last 7 days`: Find recent executions that relate to a specific artifact.
158+
- `user:[email protected]`: Find executions triggered by specific users.
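To make the `key:value` operator syntax above more concrete, the following is a minimal sketch of how such a query could be split into operator filters and free text. It is an illustration only, not how W&B parses searches. The function name `parse_query` and the returned structure are assumptions made for this example.

```python
import shlex


def parse_query(query: str) -> tuple[dict[str, str], list[str]]:
    """Split a search string into key:value filters and free-text terms.

    Hypothetical helper for illustration; not part of any W&B API.
    """
    filters: dict[str, str] = {}
    free_text: list[str] = []
    # shlex.split keeps quoted phrases such as "artifact alias" together
    # and strips the surrounding quotes.
    for token in shlex.split(query):
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = token.partition(":")
        if sep and key and value:
            filters[key] = value
        else:
            free_text.append(token)
    return filters, free_text


if __name__ == "__main__":
    print(parse_query('trigger:"run metric" metric:loss'))
    print(parse_query('artifact:"production-model" last 7 days'))
```

In this sketch, terms without a colon, such as `last 7 days`, are treated as free text. How the real search bar interprets them is not specified here.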
159+
160+
161+
## Common use cases
162+
163+
### Debug failed automations
164+
1. Filter the history to show only failed executions using the status dropdown.
165+
1. Click a failed execution to open the error details panel.
166+
1. Review the error information to identify the issue:
167+
168+
**Common webhook endpoint issues**:
169+
- **404 Not Found**: Verify the webhook URL is correct.
170+
- **500 Internal Server Error**: Check with the webhook service provider.
171+
- **SSL Certificate Error**: Ensure valid HTTPS certificates.
172+
173+
**Authentication problems**:
174+
- **401 Unauthorized**:
175+
- Navigate to Team Settings > Secrets.
176+
- Update the secret value used by the automation.
177+
- Test with the **Test webhook** button.
178+
- **403 Forbidden**: Check API permissions and scope.
179+
180+
**Network connectivity**:
181+
- **Connection timeout**:
182+
- Verify the endpoint is accessible.
183+
- Check firewall rules if using private endpoints.
184+
- Consider increasing timeout in webhook configuration (Edit automation > Advanced settings > Request timeout).
185+
186+
**Payload formatting**:
187+
- **400 Bad Request**:
188+
- Review the JSON syntax in the payload template.
189+
- Ensure all required fields are included.
190+
- Check data types match the endpoint's expectations.
191+
192+
1. After fixing the issue:
193+
- Click **Retry Now** to test the fix immediately.
194+
- Monitor the next scheduled execution.
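If you keep a team runbook, the status codes and fixes listed above can be captured as a small lookup table. The following is a minimal sketch under that assumption. The names `TRIAGE` and `triage` are hypothetical, and the entries only restate the guidance from this section.

```python
# Hypothetical triage table that restates the troubleshooting list above.
# Keys are HTTP status codes; values are (category, suggested first step).
TRIAGE: dict[int, tuple[str, str]] = {
    400: ("Payload formatting", "Review the JSON syntax, required fields, and data types in the payload template."),
    401: ("Authentication", "Update the secret used by the automation in Team Settings > Secrets."),
    403: ("Authentication", "Check API permissions and scope."),
    404: ("Webhook endpoint", "Verify the webhook URL is correct."),
    500: ("Webhook endpoint", "Check with the webhook service provider."),
}


def triage(status_code: int) -> tuple[str, str]:
    """Return the category and a suggested first step for a status code."""
    fallback = ("Unknown", "Open the execution's error details for more context.")
    return TRIAGE.get(status_code, fallback)


if __name__ == "__main__":
    category, next_step = triage(401)
    print(f"{category}: {next_step}")
```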
195+
196+
### Verify automation triggers
197+
1. Check the history to confirm an automation was triggered by a specific event.
198+
1. Verify the timing and frequency of executions.
199+
1. If necessary, adjust automations that are triggering too frequently or missing expected events or conditions.
200+
201+
### Audit automation activity
202+
1. Export automation history for compliance or reporting.
203+
1. Track which user and action triggered a given automation.
204+
1. Monitor the overall health and reliability of your automation workflows.
205+
206+
## Retention policy
207+
- **Standard retention**: 90 days of execution history.
208+
- **Extended retention**: Up to 365 days of execution history for Enterprise plans. Contact [support](mailto:[email protected]) or your account team to express interest.
209+
210+
During the retention period for an organization, the following details are kept:
211+
- **Failed execution details**: Full error logs and request/response data.
212+
- **Successful execution summaries**: Essential details. Payload details may be truncated after 30 days.
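As a worked example of the retention arithmetic, the sketch below classifies a successful execution by its age, using the 90-day standard retention and the 30-day payload window stated above. The function name, the returned labels, and the treatment of the exact boundary day are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

STANDARD_RETENTION = timedelta(days=90)     # standard retention from this page
PAYLOAD_DETAIL_WINDOW = timedelta(days=30)  # payload details may be truncated after this


def retention_state(executed_at: datetime, now: datetime) -> str:
    """Classify a successful execution by age (hypothetical helper)."""
    age = now - executed_at
    if age > STANDARD_RETENTION:
        return "expired"
    if age > PAYLOAD_DETAIL_WINDOW:
        return "payload may be truncated"
    return "full details"


if __name__ == "__main__":
    now = datetime(2024, 4, 30, tzinfo=timezone.utc)
    print(retention_state(datetime(2024, 4, 20, tzinfo=timezone.utc), now))
```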
213+
214+
## Export automation data
215+
TODO: Verify. I can't find this UI anywhere.
216+
217+
This section shows how to export automation history for compliance or analysis.
218+
219+
1. Click the **Export** button (download icon) at the top of the history list.
220+
1. Select export format:
221+
- **CSV**: Tabular format with key fields.
222+
- **JSON**: Complete execution details including payloads.
223+
- **PDF**: Formatted report for documentation.
224+
1. Choose the date range to export.
225+
1. Click **Generate Export**.
226+
1. The export will be downloaded to your browser's default download location.
227+
228+
**CSV export includes**:
229+
- Execution ID (e.g., `exec_1234567890`)
230+
- Timestamp (UTC) (e.g., `2024-01-15T14:30:00Z`)
231+
- Status (e.g., `Success`, `Failed`, `Cancelled`)
232+
- Trigger type and details (e.g., `artifact_alias_added: model-v2`)
233+
- Duration (e.g., `2.3s`)
234+
- Error message (if applicable) (e.g., `Connection timeout after 30s`)
235+
- User who triggered (for manual triggers) (e.g., `[email protected]`)
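Assuming a CSV export with the fields listed above is available, a short script could summarize it. The following is a sketch only. The column names (`execution_id`, `status`, `error_message`, and so on) are guesses, since the actual header row is not documented here, and the second sample row is made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; column names and the second row are assumptions.
SAMPLE = """execution_id,timestamp,status,trigger,duration,error_message
exec_1234567890,2024-01-15T14:30:00Z,Success,artifact_alias_added: model-v2,2.3s,
exec_1234567891,2024-01-15T15:00:00Z,Failed,artifact_alias_added: model-v2,30.0s,Connection timeout after 30s
"""


def summarize(csv_text: str) -> tuple[Counter, list[str]]:
    """Count executions per status and collect error messages of failures."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["status"] for row in rows)
    errors = [row["error_message"] for row in rows if row["status"] == "Failed"]
    return counts, errors


if __name__ == "__main__":
    counts, errors = summarize(SAMPLE)
    print(dict(counts))
    print(errors)
```

To run the same summary on a real file, read the file's text and pass it to `summarize`, adjusting the column names to match the export.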
236+
237+
## Troubleshooting
238+
239+
### Automation not appearing in history
240+
If an expected automation execution doesn't appear:
241+
242+
1. **Verify the trigger event occurred**:
243+
- For artifact events: Check the artifact's version history.
244+
- For run metrics: Confirm the run logged the expected metric values.
245+
- For aliases/tags: Verify they were actually applied.
246+
247+
1. **Check automation status**:
248+
- Look for a **Disabled** badge on the automation list.
249+
- Click the automation's name to open its configuration.
250+
- Turn the automation back on using the toggle.
251+
252+
1. **Review filter criteria**:
253+
- Click the automation's name to open its configuration.
254+
- Check the **Filters** section for:
255+
- Artifact name patterns (regex).
256+
- Collection restrictions.
257+
- User filters.
258+
- Test your event against the filter using the **Test filters** tool.
259+
260+
1. **Inspect conditional logic**:
261+
- Advanced automations may have "Only if" conditions. For example, "Only trigger if artifact size > 100MB".
262+
- Check if your event met all conditions.
263+
264+
1. **Timing considerations**:
265+
- History may have a 1-2 minute delay to update after an automation runs.
266+
- Refresh the page after a few minutes, then check the "Last checked" timestamp at the top of the history.
267+
268+
### Missing execution details
269+
Some execution details may be limited if:
270+
- The automation was created before history tracking was turned on.
271+
- Network issues prevented complete logging.
272+
- The automation was deleted and recreated with the same name.
273+
274+
## Recommendations
275+
276+
1. **Monitor automations**:
277+
- Set a regular reminder to review automation histories.
278+
- Focus on automations critical to your workflow, and look for patterns in execution times and success rates.
279+
280+
1. **Set up alerts**:
281+
- Configure email notifications for automation failures in your team settings.
282+
- Send automation alerts to a dedicated Slack channel.
283+
- Use webhook automations to trigger PagerDuty for critical failures.
284+
285+
1. **Document patterns**:
286+
- Keep a runbook of common errors and their solutions.
287+
- Document which external services each webhook depends on.
288+
- Note any time-based patterns to expect, such as transient failures during maintenance.
289+
290+
1. **Test automations**:
291+
- Use test artifacts or events while developing an automation and before turning it on in production.
292+
- Verify the first few executions for a new automation.
293+
- Test webhook endpoints independently using tools or scripts outside W&B.
294+
295+
1. **Performance optimization**:
296+
- Monitor execution duration trends.
297+
- Investigate automations that unexpectedly take longer than 30 seconds.
298+
- To improve performance, consider breaking complex automations into smaller, focused ones.
299+
300+
## Next steps
301+
- Learn about [automation events]({{< relref "/guides/core/automations/automation-events.md" >}}) that can trigger automations
302+
- [Create a Slack automation]({{< relref "/guides/core/automations/create-automations/slack.md" >}})
303+
- [Create a webhook automation]({{< relref "/guides/core/automations/create-automations/webhook.md" >}})
