Skip to content

Commit 4da3d33

Browse files
committed
[docs] Update the monitoring introduction for Flink agents
1 parent bd2f2f6 commit 4da3d33

File tree

3 files changed

+103
-12
lines changed

3 files changed

+103
-12
lines changed

docs/content/docs/operations/monitoring.md

Lines changed: 103 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -24,24 +24,115 @@ under the License.
2424

2525
## Metric
2626

27-
{{< hint warning >}}
28-
**TODO**: How to use add custom metrics.
27+
### Built-in Metrics
2928

30-
**TODO**: List of all built-in Metrics.
29+
We offer data monitoring for built-in metrics, which includes events and actions. Events provide general-level metrics, while actions deliver both general-level and type-specific data metrics.
3130

32-
**TODO**: How to check the metrics with Flink executor.
33-
{{< /hint >}}
31+
| Component Type | Count | Meter |
32+
|-------------------------------|------------------------------|--------------------------------|
33+
| Agent (Operator Builtin) | NumOfInput<br>NumOfOutput | NumOfInputPerSec<br>NumOfOutputPerSec |
34+
| Event | numOfEventProcessed | numOfEventProcessedPerSec |
35+
| Action | numOfActionsExecuted | numOfActionsExecutedPerSec |
36+
| Pre-Action | numOfActionsExecuted | numOfActionsExecutedPerSec |
37+
38+
####
39+
40+
### How to add custom metrics
41+
42+
In Flink Agents, users implement their logic by defining custom Actions that respond to various Events throughout the Agent lifecycle. To support user-defined metrics, we introduce two new properties: `agent_metric_group` and `action_metric_group` in the RunnerContext. These properties allow users to create or update global metrics and independent metrics for actions. For an introduction to metric types, please refer to the [Metric types documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/metrics/#metric-types).
43+
44+
Here is the user case example:
45+
46+
``````python
47+
class MyAgent(Agent):
48+
@action(InputEvent)
49+
@staticmethod
50+
def first_action(event: Event, ctx: RunnerContext): # noqa D102
51+
start_time = time.time_ns()
52+
input = event.input
53+
content = input.get_review() + " first action."
54+
ctx.send_event(MyEvent(value=content))
55+
56+
# Access the main agent metric group
57+
metrics = ctx.agent_metric_group
58+
59+
# Update global metrics
60+
metrics.get_counter("numInputEvent").inc()
61+
metrics.get_meter("numInputEventPerSec").mark()
62+
63+
# Access the per-action metric group
64+
action_metrics = ctx.action_metric_group
65+
action_metrics.get_histogram("actionLatencyMs") \
66+
.update(int(time.time_ns() - start_time) // 1000000)
67+
68+
@action(MyEvent)
69+
@staticmethod
70+
def second_action(event: Event, ctx: RunnerContext): # noqa D102
71+
input = event.value
72+
content = input + " second action."
73+
ctx.send_event(OutputEvent(output=content))
74+
75+
# Access the main agent metric group
76+
metrics = ctx.agent_metric_group
77+
78+
# Update global metrics
79+
metrics.get_counter("numMyEvent").inc()
80+
metrics.get_meter("numMyEventPerSecond").mark()
81+
82+
# Creating and tracking metrics for MyEvent using submetric group
83+
if isinstance(event, MyEvent):
84+
sub_metrics = metrics.action_metric_group
85+
sub_metrics.get_counter("numEvent").inc()
86+
sub_metrics.get_meter("numEventPerSecond").mark()
87+
``````
88+
89+
90+
91+
### How to check the metrics with Flink executor
92+
93+
We can check the metric result in the WebUI of Flink Job:
94+
95+
{{< img src="/fig/operations/metricwebui.png" alt="Metric Web UI" >}}
3496

3597
## Log
3698

37-
{{< hint warning >}}
38-
**TODO**: How to add log in Flink Agents.
99+
The Flink Agents' log system uses Flink's logging framework. For more details, please refer to the [Flink log system documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/logging/).
100+
101+
### How to add log in Flink Agents
102+
103+
For adding logs in Java code, you can refer to [Best practices for developers](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/logging/#best-practices-for-developers). In Python, you can add logs using `logging`. Here is a specific example:
104+
105+
```python
106+
@action(InputEvent)
107+
@staticmethod
108+
def process_input(event: InputEvent, ctx: RunnerContext) -> None:
109+
logging.info("Processing input event: %s", event)
110+
# the action logic
111+
```
112+
113+
### How to check the logs with Flink executor
39114

40-
**TODO**: How to check the logs with Flink executor.
41-
{{< /hint >}}
115+
We can check the log result in the WebUI of Flink Job:
116+
117+
{{< img src="/fig/operations/logwebui.png" alt="Log Web UI" >}}
42118

43119
## Event Log
44120

45-
{{< hint warning >}}
46-
**TODO**: How to use and check the event logs.
47-
{{< /hint >}}
121+
Currently, the system supports **File-based Event Log** as the default implementation. Future releases will introduce support for additional types of event logs and provide configuration options to let users choose their preferred logging mechanism.
122+
123+
### File Event Log
124+
125+
The **File Event Log** is a file-based event logging system that stores events in structured files within a flat directory. Each event is recorded in **JSON Lines (JSONL)** format, with one JSON object per line.
126+
127+
#### File Structure
128+
129+
The log files follow a naming convention consistent with Flink's logging standards and are stored in a flat directory structure:
130+
131+
```
132+
{baseLogDir}/
133+
├── events-{jobId}-{taskName}-{subtaskId}.log
134+
├── events-{jobId}-{taskName}-{subtaskId}.log
135+
└── events-{jobId}-{taskName}-{subtaskId}.log
136+
```
137+
138+
By default, all File-based Event Logs are stored in the `flink-agents` subdirectory under the system temporary directory (`java.io.tmpdir`). In future versions, we plan to add a configurable parameter to allow users to customize the base log directory, providing greater control over log storage paths and lifecycle management.
3.89 MB
Loading
1.3 MB
Loading

0 commit comments

Comments
 (0)