feat(Metrics&Alert): 新增规则与阈值管理的表结构，核心接口，编排流程，与初步实现 #87

acd19ml · 2025-09-19T14:10:23Z

变更背景和解决方案

关联issue: #42

文档更新

更新 api文档，数据库表文档，模块内部实现文档

niupilot · 2025-09-19T14:10:28Z

Code Review Complete

Completed comprehensive review of PR #87: feat(Metrics&Alert) - Rule and threshold management implementation

Summary

This PR introduces a rule and threshold management system for the alerting service with:

Database schema for alert rules and metadata management
Go implementation with proper interfaces and abstractions
In-memory Prometheus sync mechanism
Change logging for audit trails
Comprehensive unit tests

Identified Issues

High Confidence Issues (with suggested fixes):

Missing SQL injection protection: store_pg.go:79 - JSON marshaling error is ignored, could lead to invalid queries
Incomplete error handling: Multiple instances of ignored errors in JSON operations
Race condition potential: promsync_exporter.go:38-43 - String prefix matching could fail with similar rule names
Inefficient string concatenation: Multiple locations using string concatenation instead of builders
Missing transaction implementation: store_pg.go:21-24 - Placeholder transaction wrapper provides no atomicity

Lower Confidence Issues (discussion points):

Architecture concern: In-memory exporter may not scale across multiple replicas
Validation gaps: Limited validation for rule expressions and operator values
Performance consideration: No pagination or limits for GetMetas queries
Testing coverage: Limited integration tests with actual database

Positive Aspects

Clean interface design with proper separation of concerns
Comprehensive documentation and README
Good test coverage for core functionality
Proper label normalization and canonicalization
Consistent error handling patterns

The implementation provides a solid foundation for rule management with room for production-hardening improvements.

👉 Want me to re-run this review? Just type /review in the comments! For more usage examples, visit CodeAgent GitHub Repository.

gemini-code-assist · 2025-09-19T14:10:48Z

Summary of Changes

Hello @acd19ml, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在建立一个全面且灵活的告警规则与阈值管理系统。它通过引入新的数据库表结构来存储告警规则、其元数据（包括基于标签的阈值和观察窗口）以及变更日志，从而实现更精细化的控制和可审计性。同时，通过Go语言实现的核心业务逻辑和内存同步机制，确保了规则和阈值能够被高效管理并实时同步到Prometheus，极大地提升了告警配置的动态性和响应速度。

Highlights

新的规则与阈值管理表结构: 引入了新的数据库表 alert_rules 用于定义告警规则，alert_rule_metas 用于存储基于标签的阈值和观察窗口，以及 alert_meta_change_logs 用于记录阈值变更历史，支持审计。
核心Go接口与业务逻辑: 实现了 AlertRuleMgr、Store 和 PromSync 等核心Go接口，用于管理告警规则和元数据，处理持久化存储，并与Prometheus进行同步。Manager 协调这些组件，确保规则和阈值的创建、更新和删除流程的正确性。
Prometheus同步机制: 引入了内存版的 ExporterSync 实现，能够即时更新告警规则的阈值和观察窗口，无需重启Prometheus或重新加载规则文件，从而实现动态配置。
文档更新: 更新了API文档和数据库设计文档，以反映新的告警等级定义和规则与阈值管理系统的详细设计，并新增了 ruleset 模块的详细README文档，涵盖了目标、组件、流程、并发和测试等方面。

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

niupilot

Code Review: Rule and Threshold Management Implementation

Found several issues requiring attention for production readiness:

Critical Issues

• Missing error handling for JSON marshaling - Could lead to SQL injection or runtime panics
• Race condition in rule deletion - String prefix matching is unsafe with similar rule names
• No transaction atomicity - Current WithTx implementation provides no guarantees
• Potential memory leaks - Ignored JSON unmarshaling errors could cause issues

Architecture Notes

• In-memory exporter design may not scale across replicas
• Consider implementing proper transaction support for production use
• Add validation for rule expressions and operators

internal/alerting/service/ruleset/store_pg.go

gemini-code-assist

Code Review

本次 PR 引入了告警规则和阈值管理的核心功能，包括新的数据表结构、核心服务接口和初步实现。整体设计思路清晰，将规则模板与元数据（阈值）分离，并通过标签实现灵活的配置，这是一个很好的实践。代码结构合理，并附有详细的说明文档，值得称赞。

我在代码和文档中发现了一些可以改进的地方，主要集中在错误处理、接口设计和实现细节的健壮性上。例如，多处 json 操作忽略了错误返回，store_pg.go 中对 PostgreSQL interval 类型的处理比较脆弱，以及变更日志的幂等性 ID 生成方式可能无法满足设计初衷。此外，文档与实现之间存在一些不一致之处。

具体的建议请见下面的评论。

internal/alerting/service/ruleset/store_pg.go

docs/alerting/database-design.md

internal/alerting/service/ruleset/README.md

docs/alerting/database-design.md

internal/alerting/service/ruleset/types.go

internal/alerting/service/ruleset/manager.go

- Add heal_actions table and related DAO/Service layers - Implement observation window mechanism using Redis - Add P0/P1/P2 alert processing logic with fault healing - Update remediation consumer with healing and observation flows - Add comprehensive tests for new functionality - Update documentation with new healing process

…elop

…into develop" This reverts commit bc51939, reversing changes made to 7ae82d1.

feat(Metrics&Alert): 新增规则与阈值管理的表结构，核心接口，编排流程，与初步实现

af45248

acd19ml requested review from yuzhai, LiusCraft and Erickw87 September 19, 2025 14:10

acd19ml self-assigned this Sep 19, 2025

acd19ml added documentation Improvements or additions to documentation model/Metrics&Alert labels Sep 19, 2025

niupilot bot reviewed Sep 19, 2025

View reviewed changes

internal/alerting/service/ruleset/store_pg.go Outdated Show resolved Hide resolved

internal/alerting/service/ruleset/store_pg.go Outdated Show resolved Hide resolved

gemini-code-assist bot reviewed Sep 19, 2025

View reviewed changes

Erickw87 reviewed Sep 20, 2025

View reviewed changes

internal/alerting/service/ruleset/manager.go Outdated Show resolved Hide resolved

Erickw87 reviewed Sep 20, 2025

View reviewed changes

internal/alerting/service/ruleset/manager.go Outdated Show resolved Hide resolved

Erickw87 reviewed Sep 20, 2025

View reviewed changes

internal/alerting/service/ruleset/manager.go Outdated Show resolved Hide resolved

Erickw87 reviewed Sep 20, 2025

View reviewed changes

internal/alerting/service/ruleset/manager.go Outdated Show resolved Hide resolved

acd19ml added 4 commits September 21, 2025 01:30

优化promethues与数据库操作顺序，改进错误处理，改进存储类型转换

7ae82d1

Merge branch 'feature/alert-healing-with-observation-window' into dev…

bc51939

…elop

Revert "Merge branch 'feature/alert-healing-with-observation-window' …

b40dbd3

…into develop" This reverts commit bc51939, reversing changes made to 7ae82d1.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(Metrics&Alert): 新增规则与阈值管理的表结构，核心接口，编排流程，与初步实现 #87

feat(Metrics&Alert): 新增规则与阈值管理的表结构，核心接口，编排流程，与初步实现 #87

Uh oh!

acd19ml commented Sep 19, 2025

Uh oh!

niupilot bot commented Sep 19, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Sep 19, 2025

Uh oh!

niupilot bot left a comment

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

feat(Metrics&Alert): 新增规则与阈值管理的表结构，核心接口，编排流程，与初步实现 #87

Are you sure you want to change the base?

feat(Metrics&Alert): 新增规则与阈值管理的表结构，核心接口，编排流程，与初步实现 #87

Uh oh!

Conversation

acd19ml commented Sep 19, 2025

变更背景和解决方案

文档更新

Uh oh!

niupilot bot commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete

Summary

Identified Issues

High Confidence Issues (with suggested fixes):

Lower Confidence Issues (discussion points):

Positive Aspects

Uh oh!

gemini-code-assist bot commented Sep 19, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

niupilot bot left a comment

Choose a reason for hiding this comment

Code Review: Rule and Threshold Management Implementation

Critical Issues

Architecture Notes

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

niupilot bot commented Sep 19, 2025 •

edited

Loading