DBAIOps

This is the code repository of DBAIOps, a novel database operation and maintenance (O&M) system that integrates expert experience to achieve practical anomaly detection, diagnosis, and recovery. Specifically, it conducts database O&M based on the following techniques:

(1) O&M Maintenance Graph: a graph-based experience model to represent various possible diagnosis paths within O&M experience;
(2) Correlation-Aware Anomaly Model: capture implicit correlations across metrics and real-world anomalies and initialize knowledge graph exploration during online diagnosis;
(3) O&M Path Exploration: a two-stage graph evolution mechanism that adaptively explores possible diagnosis paths for the input anomaly;
(4) Reasoning LLM Based Diagnosis: prompt reasoning LLM to reason over these diagnosis paths and generate diagnosis reports with specific recovery solutions.

Evaluation over four mainstream database systems demonstrates that DBAIOps outperforms state-of-the-art baselines, 47.85% higher in diagnosis accuracy and report interpretability. The structure is as follows:

DBAIOps Platform Deployment Guide
- 1. System Preparation
- 2. Single-Node Installation
Case Study of Oracle Diagnosis Report

Note that the full content about the case study of the diagnosis reports can be found at the Appedinx_Diagnosis_Report.

DBAIOps Platform Deployment Guide

├── bin                         # DBAIOps installation and deployment scripts
├── webserver                   # Frontend webserver configuration
├── fstaskpkg                   # Task platform configurations
├── knowl                       # Knowledge graph base scripts
├── colscript                   # Python3 collection scripts (health & cib metrics)
├── pgconf                      # PostgreSQL initialization config for DSmart metadata
└── phantomjsconf               # Frontend webserver JS plugin configurations

1. System Preparation

1.1 Hardware Requirements

Minimum Configuration (for <20 database instances)

Purpose	Hardware Specs	Supported OS	IP Address	Qty	Notes
Database, Task Platform, Web Server	16C/32GB/600GB	RedHat 7.2+ (YUM configured) CentOS 7.2+ (YUM configured) SUSE 12 SP4 (zypper configured) Kylin V10 SP1/SP2/SP3	1	1	No restrictions

Recommended Configuration (for >60 database instances)

Purpose	Hardware Specs	Supported OS	IP Address	Qty	Notes
Database Server	16C/64GB/1TB	RedHat 7.2+ CentOS 7.2+ SUSE 12 SP4	3	1	Physical server required
Web Server & Task Platform	16C/64GB/300GB	RedHat 7.2+ CentOS 7.2+ SUSE 12 SP4	3	3	No restrictions

Notes:

DBAIOps database server can be installed in any directory (only disk space required);
Distributed installation requires root SSH passwordless login between DBAIOps servers during setup (can be disabled post-installation). Single-node installation doesn't require this.

1.2 Firewall Port Configuration

Between Ops machine and DBAIOps servers:

Required ports: 22, 18081, 18090, 28080

Between DBAIOps servers and monitored databases:

Oracle: 11521, 1521
MySQL: 3306
PostgreSQL: 5432
DM8: 5236

1.3 Upload Installation Media

Upload installation package to DBAIOps server

2. Single-Node Installation

2.1 Configure YUM

Configure YUM repo:

mount /dev/sr0 /mnt
vi /etc/yum.repos.d/local.repo

Install dependencies:

cd /usr/software/bin
./DBAIOps-system-package.sh -install

2.2 Installation

Step 1: Execute Installation

cd /usr/software/bin
./DBAIOps.sh -install

Step 2: Select Deployment Type

Select deployment type [Default is single-node]:

Single-node deployment
Distributed deployment (requires manual role.cfg configuration)

Step 3: OS Selection

Choose OS type [Select according to your OS]:

RedHat
CentOS

Step 4: Installation Progress

Installation typically takes 1-1.5 hours depending on system performance
RPM packages will be automatically installed

Step 5: Post-Installation

After successful installation, restart DBAIOps services:

/usr/software/bin/DBAIOps.sh -restart

2.3 Service Management Commands

Command	Description
`./DBAIOps.sh -start`	Start all DBAIOps services
`./DBAIOps.sh -stop`	Stop all DBAIOps services
`./DBAIOps.sh -status`	Check service status
`./DBAIOps.sh -restart`	Restart all services

2.4 Accessing DBAIOps Platform

Web Interface:

http://<SERVER_IP>:18081/DBAIOps

Default Credentials:

Username: admin
Password: admin@123

Security Notice:

Change the default password immediately after first login;
Recommended to use a strong password following your organization's security policy;
For production environments, consider implementing additional security measures (IP restrictions, 2FA, etc.).

Case Study of Oracle Diagnosis Report

Anomaly: LOG SYNCHRONIZATION DELAY, which detects slowdowns caused by log-writing operations. When a transaction commits or rolls back, forcing a session to wait for the log writer to flush redo logs, the system can degrade significantly if I/O capacity is insufficient.

Abnormal Metrics List:

Metric ID	Metric Name	Min	Max	Avg
2184301	log file sync average wait time	1.08 ms	15.2 ms	6.0 ms
2184305	log file parallel write average wait time	0.78 ms	7.09 ms	3.0 ms
000014	Mem unavailable	72.82	73.07	73.0

Root Cause:

(1) Insufficient I/O Performance of REDO Log Storage:

Evidence Chains:
- Metric 2184301 (log file sync): max = 15.2 ms, avg = 6.0 ms (abnormal)
- Metric 2184305 (log file parallel write): max = 7.09 ms, avg = 3.0 ms (abnormal)
Anomaly Analysis:
- According to the formula: log_file_sync / log_file_parallel_write = 6.0 / 3.0 = 2
- Ratio in range (1.5, 2] ⇒ Storage I/O performance is a primary factor

(2) Intermittent I/O Pressure Spikes During Log Writing:

Evidence Chains:
- At 06:00, log file sync spiked to 15.2 ms (vs. hourly avg 4.54 ms)
- log file parallel write peaked at 7.09 ms (high for mechanical disks)
- OS I/O latency = 0.47 ms (normal)
Anomaly Analysis:
- Storage likely under transient load

💬 DBA Feedback

✅ The predefined hypothesis of the experiment is that the bottleneck lies in I/O storage.
✅ Both root causes align with this hypothesis (first evaluation criterion satisfied)

✅ Evidence chains are logically consistent with Oracle/OS theory (second criterion)

✅ All data comes from actual experiments with no hallucinations (third criterion)

🎯 Final Accuracy Assessment: 100%

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
bin		bin
colscript		colscript
fstaskpkg		fstaskpkg
knowl		knowl
pgconf		pgconf
phantomjsconf		phantomjsconf
Appendix_Diagnosis_Report.pdf		Appendix_Diagnosis_Report.pdf
DBAIOps.cfg		DBAIOps.cfg
DBAIOps.cfg.init		DBAIOps.cfg.init
DBAIOps_2018.sql		DBAIOps_2018.sql
HEval_criteria.md		HEval_criteria.md
LLM_output_similarity.md		LLM_output_similarity.md
README.md		README.md
fonts.dir		fonts.dir
fonts.scale		fonts.scale
fstask_tables_postgres.sql		fstask_tables_postgres.sql
role.cfg		role.cfg
update_knowl.py		update_knowl.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DBAIOps

DBAIOps Platform Deployment Guide

1. System Preparation

1.1 Hardware Requirements

Minimum Configuration (for <20 database instances)

Recommended Configuration (for >60 database instances)

1.2 Firewall Port Configuration

1.3 Upload Installation Media

2. Single-Node Installation

2.1 Configure YUM

2.2 Installation

Step 1: Execute Installation

Step 2: Select Deployment Type

Step 3: OS Selection

Step 4: Installation Progress

Step 5: Post-Installation

2.3 Service Management Commands

2.4 Accessing DBAIOps Platform

Case Study of Oracle Diagnosis Report

💬 DBA Feedback

About

Uh oh!

Releases

Packages

Languages

weAIDB/DBAIOps

Folders and files

Latest commit

History

Repository files navigation

DBAIOps

DBAIOps Platform Deployment Guide

1. System Preparation

1.1 Hardware Requirements

Minimum Configuration (for <20 database instances)

Recommended Configuration (for >60 database instances)

1.2 Firewall Port Configuration

1.3 Upload Installation Media

2. Single-Node Installation

2.1 Configure YUM

2.2 Installation

Step 1: Execute Installation

Step 2: Select Deployment Type

Step 3: OS Selection

Step 4: Installation Progress

Step 5: Post-Installation

2.3 Service Management Commands

2.4 Accessing DBAIOps Platform

Case Study of Oracle Diagnosis Report

💬 DBA Feedback

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages