Commit de32f5c (1 parent: 5e34795)
Showing 1 changed file with 65 additions and 1 deletion.
@@ -1,3 +1,67 @@
# Creating new RCA datasets and developing new RCA methods

## Creating new RCA datasets
To create a new RCA dataset, follow these steps:

1. **System Setup**: Deploy the target microservice system in a controlled environment, such as a Kubernetes cluster, and configure it to emit telemetry data (metrics, logs, and traces).

2. **Fault Injection**: Decide which fault types to include (e.g., resource, network, and code-level faults). Use tools such as `stress-ng` for resource faults and `tc` for network faults, and modify the service code manually for code-level faults.

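
The fault-injection step above can be sketched as a small command builder. This is a minimal sketch, not part of RCAEval: the helper names, the interface name `eth0`, and the durations are illustrative assumptions. It only builds the `stress-ng` and `tc` (netem) command lines and prints them as a dry run; actually running them requires the tools to be installed in the target container and, for `tc`, sufficient privileges.

```python
import shlex


def cpu_stress_cmd(workers: int, seconds: int) -> list[str]:
    """Build a stress-ng command that runs `workers` CPU stressors for `seconds`."""
    return ["stress-ng", "--cpu", str(workers), "--timeout", f"{seconds}s"]


def net_delay_cmds(interface: str, delay_ms: int) -> tuple[list[str], list[str]]:
    """Build the tc/netem commands that add and later remove a fixed delay."""
    add = ["tc", "qdisc", "add", "dev", interface, "root", "netem", "delay", f"{delay_ms}ms"]
    remove = ["tc", "qdisc", "del", "dev", interface, "root", "netem"]
    return add, remove


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(shlex.join(cpu_stress_cmd(workers=2, seconds=60)))
    add, remove = net_delay_cmds("eth0", delay_ms=100)
    print(shlex.join(add))
    print(shlex.join(remove))
```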
3. **Data Collection**:
   - **Metrics**: Use tools such as Prometheus and cAdvisor to gather system metrics.
   - **Logs**: Use log collectors such as Fluent Bit or Loki to collect and structure logs.
   - **Traces**: Use tracing tools such as Jaeger to capture distributed traces.

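
For the metrics part of the collection step, one option is to export time series through the Prometheus HTTP API (`/api/v1/query_range`). The sketch below is an illustration only: the function names are hypothetical, and the base URL and PromQL query you pass in depend on your deployment. `parse_matrix` flattens the standard `matrix` response into rows and can be tried on a saved response without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def range_query_url(base_url: str, query: str, start: float, end: float, step: str) -> str:
    """Build a query_range URL for the given PromQL query and time window."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def parse_matrix(payload: dict) -> list[tuple[float, dict, float]]:
    """Flatten a query_range 'matrix' response into (timestamp, labels, value) rows."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = []
    for series in payload["data"]["result"]:
        for ts, value in series["values"]:
            # Prometheus returns sample values as strings.
            rows.append((float(ts), series["metric"], float(value)))
    return rows


def fetch_range(base_url: str, query: str, start: float, end: float, step: str = "15s"):
    """Run the range query against a live Prometheus server and parse the result."""
    with urlopen(range_query_url(base_url, query, start, end, step), timeout=30) as resp:
        return parse_matrix(json.load(resp))
```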
4. **Fault Annotation**: Label the collected data with the ground truth for each injected fault, including:
   - The time of fault injection.
   - The root cause service.
   - The specific root cause indicator (e.g., a metric, log entry, or trace span).

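
The three annotation fields listed above can be stored next to each failure case as a small JSON record. This is a minimal sketch under assumptions: the `FaultLabel` class, its field names, and the extra `fault_type` field are hypothetical and do not describe a format that RCAEval prescribes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FaultLabel:
    inject_time: int                            # Unix timestamp (seconds) of the injection
    root_cause_service: str                     # service where the fault was injected
    fault_type: str                             # e.g. "cpu" or "delay" (hypothetical field)
    root_cause_indicator: Optional[str] = None  # metric, log entry, or span, if known


def dump_label(label: FaultLabel) -> str:
    """Serialize one label as stable, human-readable JSON."""
    return json.dumps(asdict(label), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(dump_label(FaultLabel(1700000000, "cartservice", "cpu", "cartservice_cpu")))
```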
5. **Data Processing**: Convert the telemetry data into a structured format such as CSV or JSON. For consistency, include columns for timestamps, service names, and telemetry values.

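
As an illustration of the processing step, the sketch below pivots long-form samples into one CSV row per timestamp. The `time` column and the `<service>_<metric>` column naming are assumptions made for this example; check the layout of an existing dataset before settling on a schema.

```python
import csv
import io
from collections import defaultdict


def to_wide_csv(samples) -> str:
    """Pivot (timestamp, service, metric, value) samples into a wide CSV string."""
    table = defaultdict(dict)
    columns = set()
    for ts, service, metric, value in samples:
        col = f"{service}_{metric}"
        table[ts][col] = value
        columns.add(col)

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["time"] + sorted(columns),
        restval="",            # leave missing samples empty
        lineterminator="\n",
    )
    writer.writeheader()
    for ts in sorted(table):   # rows in chronological order
        writer.writerow({"time": ts, **table[ts]})
    return out.getvalue()
```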
6. **Validation**: Have domain experts review the dataset for accuracy and completeness.

7. **Documentation**: Provide a README or similar file that describes the dataset, including:
   - The systems used.
   - The fault types included.
   - Instructions for downloading and using the dataset.

## Developing new RCA methods
To develop a new RCA method and integrate it into RCAEval, follow these steps:

1. **Define the Approach**:
   - Decide on the type of RCA method (metric-based, trace-based, or multi-source).
   - Choose the algorithm or technique to use (e.g., statistical analysis, causal inference, or machine learning).

2. **Implement the Method**:
   - Create a new Python file in the `RCAEval/e2e/` directory and name it after the method (e.g., `new_method.py`).
   - Implement the method as a Python function with the following signature:
     ```python
     def new_method(data, inject_time=None, dataset=None, sli=None, anomalies=None, **kwargs):
         # Method logic here: analyze `data` and order the candidates,
         # most likely root cause first.
         ranked_root_causes = []
         return {
             "ranks": ranked_root_causes,
         }
     ```

3. **Preprocess the Data**:
   - Use the existing utilities in `RCAEval.io.time_series`, such as `preprocess`, `drop_constant`, or `select_useful_cols`, to preprocess the input telemetry data.

4. **Analyze the Data**:
   - Implement the core logic for root cause analysis.
   - Rank the root cause candidates by their likelihood of having caused the failure.

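
To make the analysis step concrete, here is a deliberately simple baseline that follows the signature from step 2. It is a sketch under assumptions, not an RCAEval method: it assumes `data` maps column names to equally long sequences and contains a `time` column, and it ranks each remaining column by how far its mean after `inject_time` moves from its mean before, scaled by the pre-injection spread. The name `baseline_shift` is hypothetical.

```python
from statistics import mean, pstdev


def baseline_shift(data, inject_time=None, dataset=None, sli=None, anomalies=None, **kwargs):
    """Rank columns by the shift of their mean after inject_time."""
    if inject_time is None:
        raise ValueError("inject_time is required for this baseline")

    times = list(data["time"])
    scores = {}
    for col in data:
        if col == "time":
            continue
        values = list(data[col])
        before = [v for t, v in zip(times, values) if t < inject_time]
        after = [v for t, v in zip(times, values) if t >= inject_time]
        if len(before) < 2 or not after:
            continue  # not enough samples on one side of the injection
        spread = pstdev(before) or 1e-9  # avoid division by zero
        scores[col] = abs(mean(after) - mean(before)) / spread

    # Highest score first: the most likely root cause leads the list.
    return {"ranks": sorted(scores, key=scores.get, reverse=True)}
```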
5. **Test the Method**:
   - Write unit tests in `tests/test_new_method.py` to check correctness and reproducibility.
   - Validate the method on the sample datasets available in RCAEval.

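
A test file for the step above might look like the following sketch. To stay self-contained, it defines a toy stand-in for `new_method` inline; in a real test you would import your method from `RCAEval.e2e` instead. The synthetic data and the test names are illustrative assumptions. Run it with `python -m unittest`.

```python
import unittest


def new_method(data, inject_time=None, **kwargs):
    """Toy stand-in: rank columns by their largest value after inject_time."""
    times = data["time"]
    peak = {
        col: max(v for t, v in zip(times, values) if t >= inject_time)
        for col, values in data.items()
        if col != "time"
    }
    return {"ranks": sorted(peak, key=peak.get, reverse=True)}


class TestNewMethod(unittest.TestCase):
    def setUp(self):
        # Synthetic case: a_cpu spikes after the injection at t=3.
        self.data = {"time": [1, 2, 3, 4], "a_cpu": [1, 1, 9, 9], "b_cpu": [1, 1, 1, 2]}

    def test_returns_ranks_key(self):
        self.assertIn("ranks", new_method(self.data, inject_time=3))

    def test_faulty_column_ranked_first(self):
        self.assertEqual(new_method(self.data, inject_time=3)["ranks"][0], "a_cpu")

    def test_is_reproducible(self):
        first = new_method(self.data, inject_time=3)
        second = new_method(self.data, inject_time=3)
        self.assertEqual(first, second)
```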
6. **Integrate with RCAEval**:
   - Export the method from `RCAEval/e2e/__init__.py` so that it can be imported from the package.
   - Register the method in the `main.py` evaluation script by adding it to the `--method` options.

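
One common way to expose a method through a `--method` option is a name-to-function registry combined with `argparse` choices. The sketch below illustrates that pattern only; the `METHODS` dictionary and the helper names are hypothetical, and the actual structure of RCAEval's `main.py` may differ.

```python
import argparse


def new_method(data, inject_time=None, **kwargs):
    """Placeholder for the method implemented in RCAEval/e2e/new_method.py."""
    return {"ranks": []}


# Hypothetical registry: maps each --method value to its implementation.
METHODS = {"new_method": new_method}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RCA evaluation (sketch)")
    parser.add_argument("--method", required=True, choices=sorted(METHODS))
    return parser


def resolve_method(argv):
    """Parse argv and return the function selected by --method."""
    args = build_parser().parse_args(argv)
    return METHODS[args.method]
```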
7. **Document the Method**:
   - Provide usage examples in the README or in a dedicated tutorial notebook in the `docs/` folder.
   - Describe the method, its assumptions, and its limitations.

8. **Contribute Back**:
   - Submit a pull request to the RCAEval repository with the new method and its documentation.