124 changes: 124 additions & 0 deletions dingo/model/prompt/prompt_text_quality_kaoti.py
@@ -0,0 +1,124 @@
from dingo.model.model import Model
from dingo.model.prompt.base import BasePrompt


@Model.prompt_register("TEXT_QUALITY_KAOTI", [])
class PromptTextQualityV3Kaoti(BasePrompt):
    content = """
# Role
You are an expert in language models and data quality assessment.

# Background
The dataset is compiled from diverse sources, including social media platforms, news outlets, academic journals, and online forums. Some datasets contain image links, which may appear in the question stem or answer. If an image link is present, it is always considered valid, correct, and reasonable.

# Goals
Your primary task is to detect formulas, tables, and other content in the text. The text consists of five parts:
1. **Question type information string**: `q_type`
2. **Question information string**: `q_main`
3. **Options information string**: `options`
4. **Answers information string**: `std_ans`
5. **Answer explanations string**: `ans_detail`

**Note**:
- If the question type is a multiple-choice question (including single-choice, multiple-choice, and true/false questions), the `options` field must contain content and cannot be left blank.
- For non-multiple-choice question types, the `options` field is allowed to be empty.
- If the text meets any of the following negative descriptions, it will be judged as low-quality data.

# Criteria
## 1. Completeness
### 1.1 Error_Formula
Determine whether the formulas in the text can be correctly rendered by Markdown and adhere to the rendering style of MathJax or HTML, while maintaining consistency with the question and answers. Formula errors include, but are not limited to:
- LaTeX syntax errors
- Missing formula markers (`$`)
- Mathematical symbol errors
- Missing or excessive backslashes (`\`)
- Incorrect formula answers

### 1.2 Error_Table
Check whether the table in the text is correct. Table errors include, but are not limited to:
- Inconsistent formatting within the table
- Unreasonable typesetting
- LaTeX or Markdown syntax errors
- Mathematical symbol errors
- Missing or excessive vertical bar symbols (`|`)
- Chaotic row and column structure
- Incorrect table content

## 2. Effectiveness
### 2.1 Error_Split_Paragraph
Identify and mark any parts in the text that may affect coherence and readability due to unreasonable line breaks (`\n`). Key considerations:
- **Sentence integrity**: Check if sentences are unnecessarily broken into multiple lines. If a sentence should logically be a single unit but is broken by a line break (`\n`), pay attention to the lack of punctuation before and after the `\n` symbol, which is usually unreasonable.
- **Examples of incorrect usage**:
  - "综上所述,我们可以确定选项\nB\"城乡社区治理\"最符合题目的要求"
  - "所以,\n答案是C"
  - "5.**开源工具\n**:包括各种开源的大数据工具,如Hadoop、Spark、Kafka等。"
  - "其他选项\nA、C、D都与集成学习的基本原理不符。"
  - "以上推理过程是根据试题集\n《22-23年理论》中的内容得出的。"
  - "但对20世纪\n70年代以后的浮动汇率制时期的验证却显示出对购买力平价理论不利的结果。"
  - "-C选项\n(一个U盘):U盘是存储信息的物理媒介,"

**Note**: Since the data text is a test question, the `q_main` field is allowed to contain normal sentences separated by empty brackets `()` or underscores `__`. Pay special attention to unreasonable segmentation caused by the `\n` character.

### 2.2 Error_Ans_Format
Ensure the quality of the answer analysis (`ans_detail`) by checking whether it is detailed, accurate, and in the expected format. Guidelines:
1. **Sensitive information**: Check if the analysis contains information about the source of the exam questions, the year, or other information that should not be disclosed. If present, mark it as low-quality.
2. **Conciseness**: Assess the level of detail in the analysis. If the analysis is too concise and lacks sufficient explanation, mark it as low-quality.

### 2.3 Error_List_Number
Analyze the content in the `q_main` and `ans_detail` fields. If a list number appears, determine whether the numbers or letters are in the correct order. If the numbers are discontinuous, missing, or in the wrong format, indicate the specific location and provide modification suggestions.

**Note**: You do not need to check the content itself, only the correctness of the numbers or letters.

### 2.4 Error_Content_Position
Check the following fields for positional disorder (`q_type`, `q_main`, `options`, `std_ans`, `ans_detail`):
1. **Question type (`q_type`)**: Ensure it only describes the question type (e.g., "multiple choice", "fill in the blank") and does not include the question stem, options, answers, or answer analysis.
2. **Question stem (`q_main`)**: Ensure it only contains the main content of the question and does not include options, answers, or answer analysis.
3. **Options (`options`)**: Ensure it only contains the content of the question options (e.g., "A. Option one", "B. Option two") and does not include the question stem, answers, or answer analysis.
4. **Standard answer (`std_ans`)**: Ensure it only contains the identifier of the correct answer (e.g., "A", "B") and does not include the question stem, options, or answer analysis.

**Rules for judgment**:
1. If the `q_main` field contains text in the format of options (e.g., "A. Option one"), it is considered mixed with options.
2. If the `options` field contains the question stem or answer content, it is considered mixed with the question stem or answer.
3. If the `std_ans` field is empty or contains question stem content, it is considered mixed with the question stem.

### 2.5 Error_Options_Format_Content
Ensure the format and content of the `options` field are correct. Guidelines:
**Option format check**:
1. Mark options with redundant serial numbers as format errors.
2. Ensure there are no duplicate options.
3. Check for extra option punctuation (e.g., incorrect: "A. .张三"; correct: "B. 李四").

**Option content check**:
1. Ensure each option is independent and not combined with other options.
2. Mark options with incomplete or similar content as incorrectly formatted.

## 3. Similarity
### 3.1 Error_Duplicate_Content
Identify consecutive repeated text or multiple occurrences of characters in the text.


# Workflow
1. **Evaluate the text**: Carefully read and understand the provided text. Assess its quality based on the negative criteria.
2. **Assign a type**:
   - If the text does not violate any negative criteria, the type must be `Good`.
   - If the text violates any negative criteria, the type must be one of: `Completeness`, `Effectiveness`, or `Similarity`.
3. **Assign a name**:
   - If the type is `Good`, the name must be `None`.
   - If the type is `Completeness`, the name must be one of: `Error_Formula` or `Error_Table`.
   - If the type is `Effectiveness`, the name must be one of: `Error_Split_Paragraph`, `Error_Ans_Format`, `Error_List_Number`, `Error_Content_Position`, or `Error_Options_Format_Content`.
   - If the type is `Similarity`, the name must be `Error_Duplicate_Content`.
4. **Assign a score**:
   - If the type is `Good`, the score is `1`.
   - If the type is not `Good`, the score is `0`.
5. **Provide a reason**: Clearly explain the evaluation result.
6. **Return the results**: Output the results in JSON format:
```json
{"score": 0/1, "type": "", "name": "", "reason": ""}
```


# Warning
Only output JSON format data, without any extraneous content.

# Input content
(Text to be evaluated goes here)
"""
78 changes: 78 additions & 0 deletions docs/eval/prompt/kaoti_data_evaluated_by_prompt.md
@@ -0,0 +1,78 @@
# Dataset Kaoti

## Dataset Introduction
This dataset is used to evaluate the accuracy of the built-in kaoti (exam question) prompt in dingo, so the test set is built from exam-question data.

| Field Name  | Description                                                    |
|-------------|----------------------------------------------------------------|
| id          | Record ID; it has no special meaning and can be changed freely |
| grade_class | Classification of students by academic grade level             |
| major       | Main area of knowledge and skills                              |
| content     | Data to be tested                                              |
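
As an illustration of the table above, a single JSONL record might be built as in the sketch below. The field names come from the table; every value is invented, and the way the five question parts are packed into `content` is an assumption, not something the dataset description specifies. `to_jsonl_line` is a hypothetical helper.

```python
import json

# Hypothetical record: field names follow the table above, all values are invented.
record = {
    "id": "0001",
    "grade_class": "senior high school",
    "major": "mathematics",
    # Assumption: the five question parts are serialized into one string.
    "content": (
        "q_type: single choice\n"
        "q_main: 1 + 1 = ()\n"
        "options: A. 1 B. 2\n"
        "std_ans: B\n"
        "ans_detail: Adding 1 and 1 gives 2, so option B is correct."
    ),
}


def to_jsonl_line(rec: dict) -> str:
    """Serialize one record as a single JSONL line, keeping non-ASCII text readable."""
    return json.dumps(rec, ensure_ascii=False)


line = to_jsonl_line(record)
print(line)
```

Each record occupies exactly one line, because `json.dumps` escapes the embedded line breaks.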



### Dataset Composition
| Type | Count |
|---------------------------------------------------------------------------------------|-------|
| Positive Examples | 100 |
| Negative Examples: <br/>1. ineffectiveness<br/>2. dissimilarity<br/>3. incompleteness | 100 |


## Prompt Introduction
The built-in **PromptTextQualityV3Kaoti** is used as the prompt for this test.<br>
For the full prompt text, see: [Introduction to PromptTextQualityV3Kaoti](../../../dingo/model/prompt/prompt_text_quality_kaoti.py)<br>
For the full set of built-in prompts, see: [Prompt Collection](../../../dingo/model/prompt)

## Evaluation Results
### Concept Introduction
Evaluating the positive examples and the negative examples each produces a summary file. The terms used to report those results are defined below.

| Name      | Description                                                                     |
|-----------|---------------------------------------------------------------------------------|
| TP        | True Positive: number of positive examples evaluated as positive                |
| FP        | False Positive: number of negative examples evaluated as positive               |
| TN        | True Negative: number of negative examples evaluated as negative                |
| FN        | False Negative: number of positive examples evaluated as negative               |
| Precision | TP / (TP + FP): share of examples evaluated as positive that are truly positive |
| Recall    | TP / (TP + FN): share of positive examples that are evaluated as positive       |
| F1        | 2 * Precision * Recall / (Precision + Recall)                                   |
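
The definitions above can be checked with a few lines of Python. This is a minimal sketch, not part of dingo: `precision_recall_f1` is a hypothetical helper, and the counts passed to it are the TP, FP, and FN values reported in the Result Display table.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the Result Display table: TP=86, FP=15, FN=14.
precision, recall, f1 = precision_recall_f1(tp=86, fp=15, fn=14)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.851 recall=0.860 f1=0.856
```

TN does not enter any of the three formulas, which is why the helper does not take it.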

### Result Display
| Dataset Name | TP  | FP  | TN  | FN  | Precision (%) | Recall (%) | F1    |
|--------------|-----|-----|-----|-----|---------------|------------|-------|
| redpajama    | 86  | 15  | 85  | 14  | 85            | 86         | 0.856 |

## Evaluation Method

```python
from dingo.io import InputArgs
from dingo.exec import Executor

input_data = {
    "eval_group": "kaoti",
    "input_path": "/your/dataset/path",  # s3 path: qa-huawei
    "save_data": True,
    "save_correct": True,
    "save_raw": True,
    "max_workers": 10,
    "batch_size": 10,
    "data_format": "jsonl",
    "column_content": "content",  # dataset field that holds the data to be tested
    "custom_config": {
        "prompt_list": ["PromptTextQualityV3Kaoti"],
        "llm_config": {
            "detect_text_quality_detail": {
                "key": "Your Key",  # placeholder: replace with your API key
                "api_url": "Your Url",  # placeholder: replace with your endpoint URL
            }
        },
    },
}

input_args = InputArgs(**input_data)
executor = Executor.exec_map["local"](input_args)
result = executor.execute()
print(result)
```
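
The prompt instructs the model to reply with a JSON object containing `score`, `type`, `name`, and `reason`, and its Workflow section fixes which combinations are allowed. The sketch below shows one way such a reply could be sanity-checked. It is independent of dingo: `ALLOWED_NAMES` and `check_response` are hypothetical names, and treating the name `None` as the string `"None"` is an assumption.

```python
import json

# Allowed type -> name pairs, copied from the Workflow section of the prompt.
# Assumption: the name for "Good" is the string "None", not JSON null.
ALLOWED_NAMES = {
    "Good": {"None"},
    "Completeness": {"Error_Formula", "Error_Table"},
    "Effectiveness": {
        "Error_Split_Paragraph",
        "Error_Ans_Format",
        "Error_List_Number",
        "Error_Content_Position",
        "Error_Options_Format_Content",
    },
    "Similarity": {"Error_Duplicate_Content"},
}


def check_response(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means the reply is consistent."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    result_type = data.get("type")
    name = data.get("name")
    if not isinstance(result_type, str) or result_type not in ALLOWED_NAMES:
        problems.append(f"unexpected type: {result_type!r}")
    elif not isinstance(name, str) or name not in ALLOWED_NAMES[result_type]:
        problems.append(f"name {name!r} is not allowed for type {result_type!r}")

    # The prompt assigns score 1 to "Good" and 0 to every other type.
    expected_score = 1 if result_type == "Good" else 0
    if data.get("score") != expected_score:
        problems.append(f"score should be {expected_score}, got {data.get('score')!r}")
    if not data.get("reason"):
        problems.append("reason is empty")
    return problems


reply = '{"score": 0, "type": "Completeness", "name": "Error_Table", "reason": "Missing vertical bar in row 2."}'
print(check_response(reply))
```

A check like this can catch replies where the model wraps the JSON in extra text or pairs a type with a name from another category.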