Commit cbc323e

v0.1.5
1. feat: support crawling shadow root elements
2. feat: update config example and README
3. feat: support a smarter way to detect critical errors
2 parents 890dff1 + 8a06188 commit cbc323e

6 files changed

+193
-60
lines changed

README.md

Lines changed: 10 additions & 5 deletions
@@ -135,6 +135,10 @@ test_config: # Test configuration
     enabled: True
     type: ai # default or ai
     business_objectives: example business objectives # Recommended to include test scope, e.g., test search functionality
+    dynamic_step_generation: # Optional, configuration for dynamic steps generation
+      enabled: True # Optional, default False, recommended to set True to enable dynamic step generation
+      max_dynamic_steps: 5 # Optional, default 5 test steps generated per trigger
+      min_elements_threshold: 2 # Optional, default trigger threshold is 2 DOM element differences
   ux_test: # User experience testing
     enabled: True
   performance_test: # Performance analysis
@@ -170,11 +174,12 @@ UX (User Experience) testing focuses on usability, and user-friendliness. The mo
 
 Based on our testing, these models work well with WebQA Agent:
 
-| Model | Key Strengths | Notes |
-|-------|---------------|-------|
-| **gpt-4.1-2025-04-14** ⭐ | High accuracy & reliability | **Best choice** |
-| **gpt-4.1-mini-2025-04-14** | Cost-effective | **Economical and practical**|
-| **doubao-seed-1-6-vision-250815** | Vision capabilities | **Excellent web understanding** |
+| Model | Key Strengths | Notes |
+|-----------------------------------|-----------------------------|---------------------------------|
+| **gpt-4.1-2025-04-14** ⭐ | High accuracy & reliability | **Best choice** |
+| **gpt-4.1-mini-2025-04-14** | Cost-effective | **Economical and practical** |
+| **qwen3-vl-235b-a22b-instruct** | Open-source, GPT-4.1 level | **Best for on-premise** |
+| **doubao-seed-1-6-vision-250815** | Vision capabilities | **Excellent web understanding** |
 
 
 ### View Results

README_zh-CN.md

Lines changed: 10 additions & 5 deletions
@@ -138,6 +138,10 @@ test_config: # Test item configuration
     enabled: True
     type: ai # default or ai
     business_objectives: example business objectives # Recommended to include the test scope, e.g., test search functionality
+    dynamic_step_generation: # Optional, dynamic step generation configuration
+      enabled: True # Optional, default False; recommended to set True to enable dynamic step generation
+      max_dynamic_steps: 5 # Optional, by default 5 test steps are generated per trigger
+      min_elements_threshold: 2 # Optional, default trigger threshold is a difference of 2 DOM elements
   ux_test: # User experience testing
     enabled: True
   performance_test: # Performance analysis
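@@ -173,11 +177,12 @@ UX (user experience) evaluation focuses on web page usability and friendliness. The models in the results
 
 Based on actual test results, the following models perform well and are recommended:
 
-| Model | Core Strengths | Recommendation |
-|------|----------|----------|
-| **gpt-4.1-2025-04-14** ⭐ | High accuracy and reliability | **Best choice** |
-| **gpt-4.1-mini-2025-04-14** | Cost-effective | **Economical and practical** |
-| **doubao-seed-1-6-vision-250815** | Supports visual recognition | **Excellent web page understanding** |
+| Model | Core Strengths | Recommendation |
+|-----------------------------------|--------------|------------|
+| **gpt-4.1-2025-04-14** ⭐ | High accuracy and reliability | **Best choice** |
+| **gpt-4.1-mini-2025-04-14** | Cost-effective | **Economical and practical** |
+| **qwen3-vl-235b-a22b-instruct** | Comparable to gpt-4.1, open-source | **First choice for private deployment** |
+| **doubao-seed-1-6-vision-250815** | Supports visual recognition | **Excellent web page understanding** |
 
 ### View Results
 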

config/config.yaml.example

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@ test_config: # Test configuration
     enabled: True
     type: ai # default or ai
     business_objectives: Test Baidu search functionality, generate 3 test cases
+    dynamic_step_generation:
+      enabled: True
+      max_dynamic_steps: 10
+      min_elements_threshold: 1
   ux_test:
     enabled: True
   performance_test:
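
The `dynamic_step_generation` block above could be consumed roughly as follows. This is a minimal sketch, not the project's loader: `DynamicStepConfig`, `load_dynamic_step_config`, and `should_generate_steps` are hypothetical names, the defaults are the ones stated in the README comments (False, 5, 2), and treating the threshold as "at least N changed elements" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicStepConfig:
    """Hypothetical container for the dynamic_step_generation settings."""
    enabled: bool = False            # README: default False
    max_dynamic_steps: int = 5       # README: default 5 steps per trigger
    min_elements_threshold: int = 2  # README: default 2 DOM element differences


def load_dynamic_step_config(function_test_cfg: dict) -> DynamicStepConfig:
    """Read the optional block from an already-parsed config mapping."""
    raw = function_test_cfg.get("dynamic_step_generation") or {}
    return DynamicStepConfig(
        enabled=bool(raw.get("enabled", False)),
        max_dynamic_steps=int(raw.get("max_dynamic_steps", 5)),
        min_elements_threshold=int(raw.get("min_elements_threshold", 2)),
    )


def should_generate_steps(cfg: DynamicStepConfig, changed_elements: int) -> bool:
    """Assumption: generation triggers when the DOM diff reaches the threshold."""
    return cfg.enabled and changed_elements >= cfg.min_elements_threshold


# Values taken from config/config.yaml.example in this commit.
example = load_dynamic_step_config(
    {"dynamic_step_generation": {"enabled": True, "max_dynamic_steps": 10, "min_elements_threshold": 1}}
)
```

With the example values, a single new DOM element is enough to trigger generation, while an absent block leaves the feature disabled.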

webqa_agent/crawler/js/element_detector.js

Lines changed: 56 additions & 13 deletions
@@ -745,6 +745,47 @@
         return '/' + parts.join('/');
     }
 
+    /**
+     * Retrieves all child container elements from the specified node
+     *
+     * This function is used to get child elements when traversing the DOM tree,
+     * supporting multiple container types:
+     * - Child elements of regular DOM elements
+     * - Child elements of Shadow DOM
+     * - Body element of iframe internal documents
+     *
+     * @param {Node|ShadowRoot} node - The node to get child containers from, can be a regular DOM node or Shadow Root
+     * @returns {Array<Element>} Returns an array containing all child container elements
+     */
+    function getChildContainers(node) {
+        const out = [];
+
+        // Handle Shadow Root nodes
+        if (node instanceof ShadowRoot) {
+            out.push(...Array.from(node.children));
+        }
+        // Handle regular DOM element nodes
+        else if (node && node.nodeType === Node.ELEMENT_NODE) {
+            // Add all direct child elements
+            out.push(...Array.from(node.children));
+
+            // If the element has a Shadow Root, add it to the container list
+            if (node.shadowRoot instanceof ShadowRoot) out.push(node.shadowRoot);
+
+            // Special handling for iframe elements, attempt to get the body of their internal document
+            if (node.tagName?.toLowerCase() === 'iframe') {
+                try {
+                    const doc = node.contentDocument;
+                    if (doc?.body) out.push(doc.body);
+                } catch (_) {
+                    /* Ignore errors when cross-origin iframe access is blocked */
+                }
+            }
+        }
+
+        return out;
+    }
+
     /**
      * Gathers comprehensive information about a DOM element.
      *
@@ -980,29 +1021,31 @@
      * @returns {object | null} A tree node object, or `null` if the element and its descendants are not relevant.
      */
     function buildTree(elemObj, wasParentHighlighted = false) {
-        // 1) get element info
-        const elemInfo = getElementInfo(elemObj, wasParentHighlighted);
+        // If it is a ShadowRoot, use host as element info; otherwise use the element itself
+        const infoTarget = (elemObj instanceof ShadowRoot) ? elemObj.host : elemObj;
+
+        // get element info
+        const elemInfo = getElementInfo(infoTarget, wasParentHighlighted);
 
-        // 2) check node satisfies highlight condition
-        const isCurNodeHighlighted = handleHighlighting(elemInfo, elemObj, wasParentHighlighted)
+        // Highlight check
+        const isCurNodeHighlighted = elemInfo ? handleHighlighting(elemInfo, infoTarget, wasParentHighlighted) : false;
         const isParentHighlighted = wasParentHighlighted || isCurNodeHighlighted;
 
-        // 3) recursively build structured dom tree, with 'isParentHighlighted' state
+        // Recursively process “container” child nodes: Element children, shadowRoot, and same-origin iframe
         const children = [];
-        Array.from(elemObj.children).forEach(child => {
+        for (const child of getChildContainers(elemObj)) {
             const subtree = buildTree(child, isParentHighlighted);
             if (subtree) children.push(subtree);
-        });
+        }
 
-        // 4) highlight filter
+        // highlight filter
         if (isCurNodeHighlighted) {
-            highlightIdMap[elemInfo.highlightIndex] = elemInfo; // map highlightIndex to element info
-            return {node: elemInfo, children}; // keep info if is highlightable
+            highlightIdMap[elemInfo.highlightIndex] = elemInfo; // map highlightIndex to element info
+            return {node: elemInfo, children};
         } else if (children.length > 0) {
-            return {node: null, children}; // child node is highlightable
-        } else {
-            return null; // skip
+            return {node: null, children}; // child node is highlightable
         }
+        return null;
     }
 
     // ============================= Main Function =============================
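
The traversal change in `buildTree` can be modelled outside the browser. The following is a simplified Python sketch of the same idea, not a port of the JavaScript: `Node`, `child_containers`, and `build_tree` are hypothetical names, `interactive` stands in for the highlight check, and a shadow root is modelled as a plain non-interactive node rather than being resolved to its host.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a DOM element (hypothetical, for illustration only)."""
    tag: str
    interactive: bool = False                 # stands in for "would be highlighted"
    children: list["Node"] = field(default_factory=list)
    shadow_root: Optional["Node"] = None      # models element.shadowRoot
    frame_body: Optional["Node"] = None       # models a same-origin iframe's document body


def child_containers(node: Node) -> list[Node]:
    """Collect regular children, then the shadow root, then an iframe body if reachable."""
    out = list(node.children)
    if node.shadow_root is not None:
        out.append(node.shadow_root)
    if node.tag == "iframe" and node.frame_body is not None:
        out.append(node.frame_body)
    return out


def build_tree(node: Node) -> Optional[dict]:
    """Keep a node if it is interactive or has an interactive descendant; prune otherwise."""
    subtrees = (build_tree(child) for child in child_containers(node))
    children = [subtree for subtree in subtrees if subtree is not None]
    if node.interactive:
        return {"node": node.tag, "children": children}
    if children:
        return {"node": None, "children": children}
    return None


# A button that lives only inside a shadow root is still reached.
button = Node("button", interactive=True)
host = Node("my-widget", shadow_root=Node("#shadow-root", children=[button]))
page = Node("body", children=[Node("div"), host])
tree = build_tree(page)
```

Iterating only over `children`, as the pre-change code did, would return `None` for this page because the button is not a light-DOM descendant of `my-widget`.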

webqa_agent/testers/case_gen/agents/execute_agent.py

Lines changed: 82 additions & 37 deletions
@@ -21,7 +21,51 @@
 from webqa_agent.testers.case_gen.utils.message_converter import convert_intermediate_steps_to_messages
 from webqa_agent.utils.log_icon import icon
 
-LONG_STEPS = 10
+LONG_STEPS = 30
+
+# ============================================================================
+# Critical Failure Detection Patterns
+# ============================================================================
+
+# Literal patterns for exact substring matching (backward compatible)
+CRITICAL_LITERAL_PATTERNS = [
+    "element not found",
+    "cannot find",
+    "page crashed",
+    "permission denied",
+    "access denied",
+    "network timeout",
+    "browser error",
+    "navigation failed",
+    "session expired",
+    "server error",
+    "connection timeout",
+    "unable to load",
+    "page not accessible",
+    "critical error",
+    "missing locator",
+    "not found in the buffer",
+    "could not be retrieved",
+    "failed due to a missing",
+    "dropdown options could not be retrieved",
+]
+
+# Regex patterns for flexible matching
+CRITICAL_REGEX_PATTERNS = [
+    r"not found in\s+.*buffer",
+    r"failed due to\s+.*missing",
+    r"locator.*not.*found",
+    r"element.*not.*available",
+    r"missing.*for.*action",
+    r"missing.*parameter",
+    r"element with id.*not found",
+]
+
+# Pre-compile regex for performance
+CRITICAL_REGEX = re.compile(
+    '|'.join(CRITICAL_REGEX_PATTERNS),
+    re.IGNORECASE
+)
 
 # ============================================================================
 # Dynamic Step Generation Helper Functions
@@ -106,10 +150,22 @@ def format_elements_for_llm(dom_diff: dict) -> list[dict]:
         # Add important attribute information
         important_attrs = {}
         if attributes:
-            # Extract important attributes
-            for key in ['class', 'id', 'role', 'type', 'placeholder', 'aria-label']:
-                if key in attributes:
-                    important_attrs[key] = attributes[key]
+            # Define comprehensive attribute whitelist
+            navigation_attrs = ['href', 'target', 'rel', 'download']
+            form_attrs = ['type', 'placeholder', 'value', 'name', 'required', 'disabled']
+            semantic_attrs = ['role', 'aria-label', 'aria-describedby', 'aria-expanded']
+
+            for key, value in attributes.items():
+                # Include whitelisted attributes
+                if key in ['class', 'id'] + navigation_attrs + form_attrs + semantic_attrs:
+                    important_attrs[key] = value
+                # Include data-* attributes (often contain behavior info)
+                elif key.startswith('data-'):
+                    # Limit length to prevent token explosion
+                    important_attrs[key] = value[:200] if isinstance(value, str) and len(value) > 200 else value
+                # Include style if it indicates visibility/interactivity
+                elif key == 'style' and isinstance(value, str) and ('display' in value or 'visibility' in value):
+                    important_attrs[key] = value[:200] + "..." if len(value) > 200 else value
 
         if important_attrs:
             formatted_elem["attributes"] = important_attrs
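
The attribute filtering added in this hunk can be read as a standalone function. The sketch below mirrors the three branches (whitelist, `data-*`, visibility-related `style`) under the assumption that attributes arrive as a flat dict; `filter_attributes`, `ATTRIBUTE_WHITELIST`, and `MAX_VALUE_LENGTH` are hypothetical names, and the whitelist is built once as a frozenset instead of concatenating lists on every iteration.

```python
# Hypothetical names; the attribute lists are copied from the diff above.
ATTRIBUTE_WHITELIST = frozenset(
    ["class", "id"]
    + ["href", "target", "rel", "download"]                              # navigation
    + ["type", "placeholder", "value", "name", "required", "disabled"]   # form
    + ["role", "aria-label", "aria-describedby", "aria-expanded"]        # semantic
)
MAX_VALUE_LENGTH = 200  # same cap as the diff, to limit prompt size


def filter_attributes(attributes: dict) -> dict:
    """Return only the attributes considered useful for describing an element."""
    kept = {}
    for key, value in attributes.items():
        if key in ATTRIBUTE_WHITELIST:
            kept[key] = value
        elif key.startswith("data-"):
            # Long data-* values are cut without a marker, as in the diff.
            if isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
                value = value[:MAX_VALUE_LENGTH]
            kept[key] = value
        elif key == "style" and isinstance(value, str) and (
            "display" in value or "visibility" in value
        ):
            # Long style values are cut and marked with an ellipsis, as in the diff.
            if len(value) > MAX_VALUE_LENGTH:
                value = value[:MAX_VALUE_LENGTH] + "..."
            kept[key] = value
    return kept


sample = filter_attributes(
    {"id": "go", "onclick": "x()", "data-payload": "a" * 300, "style": "color: red", "href": "/next"}
)
```

In the sample, `onclick` and the purely cosmetic `style` are dropped, and the oversized `data-payload` is cut to 200 characters.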
@@ -683,24 +739,17 @@ def extract_path(u):
         logging.debug(f"Step {i+1} tool output: {tool_output}")
         messages.append(AIMessage(content=tool_output))
 
-        # Check for failures in the tool output
-        if "[failure]" in result['intermediate_steps'][0][1].lower() or "failed" in tool_output.lower():
-            failed_steps.append(i + 1)
-            logging.warning(f"Step {i+1} detected as failed based on output")
-
         # Check for critical failures that should immediately stop execution
         if _is_critical_failure_step(tool_output, instruction_to_execute):
             failed_steps.append(i + 1)
-            final_summary = f"FINAL_SUMMARY: Critical failure at step {i+1}: '{instruction_to_execute}'. Error details: {tool_output[:200]}..."
-            logging.error(f"Critical failure detected at step {i+1}, aborting remaining steps to save time")
+            final_summary = f"FINAL_SUMMARY: Critical failure at step {i + 1}: '{instruction_to_execute}'. Error details: {tool_output[:200]}..."
+            logging.error(f"Critical failure detected at step {i + 1}, aborting remaining steps to save time")
             break
 
-        # Check for max iterations, which indicates a failure to complete the step.
-        if "Agent stopped due to max iterations." in tool_output:
+        # Check for failures in the tool output
+        if "[failure]" in result['intermediate_steps'][0][1].lower() or "failed" in tool_output.lower():
             failed_steps.append(i + 1)
-            final_summary = f"FINAL_SUMMARY: Step '{instruction_to_execute}' failed after multiple retries. The agent could not complete the instruction. Last output: {tool_output}"
-            logging.error(f"Step {i+1} failed due to max iterations.")
-            break
+            logging.warning(f"Step {i+1} detected as failed based on output")
 
         # Check for objective achievement signal
         is_achieved, achievement_reason = _is_objective_achieved(tool_output)
@@ -1018,6 +1067,10 @@ def _is_objective_achieved(tool_output: str) -> tuple[bool, str]:
 def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> bool:
     """Check if a single step output indicates a critical failure that should stop execution.
 
+    Uses hybrid detection approach:
+    1. Primary: Structured error tags [CRITICAL_ERROR:category] (preferred)
+    2. Fallback: Pattern matching for backward compatibility and enhanced coverage
+
     Args:
         tool_output: The output from the step execution
         step_instruction: The instruction that was executed (for context)
@@ -1030,30 +1083,22 @@ def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> b
 
     output_lower = tool_output.lower()
 
-    # Critical failure patterns for immediate exit
-    critical_step_patterns = [
-        "element not found",
-        "cannot find",
-        "page crashed",
-        "permission denied",
-        "access denied",
-        "network timeout",
-        "browser error",
-        "navigation failed",
-        "session expired",
-        "server error",
-        "connection timeout",
-        "unable to load",
-        "page not accessible",
-        "critical error"
-    ]
+    # Phase 1: Check for structured critical error tags (preferred method)
+    if "[critical_error:" in output_lower:
+        logging.debug("Critical failure detected via structured error tag")
+        return True
 
-    # Check for critical patterns
-    for pattern in critical_step_patterns:
+    # Phase 2a: Check literal patterns (backward compatibility)
+    for pattern in CRITICAL_LITERAL_PATTERNS:
         if pattern in output_lower:
-            logging.debug(f"Critical failure detected in step: pattern '{pattern}' found")
+            logging.debug(f"Critical failure detected via literal pattern: '{pattern}'")
             return True
 
+    # Phase 2b: Check regex patterns (enhanced matching)
+    if CRITICAL_REGEX.search(output_lower):
+        logging.debug("Critical failure detected via regex pattern")
+        return True
+
     return False
 
 
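
The three detection phases can be exercised in isolation. The sketch below follows the same order (structured tag, literal substrings, compiled regex) but uses only a small subset of the patterns from this commit and returns which phase matched instead of a boolean. `classify_failure` and the shortened pattern lists are hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Subset of the commit's patterns, shortened for the example.
LITERAL_PATTERNS = ("element not found", "page crashed", "session expired")
REGEX = re.compile(
    "|".join([r"not found in\s+.*buffer", r"locator.*not.*found"]),
    re.IGNORECASE,
)


def classify_failure(tool_output: str) -> Optional[str]:
    """Return the phase that flagged the output as critical, or None."""
    lowered = tool_output.lower()

    # Phase 1: structured tag emitted by the agent (preferred signal).
    if "[critical_error:" in lowered:
        return "tag"

    # Phase 2a: exact substrings kept for backward compatibility.
    for pattern in LITERAL_PATTERNS:
        if pattern in lowered:
            return "literal"

    # Phase 2b: looser regex matching for phrasing variations.
    if REGEX.search(lowered):
        return "regex"

    return None


results = [
    classify_failure("[CRITICAL_ERROR:PAGE_CRASHED] The tab stopped responding."),
    classify_failure("Element not found on the page."),
    classify_failure("The locator for the submit button was not found."),
    classify_failure("[FAILURE] Validation message shown, retrying."),
]
```

Because the fallback phases match free-form text, an ordinary `[FAILURE]` message that happens to contain one of the phrases would also be treated as critical; the structured tag is the only signal that avoids this ambiguity.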

webqa_agent/testers/case_gen/prompts/agent_prompts.py

Lines changed: 31 additions & 0 deletions
@@ -221,6 +221,37 @@ def get_execute_system_prompt(case: dict) -> str:
 - Include recovery steps taken for future test improvement
 - Maintain clear audit trail of all actions performed
 
+## Structured Error Reporting Protocol
+
+**Critical Rule**: For failures that should immediately stop test execution, you MUST use structured error tags to ensure reliable detection.
+
+### Critical Error Format
+When encountering critical failures, include structured tags: **[CRITICAL_ERROR:category]** followed by detailed description.
+
+### Critical Error Categories
+- **ELEMENT_NOT_FOUND**: Target element cannot be located, accessed, or interacted with
+- **NAVIGATION_FAILED**: Page navigation, loading, or routing failures
+- **PERMISSION_DENIED**: Access, authorization, or security restriction issues
+- **PAGE_CRASHED**: Browser crashes, page errors, or unrecoverable page states
+- **NETWORK_ERROR**: Network connectivity, timeout, or server communication issues
+- **SESSION_EXPIRED**: Authentication session, login, or credential issues
+
+### Critical Error Examples
+**Element Access Failure**:
+`[CRITICAL_ERROR:ELEMENT_NOT_FOUND] The language selector dropdown could not be located in the navigation bar. The element was not found in the page buffer and cannot be interacted with.`
+
+**Navigation Issue**:
+`[CRITICAL_ERROR:NAVIGATION_FAILED] Page navigation to the target URL failed due to network timeout. The page is not accessible and the test cannot continue.`
+
+**Permission Issue**:
+`[CRITICAL_ERROR:PERMISSION_DENIED] Access to the admin panel was denied. User lacks sufficient privileges to proceed with the test.`
+
+### Non-Critical Failures
+Standard failures that allow test continuation should use the regular `[FAILURE]` format without structured tags. These include:
+- Validation errors that can be corrected
+- Dropdown option mismatches with alternatives available
+- Minor UI state changes that don't block core functionality
+
 ## Advanced Error Recovery Patterns
 
 ### Pattern 1: Form Validation Errors
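
The `[CRITICAL_ERROR:category]` format defined in the prompt is regular enough to parse, which would let a caller report the category as well as detect the tag. The commit itself only checks for the tag prefix; the parser below is an illustrative sketch in which `parse_critical_error`, `KNOWN_CATEGORIES`, and the `"UNKNOWN"` fallback are all assumptions.

```python
import re
from typing import Optional

# Categories listed in the prompt section above.
KNOWN_CATEGORIES = {
    "ELEMENT_NOT_FOUND",
    "NAVIGATION_FAILED",
    "PERMISSION_DENIED",
    "PAGE_CRASHED",
    "NETWORK_ERROR",
    "SESSION_EXPIRED",
}

# Tag, then the free-text description that follows it.
TAG_RE = re.compile(r"\[CRITICAL_ERROR:([A-Za-z_]+)\]\s*(.*)", re.IGNORECASE | re.DOTALL)


def parse_critical_error(output: str) -> Optional[tuple[str, str]]:
    """Return (category, description) if the output carries a critical tag."""
    match = TAG_RE.search(output)
    if match is None:
        return None
    category = match.group(1).upper()
    if category not in KNOWN_CATEGORIES:
        category = "UNKNOWN"  # hypothetical fallback for categories the prompt does not list
    return category, match.group(2).strip()


parsed = parse_critical_error(
    "[CRITICAL_ERROR:PERMISSION_DENIED] Access to the admin panel was denied."
)
not_critical = parse_critical_error("[FAILURE] Dropdown option mismatch, alternative selected.")
```

A plain `[FAILURE]` message yields `None`, which matches the prompt's rule that non-critical failures carry no structured tag.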
