v0.1.5

mmmay0722 · web-flow · commit cbc323e5e771 · 2025-09-28T14:59:20.000+08:00
1. feat: support crawling shadow root elements
2. feat: update config example and README
3. feat: support a smarter way to detect critical errors
diff --git a/README.md b/README.md
@@ -135,6 +135,10 @@ test_config:                                      # Test configuration
     enabled: True
     type: ai                                      # default or ai
     business_objectives: example business objectives  # Recommended to include test scope, e.g., test search functionality
+    dynamic_step_generation:                      # Optional, configuration for dynamic step generation
+      enabled: True                               # Optional, default False, recommended to set True to enable dynamic step generation
+      max_dynamic_steps: 5                        # Optional, default 5 test steps generated per trigger
+      min_elements_threshold: 2                   # Optional, default trigger threshold is 2 DOM element differences
   ux_test:                                        # User experience testing
     enabled: True
   performance_test:                               # Performance analysis
@@ -170,11 +174,12 @@ UX (User Experience) testing focuses on usability, and user-friendliness. The mo
 
 Based on our testing, these models work well with WebQA Agent:
 
-| Model | Key Strengths | Notes |
-|-------|---------------|-------|
-| **gpt-4.1-2025-04-14** ⭐ | High accuracy & reliability | **Best choice** |
-| **gpt-4.1-mini-2025-04-14** | Cost-effective | **Economical and practical**|
-| **doubao-seed-1-6-vision-250815** | Vision capabilities | **Excellent web understanding** |
+| Model                             | Key Strengths               | Notes                           |
+|-----------------------------------|-----------------------------|---------------------------------|
+| **gpt-4.1-2025-04-14** ⭐         | High accuracy & reliability | **Best choice**                 |
+| **gpt-4.1-mini-2025-04-14**       | Cost-effective              | **Economical and practical**    |
+| **qwen3-vl-235b-a22b-instruct**   | Open-source, GPT-4.1 level  | **Best for on-premise**         |
+| **doubao-seed-1-6-vision-250815** | Vision capabilities         | **Excellent web understanding** |
 
 
 ### View Results
diff --git a/README_zh-CN.md b/README_zh-CN.md
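@@ -138,6 +138,10 @@ test_config:                                      # Test configuration
     enabled: True
     type: ai                                      # default or ai
     business_objectives: example business objectives  # Recommended to include the test scope, e.g., test the search functionality
+    dynamic_step_generation:                      # Optional, configuration for dynamic step generation
+      enabled: True                               # Optional, default False; setting True is recommended to enable dynamic step generation
+      max_dynamic_steps: 5                        # Optional, by default 5 test steps are generated per trigger
+      min_elements_threshold: 2                   # Optional, default trigger threshold is a difference of 2 DOM elements
   ux_test:                                        # User experience testing
     enabled: True
   performance_test:                               # Performance analysis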
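@@ -173,11 +177,12 @@ UX (user experience) evaluation focuses on web page usability and friendliness. The models in the results
 
 Based on actual test results, the following models perform well and are recommended:
 
-| Model | Key Strengths | Recommendation |
-|------|----------|----------|
-| **gpt-4.1-2025-04-14** ⭐ | High accuracy and reliability | **Best choice** |
-| **gpt-4.1-mini-2025-04-14** | Cost-effective | **Economical and practical** |
-| **doubao-seed-1-6-vision-250815** | Vision support | **Excellent web page understanding** |
+| Model                             | Key Strengths                      | Recommendation                        |
+|-----------------------------------|------------------------------------|---------------------------------------|
+| **gpt-4.1-2025-04-14** ⭐         | High accuracy and reliability      | **Best choice**                       |
+| **gpt-4.1-mini-2025-04-14**       | Cost-effective                     | **Economical and practical**          |
+| **qwen3-vl-235b-a22b-instruct**   | Comparable to GPT-4.1, open-source | **Top choice for private deployment** |
+| **doubao-seed-1-6-vision-250815** | Vision support                     | **Excellent web page understanding**  |
 
 ### View Results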
 
diff --git a/config/config.yaml.example b/config/config.yaml.example
@@ -8,6 +8,10 @@ test_config: # Test configuration
     enabled: True
     type: ai  # default or ai
     business_objectives: Test Baidu search functionality, generate 3 test cases
+    dynamic_step_generation:
+      enabled: True
+      max_dynamic_steps: 10
+      min_elements_threshold: 1
   ux_test:
     enabled: True
   performance_test:
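
The `dynamic_step_generation` block above can be read with defaults along the following lines. This is a minimal sketch, not the project's loader: `DynamicStepConfig`, `load_dynamic_step_config`, and `should_generate_steps` are hypothetical names, the defaults are the ones stated in the README comments, and the `>=` comparison against `min_elements_threshold` is an assumption about how the threshold is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicStepConfig:
    """Hypothetical container; field names mirror the YAML keys in the diff."""
    enabled: bool = False            # README: default False
    max_dynamic_steps: int = 5       # README: default 5 steps per trigger
    min_elements_threshold: int = 2  # README: default 2 DOM element differences


def load_dynamic_step_config(section: dict) -> DynamicStepConfig:
    """Read the optional block from its parent mapping, falling back to defaults."""
    raw = section.get("dynamic_step_generation") or {}
    defaults = DynamicStepConfig()
    return DynamicStepConfig(
        enabled=bool(raw.get("enabled", defaults.enabled)),
        max_dynamic_steps=int(raw.get("max_dynamic_steps", defaults.max_dynamic_steps)),
        min_elements_threshold=int(
            raw.get("min_elements_threshold", defaults.min_elements_threshold)
        ),
    )


def should_generate_steps(cfg: DynamicStepConfig, changed_elements: int) -> bool:
    """Assumed trigger rule: enabled and the DOM diff reaches the threshold."""
    return cfg.enabled and changed_elements >= cfg.min_elements_threshold
```

With the values from `config.yaml.example` (`max_dynamic_steps: 10`, `min_elements_threshold: 1`), a single changed element would be enough to trigger generation under this assumed rule, whereas the documented defaults would require two.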
diff --git a/webqa_agent/crawler/js/element_detector.js b/webqa_agent/crawler/js/element_detector.js
@@ -745,6 +745,47 @@
             return '/' + parts.join('/');
         }
 
+        /**
+         * Retrieves all child container elements from the specified node
+         *
+         * This function is used to get child elements when traversing the DOM tree,
+         * supporting multiple container types:
+         * - Child elements of regular DOM elements
+         * - Child elements of Shadow DOM
+         * - Body element of iframe internal documents
+         *
+         * @param {Node|ShadowRoot} node - The node to get child containers from, can be a regular DOM node or Shadow Root
+         * @returns {Array<Element>} Returns an array containing all child container elements
+         */
+        function getChildContainers(node) {
+            const out = [];
+
+            // Handle Shadow Root nodes
+            if (node instanceof ShadowRoot) {
+                out.push(...Array.from(node.children));
+            }
+            // Handle regular DOM element nodes
+            else if (node && node.nodeType === Node.ELEMENT_NODE) {
+                // Add all direct child elements
+                out.push(...Array.from(node.children));
+
+                // If the element has a Shadow Root, add it to the container list
+                if (node.shadowRoot instanceof ShadowRoot) out.push(node.shadowRoot);
+
+                // Special handling for iframe elements, attempt to get the body of their internal document
+                if (node.tagName?.toLowerCase() === 'iframe') {
+                    try {
+                        const doc = node.contentDocument;
+                        if (doc?.body) out.push(doc.body);
+                    } catch (_) {
+                        /* Ignore errors when cross-origin iframe access is blocked */
+                    }
+                }
+            }
+
+            return out;
+        }
+
         /**
          * Gathers comprehensive information about a DOM element.
          *
@@ -980,29 +1021,31 @@
          * @returns {object | null} A tree node object, or `null` if the element and its descendants are not relevant.
          */
         function buildTree(elemObj, wasParentHighlighted = false) {
-            // 1) get element info
-            const elemInfo = getElementInfo(elemObj, wasParentHighlighted);
+            // If it is a ShadowRoot, use host as element info; otherwise use the element itself
+            const infoTarget = (elemObj instanceof ShadowRoot) ? elemObj.host : elemObj;
+
+            // get element info
+            const elemInfo = getElementInfo(infoTarget, wasParentHighlighted);
 
-            // 2) check node satisfies highlight condition
-            const isCurNodeHighlighted = handleHighlighting(elemInfo, elemObj, wasParentHighlighted)
+            // Highlight check
+            const isCurNodeHighlighted = elemInfo ? handleHighlighting(elemInfo, infoTarget, wasParentHighlighted) : false;
             const isParentHighlighted = wasParentHighlighted || isCurNodeHighlighted;
 
-            // 3) recursively build structured dom tree, with 'isParentHighlighted' state
+            // Recursively process "container" child nodes: element children, shadowRoot, and same-origin iframe bodies
             const children = [];
-            Array.from(elemObj.children).forEach(child => {
+            for (const child of getChildContainers(elemObj)) {
                 const subtree = buildTree(child, isParentHighlighted);
                 if (subtree) children.push(subtree);
-            });
+            }
 
-            // 4) highlight filter
+            // highlight filter
             if (isCurNodeHighlighted) {
-                highlightIdMap[elemInfo.highlightIndex] = elemInfo;     // map highlightIndex to element info
-                return {node: elemInfo, children};                      // keep info if is highlightable
+                highlightIdMap[elemInfo.highlightIndex] = elemInfo;  // map highlightIndex to element info
+                return {node: elemInfo, children};
             } else if (children.length > 0) {
-                return {node: null, children};                          // child node is highlightable
-            } else {
-                return null;                                            // skip
+                return {node: null, children};                       // child node is highlightable
             }
+            return null;
         }
 
         // ============================= Main Function =============================
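
The pruning behaviour of `buildTree` combined with `getChildContainers` can be modelled outside the browser. The sketch below is a simplified Python analogue, not a port: `FakeNode` is a hypothetical stand-in for a DOM element, highlighting is reduced to a boolean flag, and the shadow root is treated as a plain container rather than reusing its host's element info as the JavaScript does.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a DOM element; not a browser API."""
    name: str
    highlightable: bool = False
    children: List["FakeNode"] = field(default_factory=list)
    shadow_root: Optional["FakeNode"] = None   # models element.shadowRoot
    iframe_body: Optional["FakeNode"] = None   # models a same-origin iframe body


def child_containers(node: FakeNode) -> List[FakeNode]:
    """Direct children first, then the shadow root, then the iframe body."""
    out = list(node.children)
    if node.shadow_root is not None:
        out.append(node.shadow_root)
    if node.iframe_body is not None:
        out.append(node.iframe_body)
    return out


def build_tree(node: FakeNode) -> Optional[dict]:
    """Keep highlightable nodes, keep ancestors of kept nodes, drop the rest."""
    subtrees = (build_tree(child) for child in child_containers(node))
    kids = [tree for tree in subtrees if tree]
    if node.highlightable:
        return {"node": node.name, "children": kids}
    if kids:
        return {"node": None, "children": kids}
    return None
```

A button nested inside a shadow root survives the pruning, while sibling nodes with no highlightable descendants are dropped. In a real browser, `Element.shadowRoot` is `null` for shadow roots attached in closed mode, so a traversal based on that property only reaches open roots.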
diff --git a/webqa_agent/testers/case_gen/agents/execute_agent.py b/webqa_agent/testers/case_gen/agents/execute_agent.py
@@ -21,7 +21,51 @@
 from webqa_agent.testers.case_gen.utils.message_converter import convert_intermediate_steps_to_messages
 from webqa_agent.utils.log_icon import icon
 
-LONG_STEPS = 10
+LONG_STEPS = 30
+
+# ============================================================================
+# Critical Failure Detection Patterns
+# ============================================================================
+
+# Literal patterns for exact substring matching (backward compatible)
+CRITICAL_LITERAL_PATTERNS = [
+    "element not found",
+    "cannot find",
+    "page crashed",
+    "permission denied",
+    "access denied",
+    "network timeout",
+    "browser error",
+    "navigation failed",
+    "session expired",
+    "server error",
+    "connection timeout",
+    "unable to load",
+    "page not accessible",
+    "critical error",
+    "missing locator",
+    "not found in the buffer",
+    "could not be retrieved",
+    "failed due to a missing",
+    "dropdown options could not be retrieved",
+]
+
+# Regex patterns for flexible matching
+CRITICAL_REGEX_PATTERNS = [
+    r"not found in\s+.*buffer",
+    r"failed due to\s+.*missing",
+    r"locator.*not.*found",
+    r"element.*not.*available",
+    r"missing.*for.*action",
+    r"missing.*parameter",
+    r"element with id.*not found",
+]
+
+# Pre-compile regex for performance
+CRITICAL_REGEX = re.compile(
+    '|'.join(CRITICAL_REGEX_PATTERNS),
+    re.IGNORECASE
+)
 
 # ============================================================================
 # Dynamic Step Generation Helper Functions
@@ -106,10 +150,22 @@ def format_elements_for_llm(dom_diff: dict) -> list[dict]:
         # Add important attribute information
         important_attrs = {}
         if attributes:
-            # Extract important attributes
-            for key in ['class', 'id', 'role', 'type', 'placeholder', 'aria-label']:
-                if key in attributes:
-                    important_attrs[key] = attributes[key]
+            # Define comprehensive attribute whitelist
+            navigation_attrs = ['href', 'target', 'rel', 'download']
+            form_attrs = ['type', 'placeholder', 'value', 'name', 'required', 'disabled']
+            semantic_attrs = ['role', 'aria-label', 'aria-describedby', 'aria-expanded']
+            
+            for key, value in attributes.items():
+                # Include whitelisted attributes
+                if key in ['class', 'id'] + navigation_attrs + form_attrs + semantic_attrs:
+                    important_attrs[key] = value
+                # Include data-* attributes (often contain behavior info)
+                elif key.startswith('data-'):
+                    # Limit length to prevent token explosion
+                    important_attrs[key] = value[:200] if isinstance(value, str) and len(value) > 200 else value
+                # Include style if it indicates visibility/interactivity
+                elif key == 'style' and isinstance(value, str) and ('display' in value or 'visibility' in value):
+                    important_attrs[key] = value[:200] + "..." if len(value) > 200 else value
         
         if important_attrs:
             formatted_elem["attributes"] = important_attrs
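
The attribute selection in the hunk above can be isolated as a small pure function for testing. This is a sketch under stated assumptions: `filter_attributes`, `WHITELIST`, and `_truncate` are hypothetical names, and unlike the diff (which appends `...` only to truncated `style` values) this version appends the ellipsis to truncated `data-*` values as well.

```python
# Whitelist taken from the attribute lists in the diff.
WHITELIST = frozenset({
    "class", "id",
    "href", "target", "rel", "download",                             # navigation
    "type", "placeholder", "value", "name", "required", "disabled",  # form
    "role", "aria-label", "aria-describedby", "aria-expanded",       # semantic
})
MAX_LEN = 200  # same cap as the diff, to limit prompt tokens


def _truncate(value):
    """Cut long strings and mark the cut; leave other values untouched."""
    if isinstance(value, str) and len(value) > MAX_LEN:
        return value[:MAX_LEN] + "..."
    return value


def filter_attributes(attributes: dict) -> dict:
    """Return only the attributes worth showing to the LLM."""
    kept = {}
    for key, value in attributes.items():
        if key in WHITELIST:
            kept[key] = value
        elif key.startswith("data-"):
            kept[key] = _truncate(value)
        elif (
            key == "style"
            and isinstance(value, str)
            and ("display" in value or "visibility" in value)
        ):
            kept[key] = _truncate(value)
    return kept
```

Building the whitelist once as a `frozenset` avoids reconstructing a list on every loop iteration and gives constant-time membership checks.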
@@ -683,24 +739,17 @@ def extract_path(u):
             logging.debug(f"Step {i+1} tool output: {tool_output}")
             messages.append(AIMessage(content=tool_output))
 
-            # Check for failures in the tool output
-            if "[failure]" in result['intermediate_steps'][0][1].lower() or "failed" in tool_output.lower():
-                failed_steps.append(i + 1)
-                logging.warning(f"Step {i+1} detected as failed based on output")
-
             # Check for critical failures that should immediately stop execution
             if _is_critical_failure_step(tool_output, instruction_to_execute):
                 failed_steps.append(i + 1)
-                final_summary = f"FINAL_SUMMARY: Critical failure at step {i+1}: '{instruction_to_execute}'. Error details: {tool_output[:200]}..."
-                logging.error(f"Critical failure detected at step {i+1}, aborting remaining steps to save time")
+                final_summary = f"FINAL_SUMMARY: Critical failure at step {i + 1}: '{instruction_to_execute}'. Error details: {tool_output[:200]}..."
+                logging.error(f"Critical failure detected at step {i + 1}, aborting remaining steps to save time")
                 break
 
-            # Check for max iterations, which indicates a failure to complete the step.
-            if "Agent stopped due to max iterations." in tool_output:
+            # Check for failures in the tool output
+            if "[failure]" in result['intermediate_steps'][0][1].lower() or "failed" in tool_output.lower():
                 failed_steps.append(i + 1)
-                final_summary = f"FINAL_SUMMARY: Step '{instruction_to_execute}' failed after multiple retries. The agent could not complete the instruction. Last output: {tool_output}"
-                logging.error(f"Step {i+1} failed due to max iterations.")
-                break
+                logging.warning(f"Step {i+1} detected as failed based on output")
 
             # Check for objective achievement signal
             is_achieved, achievement_reason = _is_objective_achieved(tool_output)
@@ -1018,6 +1067,10 @@ def _is_objective_achieved(tool_output: str) -> tuple[bool, str]:
 def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> bool:
     """Check if a single step output indicates a critical failure that should stop execution.
     
+    Uses a hybrid detection approach:
+    1. Primary: Structured error tags [CRITICAL_ERROR:category] (preferred)
+    2. Fallback: Pattern matching for backward compatibility and enhanced coverage
+    
     Args:
         tool_output: The output from the step execution
         step_instruction: The instruction that was executed (for context)
@@ -1030,30 +1083,22 @@ def _is_critical_failure_step(tool_output: str, step_instruction: str = "") -> b
     
     output_lower = tool_output.lower()
     
-    # Critical failure patterns for immediate exit
-    critical_step_patterns = [
-        "element not found",
-        "cannot find",
-        "page crashed", 
-        "permission denied",
-        "access denied",
-        "network timeout",
-        "browser error",
-        "navigation failed",
-        "session expired",
-        "server error", 
-        "connection timeout",
-        "unable to load",
-        "page not accessible",
-        "critical error"
-    ]
+    # Phase 1: Check for structured critical error tags (preferred method)
+    if "[critical_error:" in output_lower:
+        logging.debug("Critical failure detected via structured error tag")
+        return True
     
-    # Check for critical patterns
-    for pattern in critical_step_patterns:
+    # Phase 2a: Check literal patterns (backward compatibility)
+    for pattern in CRITICAL_LITERAL_PATTERNS:
         if pattern in output_lower:
-            logging.debug(f"Critical failure detected in step: pattern '{pattern}' found")
+            logging.debug(f"Critical failure detected via literal pattern: '{pattern}'")
             return True
     
+    # Phase 2b: Check regex patterns (enhanced matching)
+    if CRITICAL_REGEX.search(output_lower):
+        logging.debug("Critical failure detected via regex pattern")
+        return True
+    
     return False
 
 
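
The three-phase check in `_is_critical_failure_step` can be exercised in isolation with a reduced pattern set. The sketch below is illustrative only: `detect_critical` is a hypothetical helper that returns which phase matched instead of a boolean, and `LITERALS` and `REGEX` contain only a subset of the patterns defined in the diff.

```python
import re
from typing import Optional

# Subset of CRITICAL_LITERAL_PATTERNS from the diff.
LITERALS = ("element not found", "page crashed", "session expired")

# Subset of CRITICAL_REGEX_PATTERNS, compiled once at import time.
REGEX = re.compile(r"not found in\s+.*buffer|locator.*not.*found", re.IGNORECASE)


def detect_critical(tool_output: str) -> Optional[str]:
    """Return 'tag', 'literal', or 'regex' for a critical failure, else None."""
    if not tool_output:
        return None
    lowered = tool_output.lower()
    # Phase 1: structured tag emitted by the agent (preferred).
    if "[critical_error:" in lowered:
        return "tag"
    # Phase 2a: exact substring matches.
    if any(pattern in lowered for pattern in LITERALS):
        return "literal"
    # Phase 2b: flexible regex matches.
    if REGEX.search(lowered):
        return "regex"
    return None
```

Returning the matching phase makes it easier to see how often the fallback patterns fire compared with the structured tag. Substring patterns can also match benign sentences that merely mention an error phrase, which is one reason to prefer the tag when it is available.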
diff --git a/webqa_agent/testers/case_gen/prompts/agent_prompts.py b/webqa_agent/testers/case_gen/prompts/agent_prompts.py
@@ -221,6 +221,37 @@ def get_execute_system_prompt(case: dict) -> str:
 - Include recovery steps taken for future test improvement
 - Maintain clear audit trail of all actions performed
 
+## Structured Error Reporting Protocol
+
+**Critical Rule**: For failures that should immediately stop test execution, you MUST use structured error tags to ensure reliable detection.
+
+### Critical Error Format
+When encountering critical failures, include a structured tag, **[CRITICAL_ERROR:category]**, followed by a detailed description.
+
+### Critical Error Categories
+- **ELEMENT_NOT_FOUND**: Target element cannot be located, accessed, or interacted with
+- **NAVIGATION_FAILED**: Page navigation, loading, or routing failures
+- **PERMISSION_DENIED**: Access, authorization, or security restriction issues
+- **PAGE_CRASHED**: Browser crashes, page errors, or unrecoverable page states
+- **NETWORK_ERROR**: Network connectivity, timeout, or server communication issues
+- **SESSION_EXPIRED**: Authentication session, login, or credential issues
+
+### Critical Error Examples
+**Element Access Failure**:
+`[CRITICAL_ERROR:ELEMENT_NOT_FOUND] The language selector dropdown could not be located in the navigation bar. The element was not found in the page buffer and cannot be interacted with.`
+
+**Navigation Issue**:
+`[CRITICAL_ERROR:NAVIGATION_FAILED] Page navigation to the target URL failed due to network timeout. The page is not accessible and the test cannot continue.`
+
+**Permission Issue**:
+`[CRITICAL_ERROR:PERMISSION_DENIED] Access to the admin panel was denied. User lacks sufficient privileges to proceed with the test.`
+
+### Non-Critical Failures
+Standard failures that allow test continuation should use the regular `[FAILURE]` format without structured tags. These include:
+- Validation errors that can be corrected
+- Dropdown option mismatches with alternatives available
+- Minor UI state changes that don't block core functionality
+
 ## Advanced Error Recovery Patterns
 
 ### Pattern 1: Form Validation Errors
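
Returning to the Structured Error Reporting Protocol added in this hunk: the commit itself only checks whether `[critical_error:` appears in the output, but the tag format also allows the category to be extracted. A minimal sketch, assuming a hypothetical `parse_critical_error` helper and an `UNKNOWN` fallback that is not part of the prompt:

```python
import re
from typing import Optional, Tuple

# Categories listed in the prompt's "Critical Error Categories" section.
CATEGORIES = frozenset({
    "ELEMENT_NOT_FOUND", "NAVIGATION_FAILED", "PERMISSION_DENIED",
    "PAGE_CRASHED", "NETWORK_ERROR", "SESSION_EXPIRED",
})

TAG = re.compile(r"\[CRITICAL_ERROR:([A-Za-z_]+)\]", re.IGNORECASE)


def parse_critical_error(tool_output: str) -> Optional[Tuple[str, str]]:
    """Return (category, description) for the first tag found, else None."""
    match = TAG.search(tool_output)
    if match is None:
        return None
    category = match.group(1).upper()
    if category not in CATEGORIES:
        category = "UNKNOWN"  # assumed fallback for unlisted categories
    description = tool_output[match.end():].strip()
    return category, description
```

Outputs that use the regular `[FAILURE]` format yield `None`, which matches the prompt's distinction between critical and non-critical failures.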