Dataset: https://drive.google.com/drive/folders/11GY5JZ4F_h40qsO_PdEAnNisnI-RtSd0?usp=sharing
- Soap opera tests used for the formative study and automated soap opera testing.
- Bug reports for Firefox used to construct the scenario knowledge graph.
- Issues and pull requests for WordPress used to construct the scenario knowledge graph.
- Issues and pull requests for AntennaPod used to construct the scenario knowledge graph.
Code:
- Use scripts/0_0 to 0_3 to crawl bug reports from Mozilla Bugzilla.
- Use scripts/1_1 to 1_3 to crawl issues and pull requests from GitHub.
- Use scripts/2_1 to 2_8 to construct the scenario knowledge graph.
- Set up a virtual Android device or connect a physical Android device to your computer.
- Ensure the application under test is installed on the Android device.
- Run scripts/app.py to execute the multi-agent system for automated soap opera testing.
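Before launching scripts/app.py, it helps to confirm that `adb` actually sees a ready device. A minimal sketch of such a check (the helper name is an assumption and is not part of the released scripts; it only parses standard `adb devices` output):

```python
def parse_adb_devices(output: str) -> list[str]:
    """Parse `adb devices` output into serials of devices in the ready ('device') state."""
    devices = []
    for line in output.strip().splitlines()[1:]:  # first line is the header
        parts = line.split()
        # skip devices reported as 'offline' or 'unauthorized'
        if len(parts) == 2 and parts[1] == "device":
            devices.append(parts[0])
    return devices

# Example output of `adb devices` with one ready emulator and one unauthorized phone:
sample = """List of devices attached
emulator-5554\tdevice
0A1B2C3D\tunauthorized
"""
print(parse_adb_devices(sample))  # ['emulator-5554']
```

If the returned list is empty, start an emulator or reconnect the physical device before running the multi-agent system.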
STEP 0: Go to Subscriptions
STEP 1: Click on the podcast to view the subscription episodes list
STEP 2: Click on the menu icon
STEP 3: Select 'Remove podcast'
STEP 4: Confirm deletion
STEP 5: Go to the player screen
STEP 6: Click on the menu
STEP 7: Select 'Open Podcast'
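The steps above can be represented as structured test data that a planner agent consumes one step at a time. A minimal sketch, assuming a simple step-list representation (the class and variable names are illustrative, not taken from the released code):

```python
from dataclasses import dataclass

@dataclass
class SoapOperaTest:
    """A soap opera test as an ordered list of high-level natural-language steps."""
    app: str
    steps: list[str]

# The AntennaPod remove-then-open scenario from the steps above.
remove_then_open = SoapOperaTest(
    app="AntennaPod",
    steps=[
        "Go to Subscriptions",
        "Click on the podcast to view the subscription episodes list",
        "Click on the menu icon",
        "Select 'Remove podcast'",
        "Confirm deletion",
        "Go to the player screen",
        "Click on the menu",
        "Select 'Open Podcast'",
    ],
)

print(len(remove_then_open.steps))  # 8
```

Each high-level step may still expand into several concrete UI instructions at execution time.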
Soap opera testing is a scenario-based exploratory testing (ET) approach designed to uncover unexpected behaviors through complex workflows and dramatic interactions. Soap opera tests (test scenarios) typically describe realistic yet condensed system usage scenarios, which are effective at revealing hidden and unexpected bugs. Without such soap opera tests as guidance, it is challenging for testers or traditional testing techniques to generate and explore such intricate and dramatic workflows from scratch, making these hidden issues difficult to identify.
In this example, the soap opera test involves removing a subscribed podcast and then attempting to open it on the player screen. This test spans different features (e.g., removing and opening a podcast) and multiple pages (e.g., Subscriptions and the Player Screen), reflecting a realistic yet condensed system usage scenario. The exaggerated interactions in this test reveal a hidden issue: an infinitely spinning loader, as shown in the final pink box. This bug only occurs after performing a sequence of intricate steps. Because the workflow spans multiple features and pages, it is challenging for testers or traditional testing techniques to construct and explore such a dramatic scenario from scratch. Moreover, identifying this non-crash bug requires an understanding of the GUI state, adding another layer of difficulty for conventional detection methods.
This example highlights the challenges of automating soap opera testing:
- Automated Execution Challenge: Soap opera tests involve long and intricate UI operations, making automation difficult. The Planner’s ability to detect deviations and dynamically adjust the plan (shown in the purple box) significantly increases the likelihood of successful test execution.
- Bug Detection Challenge: Effective bug detection requires understanding the executed UI instructions (via NLU) and recognizing the resulting UI changes (via image recognition). With these capabilities, the Detector can reliably identify unexpected bugs (shown in the pink box).
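The interplay between these two challenges can be summarized as a plan-execute-detect loop. A minimal sketch with stubbed agents (all function names and the dictionary-based GUI state are assumptions; in the actual system, the Planner, Player, and Detector are LLM-driven):

```python
def run_soap_opera_test(steps, plan, execute, detect, max_rounds=20):
    """Plan-execute-detect loop: re-plan on deviation, check for bugs after each action."""
    bugs, round_no = [], 0
    while steps and round_no < max_rounds:
        round_no += 1
        action = plan(steps[0])         # Planner: next UI instruction for the current step
        observed = execute(action)      # Player: perform it, return the resulting GUI state
        bug = detect(action, observed)  # Detector: NLU of the action + GUI-state check
        if bug:
            bugs.append(bug)
        if observed.get("deviated"):    # Planner spots a deviation and adjusts the plan
            continue                    # re-plan for the same step
        steps = steps[1:]               # step completed, move on
    return bugs

# Toy stubs: every step succeeds, and one fake bug is flagged for illustration.
def plan(step): return f"tap({step})"
def execute(action): return {"deviated": False, "screen": action}
def detect(action, observed):
    return "spinner never stops" if "Open Podcast" in action else None

print(run_soap_opera_test(["Remove podcast", "Select 'Open Podcast'"], plan, execute, detect))
# → ['spinner never stops']
```

The loop structure is why deviation handling (purple box) and per-action detection (pink box) compose naturally: both hook into the same round.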
STEP 0: Close a tab
STEP 1: Open recently closed tabs
STEP 2: Select a tab to reopen
Round 2: The Detector identifies a bug based on the GUI status and oracle knowledge from the SKG.
Round 3: The Planner generates an actionable plan to use the 'UNDO' feature, allowing for easy reopening of the closed tab based on the current GUI status.
Round 4: The Planner generates an actionable plan to open the 'Recently Closed Tabs' page by leveraging the current GUI status and step knowledge from the SKG.
- The Planner generates an actionable plan to reopen the closed tab by selecting the Three-dot menu.
- The Player, however, unexpectedly taps the 'Share' icon instead.
- Recognizing this deviation, the Planner adjusts the plan to cancel the action by clicking the 'X' button.
- This reveals a bug (Figure 3): the 'Back' icon turns black (making it hard to see), and the title reverts to 'Recently Closed Tabs' while still displaying the selected websites after canceling the share action.
- The Planner resumes the intended steps and continues executing the plan until the soap opera test is successfully completed.
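The deviation handling in the rounds above (an accidental 'Share' tap, a corrective 'X' click, then resuming the plan) can be sketched as a small re-planning rule. This is a simplification under stated assumptions: the recovery table and default fallback are hypothetical, whereas the real Planner reasons over the GUI state.

```python
def adjust_plan(intended: str, executed: str, recovery: dict) -> list[str]:
    """If the executed action deviates from the intended one, prepend recovery
    actions (e.g., cancel an accidentally opened share sheet) before retrying."""
    if executed == intended:
        return []                                  # on track: nothing to fix
    fix = recovery.get(executed, ["press 'Back'"])  # default fallback is hypothetical
    return fix + [intended]                        # undo the deviation, then retry

# The Player tapped the 'Share' icon instead of the Three-dot menu:
print(adjust_plan(
    intended="tap Three-dot menu",
    executed="tap 'Share' icon",
    recovery={"tap 'Share' icon": ["click the 'X' button"]},
))  # → ["click the 'X' button", 'tap Three-dot menu']
```

Note that the recovery action itself (canceling the share) is what exposed the black 'Back' icon bug, so corrective detours are also worth observing, not just undoing.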
From this example, the following insights can be observed:
- Actionable Plan Generation: A single step in the soap opera test often requires executing multiple UI instructions. The Planner generates actionable plans based on the current GUI state and step knowledge from the SKG (e.g., rounds 1 and 4), increasing the likelihood of successfully completing the test.
- Adaptive Planning: The Planner dynamically adjusts plans based on the GUI state (e.g., round 3). When execution deviates from the intended path, the Planner can correct the deviation and resume the mainline execution (e.g., rounds 7-11), ensuring the test is completed.
- Non-Intrusive UI Operation Execution: The Player locates UI elements using grid numbers, a non-intrusive method that enhances the adaptability and generalizability of our approach across different applications.
- Exploration Beyond Test: Deviations from the main test path can lead to unexpected discoveries. Similar to role-playing games, where side quests emerge alongside the main quest, these deviations can reveal hidden bugs (e.g., rounds 7-9), inspiring further investigation and uncovering additional related issues.
- Continuous Bug Detection: The Detector performs bug detection after each UI operation, identifying bugs during test execution (e.g., round 2) and upon test completion. By leveraging GUI state understanding and oracle knowledge from the SKG (e.g., rounds 2 and 6), the Detector can detect various bug types beyond just crashes.