
Conversation

@heywji (Contributor) commented Oct 29, 2025

Increase the dump timeout from 90s to 1800s (30 minutes) to handle large vmcore files (~132 GB). The original timeout was too short for VMs with 126 GB of memory, causing truncated vmcore files.

With a 30-minute timeout, even at 75 MB/s disk I/O, a 132 GB dump should complete successfully.
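
Back-of-the-envelope check (illustrative only; real dumps also pay compression and metadata overhead):

    # decimal units: 132 GB ≈ 132,000 MB at 75 MB/s
    132_000 / 75        # ≈ 1760 s, inside the 1800 s window
    # binary units: 132 GiB = 135,168 MiB at 75 MiB/s
    132 * 1024 / 75     # ≈ 1802 s, right at the limit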

ID: 4239

Signed-off-by: Wenkang Ji <[email protected]>

Summary by CodeRabbit

  • Chores
    • Increased timeout thresholds for guest memory dump testing to improve the reliability of long-running operations: the dump file handling, debug command execution, crash verification, and vmcore detection timeouts were extended to ~30 minutes to reduce flakiness and handle slower environments.

@coderabbitai (bot) commented Oct 29, 2025

Walkthrough

Timeouts for guest memory/core dump operations were increased to 1800 seconds in config and test code:

  • dump_file_timeout in qemu/tests/cfg/dump_guest_memory.cfg: 90 → 1800
  • gdb command timeout in qemu/tests/dump_guest_core.py: 360 → 1800
  • crash command timeout in qemu/tests/dump_guest_core.py: 60 → 1800
  • wait_for timeout checking the vmcore in qemu/tests/dump_guest_core.py: 60 → 1800
  • process.getstatusoutput(...) in qemu/tests/dump_guest_memory.py: now called with timeout=1800

Exception handling in dump_guest_core.py was also changed to catch the builtin TimeoutError alongside process.CmdError, replacing a prior reference to the nonexistent process.TimeoutError.
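
A minimal sketch of the resulting pattern (the command string and failure handling here are illustrative, not the test's actual code):

    from avocado.utils import process

    try:
        # gdb writes the guest core dump; the real test builds this command
        # from its own parameters, so this string is only a placeholder
        process.run("gdb --batch -p %d -ex 'gcore /var/tmp/vmcore'" % 12345,
                    timeout=1800, shell=True)
    except (TimeoutError, process.CmdError) as e:
        # builtin TimeoutError replaces the nonexistent process.TimeoutError;
        # process.CmdError still covers non-zero exit status
        print("dump failed or timed out: %s" % e)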

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Homogeneous changes (timeout value increases) with a small API/exception-handling tweak.
  • Review focus:
    • Verify all timeout values and units (seconds) are intentional and consistent.
    • Confirm the exception change (use of builtin TimeoutError and process.CmdError) matches the runtime exceptions from invoked APIs.
    • Check process.getstatusoutput(..., timeout=1800) API compatibility and behavior on timeout (see the sketch after this list).
    • Confirm extended timeouts do not conflict with CI/test runtime or resource constraints.
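
A hedged sketch of that call pattern (the crash command line and file paths are illustrative):

    from avocado.utils import process

    # verify the dump with crash; vmlinux/vmcore paths are placeholders
    crash_cmd = "crash /usr/lib/debug/vmlinux /var/tmp/vmcore"
    status, output = process.getstatusoutput(crash_cmd, timeout=1800)
    if status != 0:
        print("crash verification failed:\n%s" % output)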

Pre-merge checks

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title directly and clearly summarizes the main change (increasing the timeout for large vmcore dumps), which is supported by all three modified files (dump_guest_memory.cfg, dump_guest_core.py, and dump_guest_memory.py).

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d91978b and d4d5e01.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (4 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • qemu/tests/cfg/dump_guest_memory.cfg
  • qemu/tests/dump_guest_core.py
  • qemu/tests/dump_guest_memory.py

Comment @coderabbitai help to get the list of available commands and usage tips.

@heywji (Contributor, Author) commented Oct 29, 2025

Test Results: PASS

FROM:

(01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (399.91 s)                                                                    
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (402.98 s)                                                                    
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (404.10 s)                                                                    
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (400.25 s)                                                                    
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (402.54 s)                                                                    
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (403.41 s)                                                                    
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (401.73 s)
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED              
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (401.28 s)
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED  
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (401.53 s)
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED             
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (404.02 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 9 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0                                 

TO:

 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (423.51 s)
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (414.12 s)
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (425.09 s)
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (412.60 s)
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (433.74 s)
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (410.03 s)
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (416.82 s)
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (420.95 s)
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (420.31 s)
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (437.95 s)
RESULTS    : PASS 10 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@coderabbitai (bot) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
qemu/tests/dump_guest_core.py (1)

65-65: Inconsistent timeout: crash command should also be increased.

The crash command timeout remains at 60 seconds, but dump_guest_memory.py line 160 increased it to 1800 seconds for the same operation. Large vmcore files (132 GB) will likely require more than 60 seconds for crash to analyze. This timeout should be increased to 1800 for consistency.

Apply this diff to fix the inconsistency:

-        output = process.getoutput(crash_cmd, timeout=60)
+        output = process.getoutput(crash_cmd, timeout=1800)
🧹 Nitpick comments (1)
qemu/tests/cfg/dump_guest_memory.cfg (1)

7-7: Consider a larger safety margin for the timeout value.

The timeout increase to 1800 seconds is appropriate for large vmcore dumps. However, based on your PR description (132 GB at 75 MB/s ≈ 1802 seconds), this provides minimal buffer. Consider increasing to 2400 or 3600 seconds to accommodate slower I/O speeds, system load, or compression overhead.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb1d9a and b4f6cbd.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (2 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🔇 Additional comments (3)
qemu/tests/dump_guest_core.py (2)

48-48: LGTM! Appropriate timeout increase for gdb dump generation.

The timeout increase from 360 to 1800 seconds is appropriate for generating large vmcore files via gdb.


116-116: LGTM! Defensive timeout increase.

The timeout increase to 1800 seconds is safe and consistent. Note that this wait might be redundant since check_core_file() (line 113) should have already created the vmcore_file synchronously via the gdb command at line 48. However, the defensive check with adequate timeout is reasonable.

qemu/tests/dump_guest_memory.py (1)

160-160: LGTM! Good addition of timeout to prevent indefinite hangs.

Adding the 1800-second timeout to the crash verification command is appropriate and prevents potential indefinite hangs when analyzing large vmcore files.

@heywji (Contributor, Author) commented Oct 30, 2025

Depends on: #4390

@heywji force-pushed the fix_dump_guest_memory_increase_timeout branch 2 times, most recently from 9ad7310 to d91978b, on November 4, 2025 at 06:40
@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
qemu/tests/dump_guest_core.py (1)

48-116: Consider making timeouts configurable.

The 1800s timeout is now hardcoded in multiple locations. For large values like 30 minutes, consider making timeouts configurable via test parameters to provide flexibility for different hardware configurations or memory sizes.

For example, you could add to the params section:

gdb_timeout = int(params.get("gdb_timeout", 1800))
crash_timeout = int(params.get("crash_timeout", 1800))
vmcore_wait_timeout = int(params.get("vmcore_wait_timeout", 1800))

Then use these variables in the respective locations instead of hardcoded values.
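
For instance, the vmcore wait could then look like this (utils_misc.wait_for is the avocado-vt helper; params, test, and vmcore_file come from the test's run() context):

    import os
    from virttest import utils_misc

    # inside run(test, params, env); vmcore_file is set up earlier in the test
    vmcore_wait_timeout = int(params.get("vmcore_wait_timeout", 1800))
    if not utils_misc.wait_for(lambda: os.path.exists(vmcore_file),
                               timeout=vmcore_wait_timeout, step=5):
        test.fail("vmcore not created within %ss" % vmcore_wait_timeout)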

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ad7310 and d91978b.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (4 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • qemu/tests/dump_guest_memory.py
  • qemu/tests/cfg/dump_guest_memory.cfg
🔇 Additional comments (3)
qemu/tests/dump_guest_core.py (3)

48-48: Timeout increase is appropriate for large memory dumps.

The increase from 360s to 1800s is well-justified by the test results showing operations take 400-438 seconds for 126 GB memory VMs. This aligns with the PR objective to handle ~132 GB vmcore files.


65-65: Timeout increase is consistent and necessary.

The increase from 60s to 1800s is necessary for analyzing large vmcore files (~132 GB) and maintains consistency with the other timeout changes in this test.


116-116: Timeout increase is appropriate and consistent.

The increase from 60s to 1800s for waiting for the vmcore file to exist is necessary for large vmcore files and maintains consistency with the other timeout changes throughout this test.

Remove process.TimeoutError from exception handling, as it does not exist in the avocado.utils.process module.

Signed-off-by: wji <[email protected]>
Increase dump timeout from 90s to 1800s (30 minutes) to handle large
vmcore files (~132GB). The original timeout was too short for VMs with
126GB memory, causing truncated vmcore files.
With 30-minute timeout, even at 75MB/s disk I/O speed, 132GB dumps
should complete successfully.

Signed-off-by: wji <[email protected]>
@heywji force-pushed the fix_dump_guest_memory_increase_timeout branch from d91978b to d4d5e01 on November 4, 2025 at 06:47