
Conversation

@heywji (Contributor) commented Oct 29, 2025

Increase the dump timeout from 90s to 1800s (30 minutes) to handle large vmcore files (~132 GB). The original timeout was too short for VMs with 126 GB of memory, causing truncated vmcore files.

With a 30-minute timeout, even at 75 MB/s disk I/O, a 132 GB dump should complete successfully.
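
Back-of-the-envelope check (illustrative only; real dumps also pay compression and metadata overhead):

    # decimal units: 132 GB ≈ 132,000 MB at 75 MB/s
    132_000 / 75        # ≈ 1760 s, inside the 1800 s window
    # binary units: 132 GiB = 135,168 MiB at 75 MiB/s
    132 * 1024 / 75     # ≈ 1802 s, right at the limit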

ID: 4239

Signed-off-by: Wenkang Ji <[email protected]>

Summary by CodeRabbit

  • Chores
    • Increased timeout thresholds for guest memory dump testing to improve the reliability of long-running operations: the dump file handling, debug command execution, crash verification, and vmcore detection timeouts were extended to ~30 minutes to reduce flakiness and handle slower environments.

@coderabbitai (bot) commented Oct 29, 2025

Walkthrough

Timeouts for guest memory/core dump operations were increased to 1800 seconds in config and test code:

  • dump_file_timeout in qemu/tests/cfg/dump_guest_memory.cfg: 90 → 1800
  • gdb command timeout in qemu/tests/dump_guest_core.py: 360 → 1800
  • crash command timeout in qemu/tests/dump_guest_core.py: 60 → 1800
  • wait_for timeout checking the vmcore in qemu/tests/dump_guest_core.py: 60 → 1800
  • process.getstatusoutput(...) in qemu/tests/dump_guest_memory.py: now called with timeout=1800

Exception handling in dump_guest_core.py was also changed to catch the builtin TimeoutError alongside process.CmdError, replacing a prior reference to the nonexistent process.TimeoutError.
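
A minimal sketch of the resulting pattern (the command string and failure handling here are illustrative, not the test's actual code):

    from avocado.utils import process

    try:
        # gdb writes the guest core dump; the real test builds this command
        # from its own parameters, so this string is only a placeholder
        process.run("gdb --batch -p %d -ex 'gcore /var/tmp/vmcore'" % 12345,
                    timeout=1800, shell=True)
    except (TimeoutError, process.CmdError) as e:
        # builtin TimeoutError replaces the nonexistent process.TimeoutError;
        # process.CmdError still covers non-zero exit status
        print("dump failed or timed out: %s" % e)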

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Homogeneous changes (timeout value increases) with a small API/exception-handling tweak.
  • Review focus:
    • Verify all timeout values and units (seconds) are intentional and consistent.
    • Confirm the exception change (use of builtin TimeoutError and process.CmdError) matches the runtime exceptions from invoked APIs.
    • Check process.getstatusoutput(..., timeout=1800) API compatibility and behavior on timeout (see the sketch after this list).
    • Confirm extended timeouts do not conflict with CI/test runtime or resource constraints.
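
A hedged sketch of that call pattern (the crash command line and file paths are illustrative):

    from avocado.utils import process

    # verify the dump with crash; vmlinux/vmcore paths are placeholders
    crash_cmd = "crash /usr/lib/debug/vmlinux /var/tmp/vmcore"
    status, output = process.getstatusoutput(crash_cmd, timeout=1800)
    if status != 0:
        print("crash verification failed:\n%s" % output)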

Pre-merge checks

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title directly and clearly summarizes the main change (increasing the timeout for large vmcore dumps), which is supported by all three modified files (dump_guest_memory.cfg, dump_guest_core.py, and dump_guest_memory.py).

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d91978b and d4d5e01.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (4 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • qemu/tests/cfg/dump_guest_memory.cfg
  • qemu/tests/dump_guest_core.py
  • qemu/tests/dump_guest_memory.py

Comment @coderabbitai help to get the list of available commands and usage tips.

@heywji (Contributor, Author) commented Oct 29, 2025

Test Results: PASS

FROM:

(01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (399.91 s)                                                                    
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (402.98 s)                                                                    
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (404.10 s)                                                                    
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (400.25 s)                                                                    
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (402.54 s)                                                                    
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (403.41 s)                                                                    
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (401.73 s)
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED              
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (401.28 s)
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED  
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (401.53 s)
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED             
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (404.02 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 9 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0                                 

TO:

 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (423.51 s)
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (414.12 s)
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (425.09 s)
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (412.60 s)
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (433.74 s)
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (410.03 s)
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (416.82 s)
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (420.95 s)
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (420.31 s)
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (437.95 s)
RESULTS    : PASS 10 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@coderabbitai (bot) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
qemu/tests/dump_guest_core.py (1)

65-65: Inconsistent timeout: crash command should also be increased.

The crash command timeout remains at 60 seconds, but dump_guest_memory.py line 160 increased it to 1800 seconds for the same operation. Large vmcore files (132 GB) will likely require more than 60 seconds for crash to analyze. This timeout should be increased to 1800 for consistency.

Apply this diff to fix the inconsistency:

-        output = process.getoutput(crash_cmd, timeout=60)
+        output = process.getoutput(crash_cmd, timeout=1800)
🧹 Nitpick comments (1)
qemu/tests/cfg/dump_guest_memory.cfg (1)

7-7: Consider a larger safety margin for the timeout value.

The timeout increase to 1800 seconds is appropriate for large vmcore dumps. However, based on your PR description (132 GB at 75 MB/s ≈ 1802 seconds), this provides minimal buffer. Consider increasing to 2400 or 3600 seconds to accommodate slower I/O speeds, system load, or compression overhead.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb1d9a and b4f6cbd.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (2 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🔇 Additional comments (3)
qemu/tests/dump_guest_core.py (2)

48-48: LGTM! Appropriate timeout increase for gdb dump generation.

The timeout increase from 360 to 1800 seconds is appropriate for generating large vmcore files via gdb.


116-116: LGTM! Defensive timeout increase.

The timeout increase to 1800 seconds is safe and consistent. Note that this wait might be redundant since check_core_file() (line 113) should have already created the vmcore_file synchronously via the gdb command at line 48. However, the defensive check with adequate timeout is reasonable.

qemu/tests/dump_guest_memory.py (1)

160-160: LGTM! Good addition of timeout to prevent indefinite hangs.

Adding the 1800-second timeout to the crash verification command is appropriate and prevents potential indefinite hangs when analyzing large vmcore files.

@heywji (Contributor, Author) commented Oct 30, 2025

Depends on: #4390

@heywji force-pushed the fix_dump_guest_memory_increase_timeout branch 2 times, most recently from 9ad7310 to d91978b, on November 4, 2025 at 06:40
@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
qemu/tests/dump_guest_core.py (1)

48-116: Consider making timeouts configurable.

The 1800s timeout is now hardcoded in multiple locations. For large values like 30 minutes, consider making timeouts configurable via test parameters to provide flexibility for different hardware configurations or memory sizes.

For example, you could add to the params section:

gdb_timeout = int(params.get("gdb_timeout", 1800))
crash_timeout = int(params.get("crash_timeout", 1800))
vmcore_wait_timeout = int(params.get("vmcore_wait_timeout", 1800))

Then use these variables in the respective locations instead of hardcoded values.
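
For instance, the vmcore wait could then look like this (utils_misc.wait_for is the avocado-vt helper; params, test, and vmcore_file come from the test's run() context):

    import os
    from virttest import utils_misc

    # inside run(test, params, env); vmcore_file is set up earlier in the test
    vmcore_wait_timeout = int(params.get("vmcore_wait_timeout", 1800))
    if not utils_misc.wait_for(lambda: os.path.exists(vmcore_file),
                               timeout=vmcore_wait_timeout, step=5):
        test.fail("vmcore not created within %ss" % vmcore_wait_timeout)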

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ad7310 and d91978b.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (4 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • qemu/tests/dump_guest_memory.py
  • qemu/tests/cfg/dump_guest_memory.cfg
🔇 Additional comments (3)
qemu/tests/dump_guest_core.py (3)

48-48: Timeout increase is appropriate for large memory dumps.

The increase from 360s to 1800s is well-justified by the test results showing operations take 400-438 seconds for 126 GB memory VMs. This aligns with the PR objective to handle ~132 GB vmcore files.


65-65: Timeout increase is consistent and necessary.

The increase from 60s to 1800s is necessary for analyzing large vmcore files (~132 GB) and maintains consistency with the other timeout changes in this test.


116-116: Timeout increase is appropriate and consistent.

The increase from 60s to 1800s for waiting for the vmcore file to exist is necessary for large vmcore files and maintains consistency with the other timeout changes throughout this test.

Remove process.TimeoutError from exception handling, as it does not exist in the avocado.utils.process module.

Signed-off-by: wji <[email protected]>
Increase dump timeout from 90s to 1800s (30 minutes) to handle large
vmcore files (~132GB). The original timeout was too short for VMs with
126GB memory, causing truncated vmcore files.
With 30-minute timeout, even at 75MB/s disk I/O speed, 132GB dumps
should complete successfully.

Signed-off-by: wji <[email protected]>
@heywji force-pushed the fix_dump_guest_memory_increase_timeout branch from d91978b to d4d5e01 on November 4, 2025 at 06:47