hugepage_reset: Test compatible with different NUMA topologies #4237

Open. mcasquer wants to merge 1 commit into master from 3254_hp_reset_setup.

@mcasquer
Contributor

mcasquer commented Dec 18, 2024

hugepage_reset: Test compatible with different NUMA topologies

The test sets 8 hugepages, which works fine on systems
with 2 NUMA nodes. On a system with e.g. 8 nodes, the
on_numa_node variant fails because the bound node
doesn't have enough hugepages.

Since the cfg already suggests allocating 1G hugepages
at boot time, let the user decide how many hugepages to
allocate, and add an informative comment in the cfg as well.

Finally, if the system hugepage_size is 1GB, allocate
enough hugepages at runtime on all valid nodes.

Signed-off-by: mcasquer [email protected]
ID: 3254
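
The arithmetic behind the failure described above can be sketched as follows. This is only an illustration, assuming the kernel spreads a global hugepage pool evenly across NUMA nodes; `per_node_share`, `bound_node_can_back_guest`, and the 4-page guest requirement are hypothetical and not taken from the test code.

```python
def per_node_share(total_pages, node_count):
    """Split a global hugepage pool evenly across NUMA nodes.

    Assumption: pages are spread evenly, with any remainder going to the
    lowest-numbered nodes. Real hosts may differ if a node lacks free memory.
    """
    base, extra = divmod(total_pages, node_count)
    return [base + (1 if node < extra else 0) for node in range(node_count)]


def bound_node_can_back_guest(total_pages, node_count, pages_needed, node=0):
    """Return True if the bound node alone holds enough hugepages."""
    return per_node_share(total_pages, node_count)[node] >= pages_needed


# Hypothetical guest that needs 4 hugepages from the single node it is bound to.
print(bound_node_can_back_guest(8, 2, 4))  # 2 nodes: 4 pages land on each node
print(bound_node_can_back_guest(8, 8, 4))  # 8 nodes: 1 page lands on each node
```

With the same 8 pages, the bound node holds 4 pages on a 2-node host but only 1 page on an 8-node host, which is why a fixed global count does not carry over between topologies.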

@mcasquer mcasquer marked this pull request as ready for review December 18, 2024 10:17
@mcasquer
Contributor Author

@JinLiul, could you please test this PR whenever you have a system with 8 NUMA nodes again? Thanks!

@JinLiul
Contributor

JinLiul commented Dec 30, 2024

Hi @mcasquer, I tested this on a system with 8 NUMA nodes.
kar command: python3 ConfigTest.py --category=huge_page_1G --guestname=RHEL.10.0 --debug
It still failed with: qemu-kvm: unable to map backing store for guest RAM: Cannot allocate memory.

@mcasquer mcasquer force-pushed the 3254_hp_reset_setup branch from 5e1c2c5 to b1013a9 Compare January 15, 2025 12:08
@mcasquer mcasquer changed the title from "hugepage_reset: removes hugepages setup" to "hugepage_reset: Test compatible with different NUMA topologies" Jan 15, 2025
@mcasquer
Contributor Author

Test results on a host with 8 NUMA nodes (with the test loop slightly tuned 😁):

 (1/3) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: STARTED
 (1/3) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: PASS (598.44 s)
 (2/3) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.q35: STARTED
 (2/3) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.q35: PASS (142.54 s)
 (3/3) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.on_numa_node.q35: STARTED
 (3/3) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.on_numa_node.q35: PASS (146.94 s)
RESULTS    : PASS 3 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@mcasquer
Contributor Author

@JinLiul, could you please test this PR again? Thanks!

@mcasquer
Contributor Author

It also passed on a host with 2 NUMA nodes:

 (1/2) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.q35: STARTED
 (1/2) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.q35: PASS (164.30 s)
 (2/2) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.on_numa_node.q35: STARTED
 (2/2) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.0.x86_64.io-github-autotest-qemu.hugepage_reset.on_numa_node.q35: PASS (172.17 s)
RESULTS    : PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@JinLiul
Contributor

JinLiul commented Jan 16, 2025

huge_page_1G test loop passed
RESULTS : PASS 15 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /root/avocado/job-results/job-2025-01-16T02.43-e465a66/results.html
JOB TIME : 4674.58 s

@mcasquer mcasquer force-pushed the 3254_hp_reset_setup branch from b1013a9 to d77d567 Compare January 16, 2025 09:36
@mcasquer
Contributor Author

@PaulYuuu @luckyh, could you please review this PR? Thanks!

@@ -107,9 +125,9 @@ def heavyload_install():
"No node on your host has sufficient free memory for " "this test."
)
hp_config = test_setup.HugePageConfig(params)
if params.get("on_numa_node"):
allocate_largepages_per_node()
Contributor

@mcasquer, the code LGTM. I just want to confirm one thing with you: if the node does not have enough memory, should we still run the setup, or would it be better to raise an error or skip the test?

Contributor Author

Mmmm @PaulYuuu, good point. I think that situation should be handled, perhaps with a try block. I'll send an update for this.

Contributor Author

@PaulYuuu, I added a try block that cancels the case if there's not enough memory. Faked example:

 (2/2) Host_RHEL.m9.u6.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.6.0.x86_64.io-github-autotest-qemu.hugepage_reset.on_numa_node.q35: STARTED
 (2/2) Host_RHEL.m9.u6.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.6.0.x86_64.io-github-autotest-qemu.hugepage_reset.on_numa_node.q35: CANCEL: 18 (expecting 400) hugepages is set on the node 0, please check if the node has enough memory (12.51 s)
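
A rough sketch of the try block mentioned above. `CancelSignal`, `StubTest`, `set_node_hugepages`, and `setup_hugepages` are hypothetical stand-ins so the example runs without the test framework; only the `ValueError`-to-cancel pattern and the message wording come from this thread.

```python
class CancelSignal(Exception):
    """Hypothetical stand-in for the framework's cancel signal."""


class StubTest:
    """Hypothetical stand-in for the test object used in this thread."""

    def cancel(self, reason):
        raise CancelSignal(str(reason))


def set_node_hugepages(node, expected, granted):
    """Pretend to allocate hugepages on one node.

    `granted` simulates how many pages the host actually provided.
    """
    if granted != expected:
        raise ValueError(
            "%d (expecting %d) hugepages is set on the node %d, "
            "please check if the node has enough memory"
            % (granted, expected, node)
        )
    return granted


def setup_hugepages(test, node, expected, granted):
    """Cancel the test instead of failing when the node is short on memory."""
    try:
        return set_node_hugepages(node, expected, granted)
    except ValueError as e:
        test.cancel(e)
```

Calling `setup_hugepages(StubTest(), 0, 400, 18)` raises `CancelSignal` carrying the same kind of message as the faked run, while a fully satisfied request returns the page count.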

@mcasquer mcasquer force-pushed the 3254_hp_reset_setup branch from d77d567 to 3b3e2de Compare January 19, 2025 21:17
Comment on lines 39 to 40
mem_kb = mem * 1024
if node_mem_free > mem_kb:
Contributor

Suggested change
- mem_kb = mem * 1024
- if node_mem_free > mem_kb:
+ if node_mem_free > (mem * 1024):

except ValueError as e:
    test.cancel(e)
Contributor

I am not sure when we would hit a ValueError here. In read_from_node_meminfo?
Also, when no node has the amount of memory we need, the current loop lets the test continue, but we should skip the test in that case as well.
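
One way to handle the case where no node qualifies is to collect the nodes that pass the check and bail out when the list is empty. The sketch below assumes free memory is given in kB per node and `mem` in MB, matching the `mem * 1024` comparison in the diff; `nodes_with_enough_memory` and `require_usable_nodes` are hypothetical names, and the caller is expected to turn the `ValueError` into a test cancel.

```python
def nodes_with_enough_memory(node_mem_free_kb, mem_mb):
    """Return the IDs of nodes whose free memory exceeds the requirement."""
    return [
        node
        for node, free_kb in sorted(node_mem_free_kb.items())
        if free_kb > mem_mb * 1024
    ]


def require_usable_nodes(node_mem_free_kb, mem_mb):
    """Raise ValueError when no node qualifies, so the caller can cancel."""
    usable = nodes_with_enough_memory(node_mem_free_kb, mem_mb)
    if not usable:
        raise ValueError(
            "No node on your host has sufficient free memory for this test."
        )
    return usable
```

Because the empty case raises the same exception type that the existing `except ValueError` branch catches, the test would be cancelled rather than continuing with no usable node.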

@mcasquer mcasquer force-pushed the 3254_hp_reset_setup branch from 3b3e2de to b870b91 Compare January 24, 2025 13:05