rh_kselftests_vm: kernel selftests execution in guest #4114
Conversation
Force-pushed from f024df5 to fb7b441
@zhenyzha @MiriamDeng @fbq815 @zhencliu @YongxueHong could you please review this PR? Thanks!
@zhenyzha @MiriamDeng @fbq815 @zhencliu @YongxueHong this is a kind reminder, could you please review this PR? Thanks!
@mcasquer Could you provide the test results for rhel.10? I have not been able to compile successfully on 10 recently.
Indeed, good catch!
Force-pushed from 4aeb981 to f76de3b
Force-pushed from f76de3b to 370b56e
Hi @mcasquer, I had a quick look at the kernel CKI testing for the mm part; any code change with the following trigger source will trigger the test automatically, and hugetlb was covered with it. I would like to double-check: does this meet your requirements, or is it a must to check it in a qemu-kvm based VM OS, which is the purpose of this patch?
@yanan-fu the idea is to execute those tests inside a VM backed by hugepages; my understanding is that the kernel team is not covering this scenario, which is why we need this test case.
Force-pushed from 370b56e to 061d464
Hi @mcasquer, I am curious: do we need to cover different page sizes? Or is the page size part of the test matrix?
@zhencliu not really, at least for x86_64 the 2MB hugepages are fine |
Force-pushed from edc0751 to 68c1f57
@YongxueHong @zhencliu could you review this PR again? Thanks!
Force-pushed from 9d7799f to 083e5ca
Force-pushed from 083e5ca to 20a4555
Per the test results above, LGTM.
Force-pushed from ac5e452 to 7906012
qemu/tests/rh_kselftests_vm.py (outdated)

```python
test.fail("Error during selftests execution: %s" % o)

test.log.info("The selftests results: %s" % o)
error_context.context("Cleaning kernel files", test.log.debug)
```
This should be in the `finally` section of the `try` block, so that whether the test case fails or not, we clean up the downloaded RPM (and also uninstall it?).
Or use `clone_master = yes` to skip the cleanup step.
@PaulYuuu done!
I just saw the new version now. Is there any blocker to giving it up now? It is just a code structure change.
qemu/tests/rh_kselftests_vm.py (outdated)

```python
s, o = session.cmd_status_output(tests_execution_cmd, 180)

# Exit code for skipped selftests is 4, raise a warning until is fixed
if s == 4:
```
I do not think this is an issue that needs to be fixed in the kernel selftests; it is intentional. From my check of the source code, if there is a skipped case after a failed case, the return value is 4, which is wrong (not in the scope of this PR, but of kselftest).
I suggest using a whitelist for the known tests that can be skipped, and parsing the output of the test to make the final decision on the test case result.
@yfu but then how will the user know that there are skipped tests? If we mark the skipped ones as passed, it's possible the user won't be aware, right?
We are talking about the exit code in this thread.
```sh
run_test() {
    if test_selected ${CATEGORY}; then
        # On memory constrainted systems some tests can fail to allocate hugepages.
        # perform some cleanup before the test for a higher success rate.
        if [ ${CATEGORY} == "thp" ] || [ ${CATEGORY} == "hugetlb" ]; then
            echo 3 > /proc/sys/vm/drop_caches
            sleep 2
            echo 1 > /proc/sys/vm/compact_memory
            sleep 2
        fi
        local test=$(pretty_name "$*")
        local title="running $*"
        local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
        printf "%s\n%s\n%s\n" "$sep" "$title" "$sep" | tap_prefix
        ("$@" 2>&1) | tap_prefix
        local ret=${PIPESTATUS[0]}
        count_total=$(( count_total + 1 ))
        if [ $ret -eq 0 ]; then
            count_pass=$(( count_pass + 1 ))
            echo "[PASS]" | tap_prefix
            echo "ok ${count_total} ${test}" | tap_output
        elif [ $ret -eq $ksft_skip ]; then
            count_skip=$(( count_skip + 1 ))
            echo "[SKIP]" | tap_prefix
            echo "ok ${count_total} ${test} # SKIP" | tap_output
            exitcode=$ksft_skip
        else
            count_fail=$(( count_fail + 1 ))
            echo "[FAIL]" | tap_prefix
            echo "not ok ${count_total} ${test} # exit=$ret" | tap_output
            exitcode=1
        fi
    fi # test_selected
}
```
In the loop, the resulting `exitcode` for a sequence of two cases is:

| | 1: PASS | 1: SKIP | 1: FAIL |
|---|---|---|---|
| 2: PASS | 0 | 4 | 1 |
| 2: SKIP | 4 | 4 | 4 |
| 2: FAIL | 1 | 1 | 1 |
The scenarios we need to agree on are SKIP → FAIL (exit 1) and FAIL → SKIP (exit 4). I think FAIL → SKIP (exit 4) must return 1, but this needs to be changed in the Linux source code; for now, I don't think we have a good solution to handle it. A workaround is to check the output and collect the counts of `[PASS]`, `[SKIP]` and `[FAIL]` rather than checking the exit code. Keeping `if s == 4:` in this version is in order to safely raise a warning if we have new test cases for mm in the future.
> @yfu but then how will the user know that there are skipped tests? If we set the skipped as passed it's possible the user won't be aware, right?
Any skipped case not in the whitelist should fail this auto case; checking the output is needed.
- Using the exit code is incorrect if there is a skipped case after a failed one.
- If the skipped case is in the whitelist, I prefer to mark the test case status as PASS instead of WARN, as it is a known issue.
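The whitelist decision described here could be sketched as follows (`verdict` is a hypothetical helper, assuming the skipped test names have already been extracted from the output):

```python
def verdict(skipped_tests, whitelist):
    """PASS if every skipped test is a known, whitelisted skip;
    FAIL if any skip is unexpected."""
    unexpected = [t for t in skipped_tests if t not in whitelist]
    return "FAIL" if unexpected else "PASS"

print(verdict(["hugetlb_fault_after_madv"], ["hugetlb_fault_after_madv"]))  # → PASS
print(verdict(["some_other_test"], ["hugetlb_fault_after_madv"]))           # → FAIL
```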
Added the whitelist approach
Hi @yanan-fu, time is limited. As we discussed before, `tests_execution_cmd` is OK for the current testing, and we may have to enhance it when we need to do some pre-test work or when the execution differs from the mm test. So what about keeping the current code and making the change once we have to?
Okay, then let's focus on the result parser now, cc @mcasquer
Force-pushed from 7906012 to c399ddc
Results in x86_64
Results in s390x (now marked as passed)
qemu/tests/cfg/rh_kselftests_vm.cfg (outdated)

```
kvm_module_parameters = 'hpage=1'
setup_hugepages = yes
tests_execution_cmd = "cd ${kselftests_path}/mm && sh run_vmtests.sh -t hugetlb"
whitelist = "hugetlb_fault_after_madv"
```
Move this into `s390x:`, since we skip this case only for s390x, if I remember correctly.
Done!
qemu/tests/rh_kselftests_vm.py (outdated)

```python
session = vm.wait_for_login()
kernel_path = params.get("kernel_path", "/tmp/kernel")
tests_execution_cmd = params.get("tests_execution_cmd")
whitelist = params.get("whitelist").split()
```
Aligned with the comment above: right now only s390x has the known skip case, so use `whitelist = params.get("whitelist", "").split()` or `whitelist = params.objects("whitelist")`.
Done!
qemu/tests/rh_kselftests_vm.py (outdated)

```python
skipped_tests = True if "[SKIP]" in o else False
test.log.debug("Skipped tests: %r" % skipped_tests)
for test_name in whitelist:
    if skipped_tests and test_name in o:
```
`test_name` is always in `o`, no matter whether its status is pass, skip or fail. You need to parse the output and get the skipped case names.
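Extracting the skipped case names (rather than substring-matching the whole output) could look like this sketch, assuming the TAP lines have the `ok N <name> # SKIP` shape produced by `run_vmtests.sh`:

```python
import re

def skipped_test_names(output):
    """Return the test names from TAP lines like 'ok 3 <name> # SKIP'."""
    return re.findall(r"^ok \d+ (.+?) # SKIP", output, re.MULTILINE)

tap = "ok 1 compaction_test\nok 2 hugetlb_fault_after_madv # SKIP\n"
print(skipped_test_names(tap))  # → ['hugetlb_fault_after_madv']
```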
Done!
Force-pushed from cc64e54 to b851155
Results in x86_64
Results in s390x
qemu/tests/rh_kselftests_vm.py (outdated)

```python
if len(skipped_list) == num_skipped_tests:
    return True
elif len(skipped_list) < num_skipped_tests:
    raise exceptions.TestWarn("Some skipped test(s) are not in the whitelist")
if s != 0:
    test.fail("Error during selftests execution: %s" % o)
```
Almost LGTM, except here. If we have 3 test cases (1 PASS, 1 SKIP, 1 FAIL) and the skipped one is on the whitelist, the current logic still returns True, but PASS + SKIP != 3. I think the summary line, like `SUMMARY: PASS=9 SKIP=1 FAIL=0`, can help to detect how to handle it.
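Parsing that summary line could be sketched as below (`parse_summary` is a hypothetical helper name; the `# SUMMARY: PASS=9 SKIP=1 FAIL=0` format comes from the selftests output quoted above):

```python
import re

def parse_summary(output):
    """Turn '# SUMMARY: PASS=9 SKIP=1 FAIL=0' into a dict of ints."""
    m = re.search(r"SUMMARY:\s*PASS=(\d+)\s+SKIP=(\d+)\s+FAIL=(\d+)", output)
    return dict(zip(("PASS", "SKIP", "FAIL"), map(int, m.groups()))) if m else None

print(parse_summary("# SUMMARY: PASS=9 SKIP=1 FAIL=0"))
# → {'PASS': 9, 'SKIP': 1, 'FAIL': 0}
```

With these counts the test can also verify that PASS + SKIP equals the total number of cases.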
@PaulYuuu Updated; failed cases are considered first, then skipped ones, so nothing should be missed now.
Force-pushed from b851155 to ae60572
@yanan-fu I am okay with this version, how about you?
qemu/tests/rh_kselftests_vm.py (outdated)

```python
test.log.info("The selftests results: %s" % o)

summary = re.findall(r"\# SUMMARY.+", o)
num_failed_tests = int(re.findall(r"FAIL\=\d", str(summary))[0].split('=')[1])
```
```diff
- num_failed_tests = int(re.findall(r"FAIL\=\d", str(summary))[0].split('=')[1])
+ num_failed_tests = int(re.findall(r"FAIL\=\d+", str(summary))[0].split('=')[1])
```
To allow for counts of 10 or more; same for the skipped test cases.
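The difference matters as soon as a count reaches two digits; a quick standalone illustration of `\d` versus `\d+` (not code from the PR):

```python
import re

summary = "# SUMMARY: PASS=12 SKIP=0 FAIL=10"
print(re.findall(r"FAIL=\d", summary))   # → ['FAIL=1'] (truncated at one digit)
print(re.findall(r"FAIL=\d+", summary))  # → ['FAIL=10']
```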
Updated!
qemu/tests/rh_kselftests_vm.py (outdated)

```python
test.log.debug("Number of failed tests: %d" % num_failed_tests)

if num_failed_tests != 0:
    test.fail("Error during selftests execution: %s" % o)
```
You already log the output above; let's simplify the error message, the current one is too long for the test status.
Updated!
qemu/tests/rh_kselftests_vm.py (outdated)

```python
summary = re.findall(r"\# SUMMARY.+", o)
num_failed_tests = int(re.findall(r"FAIL\=\d", str(summary))[0].split('=')[1])
test.log.debug("Number of failed tests: %d" % num_failed_tests)
```
```diff
- test.log.debug("Number of failed tests: %d" % num_failed_tests)
+ test.log.debug("Number of failed tests: %d", num_failed_tests)
```
A small suggestion: the logging module will format the message during the call this way; see the logging docs.
Updated!
Force-pushed from ae60572 to 981ba5d
Results in x86_64
Results in s390x
rh_kselftests_vm: kernel selftests execution in guest
Creates a new test case that executes the kernel selftests
inside the VM through the RPM that has been previously downloaded
and installed. Could be expanded with more tests in the future.
Signed-off-by: mcasquer [email protected]
ID: 2637