perf/perf_24x7_hardware_counters.py: check_valid_chip - Update testcase to avoid failures #2787

Open
wants to merge 1 commit into
base: master
from


misanjumn
Contributor

perf/perf_24x7_hardware_counters.py: check_valid_chip - Update testcase to avoid failures

Command: perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=<value>

The maximum chip value that can be assigned is 65535.
Anything above that range should fail, and this test is used to verify that.

Hence, setting invalid_chip=65536 tests the command directly.
In scenarios where multiple chip values are tested, this testcase can fail if the chip values passed before 65536 are in the accepted range.

Signed-off-by: Misbah Anjum N <[email protected]>

@misanjumn
Contributor Author

misanjumn commented Mar 29, 2024

The maximum chip value that can be used is 65535:

# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=65536/ /bin/true
Using CPUID 0x00800200
Multiple errors dropping message: value too big for format, maximum is 65535 (<no help>)
event syntax error: '..0_CYC,domain=1,chip=65536/'
                                  \___ Bad event or PMU

Unable to find PMU or event on a PMU of 'hv_24x7'

Initial error:
event syntax error: '..0_CYC,domain=1,chip=65536/'
                                  \___ value too big for format, maximum is 65535
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
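
The limit of 65535 in the message above equals 2^16 - 1, which is what a 16-bit format field would allow. As a minimal sketch, assuming the chip field is described by a bit-range spec such as `config:16-31` (an assumption here; the real spec can be read from `/sys/bus/event_source/devices/hv_24x7/format/chip` on the test machine), the maximum can be derived like this. The function name `max_value_for_format` is hypothetical.

```python
import re


def max_value_for_format(spec: str) -> int:
    """Return the largest value that fits in a bit-range spec like 'config:16-31'."""
    match = re.fullmatch(r"\w+:(\d+)(?:-(\d+))?", spec.strip())
    if match is None:
        raise ValueError(f"unrecognised format spec: {spec!r}")
    low = int(match.group(1))
    # A spec with a single bit number ('config:5') describes a one-bit field.
    high = int(match.group(2)) if match.group(2) is not None else low
    width = high - low + 1
    return (1 << width) - 1


# 'config:16-31' is an assumed example spec, not read from a real machine.
print(max_value_for_format("config:16-31"))  # 65535
```

Under that assumption, 65536 is the smallest value the perf tool rejects while parsing the event, before any counter is opened.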

@misanjumn
Contributor Author

OS: Fedora40

Before the patch: invalid_chip = [self.chips, 65536]

STATUS: FAIL

 (1/1) /home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip: STARTED
 (1/1) /home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip:  FAIL: perf unable to recognise invalid chip value (1.39 s)
RESULTS    : PASS 0 | ERROR 0 | FAIL 1 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Explanation:

At the line: invalid_chip = [self.chips, 65536]
self.chips evaluated to 2, giving
invalid_chip = [2, 65536]

Command: perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/ /bin/true
Result: exit status: 0 (PASS)
Test-Case Result: FAIL
Reason: The valid chip value 2 was tested before the invalid value 65536. The perf command succeeded for chip=2, so the test logic reported a failure.
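
The behaviour can be reproduced without a POWER machine. The following is a rough sketch, not the test's actual code: `check_invalid_chips` and `fake_perf` are hypothetical names, and the exit statuses (0 for chip=2, 129 for chip=65536) are the ones reported in this thread.

```python
def check_invalid_chips(candidates, run_perf):
    """Return the first candidate that perf accepted (exit status 0), else None."""
    for chip in candidates:
        if run_perf(chip) == 0:
            # An "invalid" value was accepted, which the test treats as a failure.
            return chip
    return None


def fake_perf(chip):
    # Stand-in for running 'perf stat'; statuses copied from the runs in this thread.
    return {2: 0, 65536: 129}[chip]


print(check_invalid_chips([2, 65536], fake_perf))  # 2
print(check_invalid_chips([65536], fake_perf))     # None
```

With the list `[2, 65536]`, the loop stops at 2 and never reaches 65536; with only 65536, no accepted value is found.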

@misanjumn
Contributor Author

misanjumn commented Mar 29, 2024

OS: Fedora40

After the patch: invalid_chip = 65536

STATUS: PASS

 (1/1) /home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip: STARTED
 (1/1) /home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip:  PASS (1.39 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Explanation:

At the line: invalid_chip = 65536
The invalid value is used directly.

Command: perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=65536/ /bin/true
Result: exit status: 129 (FAIL)
Test-Case Result: PASS
Reason: No value other than the invalid one is tested, so the test case passes.

@misanjumn
Contributor Author

Tested on RHEL 9.3 with the patch

STATUS: PASS

 (1/1) /home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip: STARTED
 (1/1) /home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip:  PASS (1.53 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@disgoel
Contributor

disgoel commented Apr 1, 2024

OS: Fedora40

Before the patch: invalid_chip = [self.chips, 65536]

STATUS: FAIL

Explanation:

at line: invalid_chip = [self.chips, 65536]
self.chips value was taken as 2
invalid_chip = [2, 65536]

Command: perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/ /bin/true
Result: exit status: 0 (PASS)
Test-Case Result: FAIL
Reason: Before testing the invalid_chip = 65536, valid chip value = 2 was tested. This command passed and the code logic failed.

The number of supported chip IDs is derived from the lscpu output as follows:
chip = physical_sockets * physical_chips

# lscpu
  Physical sockets:      2
  Physical chips:        2

So, let's take chip = 2*2 = 4, which means 0, 1, 2, and 3 are valid, and self.chips, which is 4, is invalid because the values start from 0. So the test case logic for the invalid chip, which tests self.chips and 65536, is correct. You need to check the debug log to see exactly why the test case is failing.

valid chip value = 0 to self.chips-1
invalid chip value = self.chips and >65535
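
The rule above can be sketched in a few lines. This is only an illustration under the assumptions stated in this comment (chip count = sockets * chips, IDs starting at 0); the helper names `chip_count` and `chip_ranges` are hypothetical and are not taken from the test file.

```python
import re


def _lscpu_field(lscpu_output: str, label: str) -> int:
    """Extract an integer field such as 'Physical sockets' from lscpu text."""
    match = re.search(rf"{re.escape(label)}:\s*(\d+)", lscpu_output)
    if match is None:
        raise ValueError(f"{label!r} not found in lscpu output")
    return int(match.group(1))


def chip_count(lscpu_output: str) -> int:
    """chip = physical_sockets * physical_chips, as described above."""
    sockets = _lscpu_field(lscpu_output, "Physical sockets")
    chips = _lscpu_field(lscpu_output, "Physical chips")
    return sockets * chips


def chip_ranges(lscpu_output: str, max_format_value: int = 65535):
    """Return (valid IDs, values expected to be rejected) under the rule above."""
    total = chip_count(lscpu_output)
    valid = list(range(total))                # 0 .. total-1
    invalid = [total, max_format_value + 1]   # first ID past the range, and >65535
    return valid, invalid


sample = """\
  Physical sockets:      2
  Physical chips:        2
"""
valid, invalid = chip_ranges(sample)
print(valid)    # [0, 1, 2, 3]
print(invalid)  # [4, 65536]
```

Whether a given machine actually rejects the first value past the computed range is exactly what the debug log needs to confirm.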

@disgoel
Contributor

disgoel commented Apr 1, 2024

Result on an LPAR where the chip value is 4:

# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=3/ /bin/true
Using CPUID 0x00800200
Control descriptor is not initialized
hv_24x7/PM_PHB0_0_CYC,domain=1,chip=3/: 0 595374 595374

 Performance counter stats for 'system wide':

                 0      hv_24x7/PM_PHB0_0_CYC,domain=1,chip=3/

       0.000589873 seconds time elapsed

# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=4/ /bin/true
Using CPUID 0x00800200
Control descriptor is not initialized
Error:
The sys_perf_event_open() syscall returned with 5 (Input/output error) for event (hv_24x7/PM_PHB0_0_CYC,domain=1,chip=4/).
/bin/dmesg | grep -i perf may provide additional information.

# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=65536/ /bin/true
Using CPUID 0x00800200
Multiple errors dropping message: value too big for format, maximum is 65535 (<no help>)
event syntax error: '..0_CYC,domain=1,chip=65536/'
                                  \___ Bad event or PMU

Unable to find PMU or event on a PMU of 'hv_24x7'

Initial error:
event syntax error: '..0_CYC,domain=1,chip=65536/'
                                  \___ value too big for format, maximum is 65535
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events
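
The three runs above end in three different ways: counters are printed, the event open fails with an I/O error, or the event string is rejected while parsing. A small sketch that tells them apart by the stderr text quoted above may help when reading debug logs. The function name and the labels it returns are made up for illustration; only the matched substrings come from the output in this thread.

```python
def classify_perf_stderr(stderr: str) -> str:
    """Label a 'perf stat' run using substrings seen in the outputs above."""
    if "value too big for format" in stderr:
        return "parse-error"    # e.g. chip=65536
    if "sys_perf_event_open() syscall returned with" in stderr:
        return "open-failed"    # e.g. chip=4 on the LPAR above
    if "Performance counter stats" in stderr:
        return "counted"        # e.g. chip=3 on the LPAR above
    return "unknown"


print(classify_perf_stderr("event syntax error: value too big for format, maximum is 65535"))
print(classify_perf_stderr("The sys_perf_event_open() syscall returned with 5 (Input/output error)"))
print(classify_perf_stderr(" Performance counter stats for 'system wide':"))
```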

@sacsant
Contributor

sacsant commented Apr 11, 2024

@misanjumn can you attach relevant debug logs captured during a failed run so that they can be reviewed to debug this problem?

@abdhaleegit abdhaleegit self-requested a review April 19, 2024 09:45
res = self.event_stat1(cmd)
if res.exit_status == 0:
self.fail("perf unable to recognise invalid chip value")
invalid_chip = 65536
Collaborator

Can you use a meaningful name, such as max_range?

if res.exit_status == 0:
self.fail("perf unable to recognise invalid chip value")
invalid_chip = 65536
if self.rev == '004b' or self.rev == '004e':
Collaborator

if self.rev in ['004b' , '004e']:

@abdhaleegit
Collaborator

@misanjumn any update here?

@abdhaleegit abdhaleegit self-assigned this May 6, 2024
@misanjumn
Contributor Author

@disgoel

With the code
invalid_chip = [self.chips, 65536]

self.chips = 2

# lscpu
  Physical sockets:       2
  Physical chips:         1

But the exit status of the perf stat command with chip = self.chips = 2 is 0:

perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/ /bin/true
Result: exit status: 0 (PASS)

Corresponding debug.log for invalid_chip = [self.chips, 65536]

[stdlog] 2024-05-06 02:47:44,188 avocado.test test             L0616 DEBUG|  -> res <class 'avocado.utils.process.CmdResult'>: command: 'perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/ /bin/true'
[stdlog] exit_status: 0
[stdlog] duration: 0.009930704021826386
[stdlog] interrupted: False
[stdlog] pid: 819968
[stdlog] encoding: 'UTF-8'
[stdlog] stdout: b''
[stdlog] stderr: b"Using CPUID 0x00800200\nControl descriptor is not initialized\nhv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/: 0 390070 390070\n\n Performance counter stats for 'system wide':\n\n                 0      hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/                                      \n\n       0.000387538 seconds time elapsed\n\n"
[stdlog] 2024-05-06 02:47:44,189 avocado.test test             L0711 ERROR| FAIL 1-/home/misanjumn/tests/tests/avocado-misc-tests/perf/perf_24x7_hardware_counters.py:EliminateDomainSuffix.test_check_invalid_chip -> TestFail: perf unable to recognise invalid chip value

Since the exit status is 0, the test logic treats the command as having succeeded and therefore reports FAIL with the message "perf unable to recognise invalid chip value".

@misanjumn
Contributor Author

misanjumn commented May 6, 2024

@disgoel

On the test machine:

# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/ /bin/true
Using CPUID 0x00800200
Control descriptor is not initialized
hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/: 0 505004 505004

 Performance counter stats for 'system wide':

                 0      hv_24x7/PM_PHB0_0_CYC,domain=1,chip=2/                                      

       0.000496042 seconds time elapsed
# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=3/ /bin/true
Using CPUID 0x00800200
Control descriptor is not initialized
hv_24x7/PM_PHB0_0_CYC,domain=1,chip=3/: 0 519842 519842

 Performance counter stats for 'system wide':

                 0      hv_24x7/PM_PHB0_0_CYC,domain=1,chip=3/                                      

       0.000513708 seconds time elapsed
# perf stat -v -e hv_24x7/PM_PHB0_0_CYC,domain=1,chip=4/ /bin/true
Using CPUID 0x00800200
Control descriptor is not initialized
Error:
The sys_perf_event_open() syscall returned with 5 (Input/output error) for event (hv_24x7/PM_PHB0_0_CYC,domain=1,chip=4/).
/bin/dmesg | grep -i perf may provide additional information.

But chip value 2 is what the test case tests, since:

# lscpu
  Physical sockets:       2
  Physical chips:         1

@abdhaleegit abdhaleegit self-requested a review June 26, 2024 07:44
@sacsant
Contributor

sacsant commented Sep 3, 2024

@disgoel @misanjumn so where are we on this? Is this (perf stat with self.chips as the chip value returning success) an actual bug or a test case issue?


4 participants