Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qemu_vm.py: update vhost thread matching pattern and process command #3971

Merged
merged 1 commit into from
Sep 29, 2024

Conversation

quanwenli
Copy link
Contributor

@quanwenli quanwenli commented Aug 13, 2024

- switched to using "ps -eL" for vhost thread detection, as vhost
threads are now created as separate threads and no longer appear in
"ps aux" output in the latest kernel. "ps -eL" also works with older
kernels.
- fix vhost_thread_pattern to capture LWP from "ps -eL" output

Signed-off-by: Wenli Quan [email protected]

id: 2774

@quanwenli
Copy link
Contributor Author

JOB LOG : /root/avocado/job-results/job-2024-08-13T05.36-c371659/job.log
(1/1) BF3.bridge_test.1q.acceptance.repeat1.Host_RHEL.m10.u0.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.netperf.default.host_guest.q35: STARTED
(1/1) BF3.bridge_test.1q.acceptance.repeat1.Host_RHEL.m10.u0.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.netperf.default.host_guest.q35: PASS (205.84 s)
RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /root/avocado/job-results/job-2024-08-13T05.36-c371659/results.html

@quanwenli
Copy link
Contributor Author

@YongxueHong could you help review it. thanks a lot.

@YongxueHong
Copy link
Contributor

Hi, @leidwang @fbq815 @zhenyzha @yanglei-rh @PaulYuuu
Could you help to review it? Thanks a lot.

@@ -3945,7 +3945,7 @@ def create(
", ".join([str(tid) for tid in self.vcpu_threads]),
)
vhost_thread_pattern = params.get(
"vhost_thread_pattern", r"\w+\s+(\d+)\s.*\[vhost-%s\]"
"vhost_thread_pattern", r"\w+\s+(\d+)\s.*\[*vhost-%s\]*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @quanwenli
Could you provide the example output of both situations?
Thanks.
CC @leidwang @PaulYuuu , it maybe affect your test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @YongxueHong I guess you'd like to ping me istead of @leidwang. And the code looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on old system:
ps aux | grep host
root 331500 24.8 0.0 0 0 ? R 22:29 0:18 [vhost-331496]
ps -eL | grep host
331500 331500 ? 00:00:27 vhost-331496

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matched group is not the same one. The old pattern matches "PID" from ps aux, but the new pattern matches "LWP" from ps -eL, In your sanity test, they are the same, but cannot mark sure they are always the same.

Header of ps -eL and ps aux

# ps -eL
    PID     LWP TTY          TIME CMD
      1       1 pts/0    00:00:00 bash
     37      37 pts/0    00:00:00 ps
# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4832  3656 pts/0    Ss   02:44   0:00 /bin/bash
root          38  0.0  0.0   7708  3704 pts/0    R+   02:48   0:00 ps aux

I suggest changing the pattern accordingly.

Copy link
Contributor

@YongxueHong YongxueHong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@YongxueHong
Copy link
Contributor

Hi, @leidwang @fbq815 @zhenyzha @PaulYuuu
Just for a kindly reminding. Could you help to review it from your perspective?
Thanks

Copy link
Contributor

@fbq815 fbq815 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
RESULTS : PASS 6 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

@zhenyzha
Copy link
Contributor

multi_queues_test.invalid_queues_number.upper_border.arm64-pci: PASS (36.69 s)

LGTM.Acked-by: Zhenyu Zhang [email protected]

@@ -4396,7 +4396,7 @@ def get_vhost_threads(self, vhost_thread_pattern):
int(_)
for _ in re.findall(
vhost_thread_pattern % self.get_pid(),
process.run("ps aux", verbose=False).stdout_text,
process.run("ps -eL", verbose=False).stdout_text,
Copy link
Contributor

@PaulYuuu PaulYuuu Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help info from ps:

# ps --help threads

Usage:
 ps [options]

Show threads:
  H                   as if they were processes
 -L                   possibly with LWP and NLWP columns
 -m, m                after processes
 -T                   possibly with SPID column

I think ps -e is enough in this place. the -L option has different behavior in different OS. With this change. We should also test multiqueues cases to make sure no regression.

Copy link
Contributor Author

@quanwenli quanwenli Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just confirmed with the developer that we need to get the LWP values for the vhost threads. Additionally, I checked on the latest system and found that there are no vhost thread ids returned by the “ps -e” command. Therefore, “ps -eL” is needed in the code and I also update the pattern. please help review.

on rhel9.5:
ps -eL |grep vhost
2922 2922 ? 00:00:00 vhost-2918
2923 2923 ? 00:00:00 vhost-2918
2924 2924 ? 00:00:00 vhost-2918
2925 2925 ? 00:00:00 vhost-2918
ps -e |grep vhost
2922 ? 00:00:00 vhost-2918
2923 ? 00:00:00 vhost-2918
2924 ? 00:00:00 vhost-2918
2925 ? 00:00:00 vhost-2918

on latest system:
ps -e |grep vhost
ps -eL |grep vhost
186773 186784 pts/1 00:00:00 vhost-186773
186773 186785 pts/1 00:00:00 vhost-186773
186773 186786 pts/1 00:00:00 vhost-186773
186773 186788 pts/1 00:00:00 vhost-186773

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this means the kernel behavior changes from 1 vhost per process to 1 vhost per thread. However, I see that you modified the pattern to match LWP instead of PID, which is different from the original pattern, can you explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From ps man page: LWP means light weight process (thread) ID of the dispatchable entity (alias spid, tid).

As I mentioned before, I confirmed with developer, we need to check the vhost thread ID from LWP. and also you can see the output on latest system with a vm with 4 queues, the pid is the only one but four LWP are the different, so, the pattern should be adjusted to correctly match the LWP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some means you want TID, not PID now. This should be a change, we need to update the doc string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the both doc string and git commit. please help review.

- switched to using "ps -eL" for vhost thread detection, as vhost
threads are now created as separate threads and no longer appear in
"ps aux" output in the latest kernel. "ps -eL" also works with older
kernels.
- fix vhost_thread_pattern to capture LWP from "ps -eL" output

Signed-off-by: Wenli Quan <[email protected]>
@quanwenli
Copy link
Contributor Author

JOB LOG : /root/avocado/job-results/job-2024-09-25T03.36-43de120/job.log
(1/1) 82599.bridge_test.1q.acceptance.repeat1.Host_RHEL.m9.u5.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.netperf.default.host_guest.q35: STARTED
(1/1) 82599.bridge_test.1q.acceptance.repeat1.Host_RHEL.m9.u5.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.5.0.x86_64.io-github-autotest-qemu.netperf.default.host_guest.q35: PASS (679.59 s)
RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /root/avocado/job-results/job-2024-09-25T03.36-43de120/results.html
JOB TIME : 681.18 s

Copy link
Contributor

@yanglei-rh yanglei-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@YongxueHong
Copy link
Contributor

The only thing we need to take care of is updating the regex with the accurate pattern for multi_queues_test.py test case since we modified the implementation here.
Although the pattern does work now, there is still a potential issue here, I suggest updating it later.
CC @yanglei-rh @PaulYuuu @zhenyzha @fbq815
Thanks.

@YongxueHong YongxueHong merged commit 572bdac into avocado-framework:master Sep 29, 2024
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants