Fix for HTX hung issue while setting Host and Peer setup. #2875

Merged

Conversation

FarooqAbdulla02
Contributor

@FarooqAbdulla02 FarooqAbdulla02 commented Aug 28, 2024

This change fixes an HTX hang caused by killing the HTXD daemon during setup. Because of the kill, HTX setup was exiting with a "cannot connect to peer" error.

Actual error during HTX host and peer setup:
++++++++++++++++++++++++++++++++++++

[stdlog] 2024-08-26 17:11:50,863 avocado.utils.process process L0658 INFO | Running 'pingum'
[stdlog] 2024-08-26 17:11:50,885 avocado.utils.process process L0472 DEBUG| [stdout] Class B n/w configured, thisid=194.229, mylastnib=229
[stdlog] 2024-08-26 17:11:50,885 avocado.utils.process process L0472 DEBUG| [stdout] Ping Com 9.40.194.229---->
[stdlog] 2024-08-26 17:11:50,886 avocado.utils.process process L0472 DEBUG| [stdout] OK
[stdlog] 2024-08-26 17:11:50,886 avocado.utils.process process L0472 DEBUG| [stdout] Ping Com 9.40.194.245---->
[stdlog] 2024-08-26 17:11:50,888 avocado.utils.process process L0472 DEBUG| [stdout] OK
[stdlog] 2024-08-26 17:11:50,888 avocado.utils.process process L0472 DEBUG| [stdout] Ping Test 101net194.245---->
[stdlog] 2024-08-26 17:11:50,890 avocado.utils.process process L0472 DEBUG| [stdout] OK
[stdlog] 2024-08-26 17:11:50,890 avocado.utils.process process L0472 DEBUG| [stdout] All networks ping Ok
[stdlog] 2024-08-26 17:11:50,891 avocado.utils.process process L0715 INFO | Command 'pingum' finished with 0 after 0.026501673s
[stdlog] 2024-08-26 17:11:50,891 avocado.test htx_nic_devices L0288 INFO | Running the HTX for net.mdt on Host
[stdlog] 2024-08-26 17:11:50,905 avocado.test htx_nic_devices L0292 INFO | HTXD is already running with PID: 3526. Killing it.
[stdlog] 2024-08-26 17:11:50,906 avocado.utils.process process L0658 INFO | Running 'pkill -f htxd'
[stdlog] 2024-08-26 17:11:50,928 avocado.utils.process process L0715 INFO | Command 'pkill -f htxd' finished with 0 after 0.008529288s
[stdlog] 2024-08-26 17:12:00,938 avocado.utils.process process L0658 INFO | Running 'htxcmdline -run -mdt net.mdt'
[stdlog] 2024-08-26 17:12:00,943 avocado.utils.process process L0472 DEBUG| [stderr] ERROR: while connecting hostname and port <3492>. Exiting...: Connection refused
[stdlog] 2024-08-26 17:12:00,943 avocado.utils.process process L0715 INFO | Command 'htxcmdline -run -mdt net.mdt' finished with 1 after 0.002794073s
[stdlog] 2024-08-26 17:12:00,943 avocado.test stacktrace L0041 ERROR|
[stdlog] 2024-08-26 17:12:00,943 avocado.test stacktrace L0043 ERROR| Reproduced traceback from: /usr/local/lib/python3.9/site-packages/avocado_framework-106.0-py3.9.egg/avocado/core/test.py:607
[stdlog] 2024-08-26 17:12:00,949 avocado.test stacktrace L0050 ERROR| Traceback (most recent call last):
[stdlog] 2024-08-26 17:12:00,949 avocado.test stacktrace L0050 ERROR| File "htx_nic_devices.py", line 245, in test_start
[stdlog] 2024-08-26 17:12:00,949 avocado.test stacktrace L0050 ERROR| self.run_htx()
[stdlog] 2024-08-26 17:12:00,949 avocado.test stacktrace L0050 ERROR| File "htx_nic_devices.py", line 285, in run_htx
[stdlog] 2024-08-26 17:12:00,949 avocado.test stacktrace L0050 ERROR| self.start_htx_run()
[stdlog] 2024-08-26 17:12:00,949 avocado.test stacktrace L0050 ERROR| File "htx_nic_devices.py", line 296, in start_htx_run
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| process.run(cmd, shell=True, sudo=True)
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| File "/usr/local/lib/python3.9/site-packages/avocado_framework-106.0-py3.9.egg/avocado/utils/process.py", line 1013, in run
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| raise CmdError(cmd, sp.result)
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| avocado.utils.process.CmdError: Command 'htxcmdline -run -mdt net.mdt' failed.
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| stdout: b''
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| stderr: b'ERROR: while connecting hostname and port <3492>. Exiting...: Connection refused\n'
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0050 ERROR| additional_info: None
[stdlog] 2024-08-26 17:12:00,950 avocado.test stacktrace L0051 ERROR|
[stdlog] 2024-08-26 17:12:00,950 avocado.test test L0611 DEBUG| Local variables:
[stdlog] 2024-08-26 17:12:00,982 avocado.test test L0614 DEBUG| -> self <class 'htx_nic_devices.HtxNicTest'>: 1-htx_nic_devices.py:HtxNicTest.test_start;run-012a
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| Traceback (most recent call last):
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| File "/usr/local/lib/python3.9/site-packages/avocado_framework-106.0-py3.9.egg/avocado/core/test.py", line 615, in _run_test
[stdlog] raise details
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| File "/usr/local/lib/python3.9/site-packages/avocado_framework-106.0-py3.9.egg/avocado/core/test.py", line 602, in _run_test
[stdlog] testMethod()
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| File "htx_nic_devices.py", line 245, in test_start
[stdlog] self.run_htx()
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| File "htx_nic_devices.py", line 285, in run_htx
[stdlog] self.start_htx_run()
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| File "htx_nic_devices.py", line 296, in start_htx_run
[stdlog] process.run(cmd, shell=True, sudo=True)
[stdlog] 2024-08-26 17:12:00,983 avocado.test test L0688 ERROR| File "/usr/local/lib/python3.9/site-packages/avocado_framework-106.0-py3.9.egg/avocado/utils/process.py", line 1013, in run
[stdlog] raise CmdError(cmd, sp.result)
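
The log above shows `htxcmdline` failing with "Connection refused" on port 3492 about ten seconds after `pkill -f htxd`, which suggests nothing was listening on that port when the command ran. As a rough illustration only (this is not the change made in this PR), a test could confirm that the daemon port accepts connections before issuing commands. The helper name `wait_for_port` and the use of `localhost` are assumptions; the port number is taken from the log.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, step=0.5):
    """Return True once a TCP connect to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; not part of the test in this PR.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError) while nothing listens on the port.
            with socket.create_connection((host, port), timeout=step):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(step)


# Possible use before running htxcmdline (host and port are assumptions):
# if not wait_for_port("localhost", 3492):
#     raise RuntimeError("HTX daemon is not accepting connections")
```

A check like this turns the "Connection refused" failure into an explicit, earlier diagnosis, but it does not by itself restart the daemon.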

This change fixes an HTX hang caused by killing the HTXD daemon
during setup. Because of the kill, HTX setup was exiting with a
"cannot connect to peer" error.

Signed-off-by: Shaik Abdulla <[email protected]>
@FarooqAbdulla02
Contributor Author

With the latest openssh-9.8p1-3.el10.ppc64le and updated glibc, gcc, and make packages:

[root@ltczep10-lp1 net]# avocado run htx_nic_devices.py -m htx_nic_devices.py.data/htx_nic_devices.yaml --max-parallel-tasks=1
JOB ID : 42e85fe5a21bf516516406fe1073b042abc57b2c
JOB LOG : /home/OpTest/avocado-fvt-wrapper/results/job-2024-08-27T16.24-42e85fe/job.log
(1/3) htx_nic_devices.py:HtxNicTest.test_start;run-a9ff: STARTED
(1/3) htx_nic_devices.py:HtxNicTest.test_start;run-a9ff: PASS (319.06 s)
(2/3) htx_nic_devices.py:HtxNicTest.test_check;run-a9ff: STARTED
(2/3) htx_nic_devices.py:HtxNicTest.test_check;run-a9ff: PASS (124.84 s)
(3/3) htx_nic_devices.py:HtxNicTest.test_stop;run-a9ff: STARTED
(3/3) htx_nic_devices.py:HtxNicTest.test_stop;run-a9ff: PASS (11.34 s)
RESULTS : PASS 3 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /home/OpTest/avocado-fvt-wrapper/results/job-2024-08-27T16.24-42e85fe/results.html
JOB TIME : 496.02 s

With the older openssh-8.7p1-43.el9.ppc64le and glibc, gcc, and make packages:

[root@ltcden7-lp1-new net]# avocado run htx_nic_devices.py -m htx_nic_devices.py.data/htx_nic_devices.yaml --max-parallel-tasks=1
JOB ID : 0ce48d069e669c69c50995d5db0ba7ffe0f691cd
JOB LOG : /home/OpTest/avocado-fvt-wrapper/results/job-2024-08-27T07.55-0ce48d0/job.log
(1/3) htx_nic_devices.py:HtxNicTest.test_start;run-5181: STARTED

(1/3) htx_nic_devices.py:HtxNicTest.test_start;run-5181: PASS (125.15 s)
(2/3) htx_nic_devices.py:HtxNicTest.test_check;run-5181: STARTED
(2/3) htx_nic_devices.py:HtxNicTest.test_check;run-5181: PASS (123.07 s)
(3/3) htx_nic_devices.py:HtxNicTest.test_stop;run-5181: STARTED
(3/3) htx_nic_devices.py:HtxNicTest.test_stop;run-5181: PASS (7.68 s)
RESULTS : PASS 3 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /home/OpTest/avocado-fvt-wrapper/results/job-2024-08-27T07.55-0ce48d0/results.html
JOB TIME : 282.95 s

@abdhaleegit abdhaleegit self-assigned this Sep 2, 2024
@abdhaleegit
Collaborator

@FarooqAbdulla02 please make sure it works on SLES. @shirishaganta does it look good to you?

@shirishaganta1
Contributor

> @FarooqAbdulla02 please make sure it works on SLES. @shirishaganta does it look good to you?

Yes, LGTM.

@abdhaleegit abdhaleegit self-requested a review September 2, 2024 05:18
if hxe_pid:
    self.log.info("HXE is already running with PID: %s. Killing it.", hxe_pid)
    process.run("hcl -shutdown", ignore_status=True)
    time.sleep(20)
Collaborator

process.run takes a timeout value. Can you move the sleep to the timeout, or is the sleep needed?

Collaborator

Also, please check how this works for a cfg run. One quick run would be good.
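
On the sleep versus timeout question above: as I understand it, a `timeout` on `process.run` bounds how long the command itself may run, while the `time.sleep(20)` waits for the shutdown to take effect after the command returns, so the two are not interchangeable. One possible middle ground, sketched here with only the standard library, is to poll for a condition with an upper bound instead of always sleeping the full interval. The name `wait_until` and its parameters are hypothetical. Avocado also ships a comparable helper, `avocado.utils.wait.wait_for`, which may be a better fit inside the test.

```python
import time


def wait_until(predicate, timeout=20.0, step=1.0):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Hypothetical helper for illustration. Returns True if the predicate
    became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(step)


# Possible use after the shutdown command, where hxe_is_running is a
# hypothetical check that returns True while an hxe process still exists:
# if not wait_until(lambda: not hxe_is_running(), timeout=20, step=1):
#     self.log.warning("HXE still running after shutdown")
```

With this shape the test moves on as soon as the process is gone and waits at most the same 20 seconds in the worst case.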

@abdhaleegit abdhaleegit self-requested a review September 4, 2024 10:14
Collaborator

@abdhaleegit abdhaleegit left a comment

LGTM, merging this as it blocks CR.

@abdhaleegit abdhaleegit merged commit b6543fe into avocado-framework-tests:master Sep 4, 2024
3 checks passed
3 participants