Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exception handling for OSError that my be thrown by os.fsync #6019

Closed
wants to merge 1 commit into from

Conversation

smitterl
Copy link
Contributor

@smitterl smitterl commented Sep 3, 2024

We've seen instances where "Bad file descriptor" OSError was raised. This looks similar to 52d69be. Handle in the same way by ignoring the error and continuing execution.

We've seen instances where "Bad file descriptor" OSError was raised.
This looks similar to 52d69be.
Handle in the same way by ignoring the error and continuing execution.

Signed-off-by: Sebastian Mitterle <[email protected]>
@smitterl
Copy link
Contributor Author

smitterl commented Sep 3, 2024

Sample error trace:

2024-09-03 07:26:13,865 stacktrace       L0041 ERROR| Reproduced traceback from: /var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/core/test.py:778
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR| Traceback (most recent call last):
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/avocado_vt/test.py", line 175, in setUp
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR|     if utils_package.package_install("tar"):
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/virttest/utils_package.py", line 237, in package_install
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     return utils_misc.wait_for(mgr.install, timeout)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/virttest/utils_misc.py", line 581, in wait_for
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     output = func()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/virttest/utils_package.py", line 190, in install
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self.func = super(LocalPackageMgr, self).__getattr__("install")
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/manager.py", line 42, in __getattr__
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._init_on_demand()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/manager.py", line 38, in _init_on_demand
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self.backend = backend()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/backends/dnf.py", line 21, in __init__
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     super(DnfBackend, self).__init__(cmd='dnf')
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/backends/yum.py", line 45, in __init__
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._set_version(cmd)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/backends/yum.py", line 74, in _set_version
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     result = process.run(self.base_command + '--version',
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 1090, in run
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     cmd_result = sp.run(timeout=timeout)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 962, in run
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self.wait(timeout, sig)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 898, in wait
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._fill_results(rc)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 742, in _fill_results
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._fill_streams()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 754, in _fill_streams
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._stderr_drainer.flush()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 521, in flush
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     os.fsync(fileno)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR| OSError: [Errno 9] Bad file descriptor

@richtja
Copy link
Contributor

richtja commented Sep 4, 2024

Hi @smitterl thank you for opening this PR. Even though your change would fix the error message, I don't think we should do that, because IMO it hides the real issue. From your error trace sample, it looks like there might be some issue in avocado.utils.software_manager or in avocado-vt.virttest.utils_package, because this error implicates that the process wasn't terminated properly. Is it possible to provide some reproducer for this? I would like to investigate it before we will continue with this PR. Thank you.

@smitterl
Copy link
Contributor Author

smitterl commented Sep 4, 2024

Hi @smitterl thank you for opening this PR. Even though your change would fix the error message, I don't think we should do that, because IMO it hides the real issue. From your error trace sample, it looks like there might be some issue in avocado.utils.software_manager or in avocado-vt.virttest.utils_package, because this error implicates that the process wasn't terminated properly. Is it possible to provide some reproducer for this? I would like to investigate it before we will continue with this PR. Thank you.

This only happens some times in our environments. When it happens, from what I can tell, it's after the test fails and avocado(-vt?) is tearing everything down. You could try and run some of the tp-libvirt tests several times

# avocado run --vt-type libvirt --test-runner=runner --vt-machine-type q35 virsh.migrate_network.ovs_br.without_postcopy --vt-connect-uri qemu:///system
# avocado run --vt-type libvirt --test-runner=runner --vt-machine-type s390-virtio migration.destructive_operations_around_live_migration.migration_kill_libvirt_daemon.autoconverge_option.kill_dest_virtproxyd.with_precopy.p2p --vt-connect-uri qemu:///system

I'll send you a full test log via email.

richtja added a commit to richtja/avocado-vt that referenced this pull request Sep 13, 2024
In tp_libvrt testing, We've seen many occurrences where "Bad file
descriptor" OSError was raised during clean up phase when an external
command is executed via `avocado.utils.process` utility. It happens when
`SubProcess` class wants to flush the stdout and stderr of the external
command after finishes. These kinds of errors might lead to stdout and
stderr data loss, but during the tp_libvrt testing it leads to false
positive failures, which makes test evaluation harder.

This commit introduces a new config variable `omit_data_loss` which,
when is enabled, will omit these errors, and they won't affect the
overall test result.

Reference: avocado-framework/avocado#6019
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado-vt that referenced this pull request Sep 13, 2024
In tp_libvrt testing, We've seen many occurrences where "Bad file
descriptor" OSError was raised during clean up phase when an external
command is executed via `avocado.utils.process` utility. It happens when
`SubProcess` class wants to flush the stdout and stderr of the external
command after finishes. These kinds of errors might lead to stdout and
stderr data loss, but during the tp_libvrt testing it leads to false
positive failures, which makes test evaluation harder.

This commit introduces a new config variable `omit_data_loss` which,
when is enabled, will omit these errors, and they won't affect the
overall test result.

Reference: avocado-framework/avocado#6019
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado-vt that referenced this pull request Sep 13, 2024
In tp_libvrt testing, We've seen many occurrences where "Bad file
descriptor" OSError was raised during clean up phase when an external
command is executed via `avocado.utils.process` utility. It happens when
`SubProcess` class wants to flush the stdout and stderr of the external
command after finishes. These kinds of errors might lead to stdout and
stderr data loss, but during the tp_libvrt testing it leads to false
positive failures, which makes test evaluation harder.

This commit introduces a new config variable `omit_data_loss` which,
when is enabled, will omit these errors, and they won't affect the
overall test result.

Reference: avocado-framework/avocado#6019
Signed-off-by: Jan Richter <[email protected]>
@smitterl
Copy link
Contributor Author

Closing in favor of avocado-framework/avocado-vt#4001

@smitterl smitterl closed this Oct 16, 2024
richtja added a commit to richtja/avocado-vt that referenced this pull request Oct 17, 2024
In tp_libvrt testing, We've seen many occurrences where "Bad file
descriptor" OSError was raised during clean up phase when an external
command is executed via `avocado.utils.process` utility. It happens when
`SubProcess` class wants to flush the stdout and stderr of the external
command after finishes. These kinds of errors might lead to stdout and
stderr data loss, but during the tp_libvrt testing it leads to false
positive failures, which makes test evaluation harder.

This commit introduces a new config variable `omit_data_loss` which,
when is enabled, will omit these errors, and they won't affect the
overall test result.

Reference: avocado-framework/avocado#6019
Signed-off-by: Jan Richter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done 109
Development

Successfully merging this pull request may close these issues.

2 participants