Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: set default maxTimeMS when creating an index; unhandled exc on setup() will exit kytosd #414

Merged
merged 9 commits into from
Sep 27, 2023

Conversation

viniarck
Copy link
Member

@viniarck viniarck commented Sep 26, 2023

Closes #413 (check out the last comments there for more information)
Partly addresses #314

Summary

See updated changelog file

Local Tests

  • Simulated a Mongo OperationFailure (equivalent to index creation op error), confirmed the retries also worked as expected, and confirmed the traceback was included (this will generate some extra information, but when this happens, sometimes it can be an actual bug, so the traceback will be needed for reporting a bug):
Web update - Web UI was not updated
2023-09-26 14:28:47,620 - INFO [kytos.core.db] (MainThread) Starting DB connection
2023-09-26 14:28:47,621 - INFO [kytos.core.db] (MainThread) Trying to run 'hello' command on MongoDB...
2023-09-26 14:28:47,631 - INFO [kytos.core.db] (MainThread) Ran 'hello' command on MongoDB successfully. It's ready!
2023-09-26 14:28:47,631 - WARNING [kytos.core.retry] (MainThread) Retry #1 for bootstrap_index, args: (<class 'kytos.core.db.Mongo'>, 'users', [('username', 1)]), kwargs: {'unique': Tru
e}, seconds since start: 0.00
2023-09-26 14:28:48,192 - WARNING [kytos.core.retry] (MainThread) Retry #2 for bootstrap_index, args: (<class 'kytos.core.db.Mongo'>, 'users', [('username', 1)]), kwargs: {'unique': Tru
e}, seconds since start: 0.56
Kytos couldn't start because of some forced OperationFailure Traceback (most recent call last):
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 260, in start
    self.start_auth()
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 165, in start_auth
    self.auth = Auth(self)
  File "/home/viniarck/repos/kytos/kytos/core/auth.py", line 202, in __init__
    self.user_controller.bootstrap_indexes()
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 349, in iter
    return fut.result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/home/viniarck/repos/kytos/kytos/core/retry.py", line 25, in decorated
    return func(*args, **kwargs)
  File "/home/viniarck/repos/kytos/kytos/core/auth.py", line 96, in bootstrap_indexes
    if self.mongo.bootstrap_index(collection, keys, **kwargs):
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 324, in wrapped_f
    return self(f, *args, **kw)
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 404, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 360, in iter
    raise retry_exc.reraise()
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 193, in reraise
    raise self.last_attempt.result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/viniarck/repos/kytos/.direnv/python-3.9/lib/python3.9/site-packages/tenacity/__init__.py", line 407, in __call__
    result = fn(*args, **kwargs)
  File "/home/viniarck/repos/kytos/kytos/core/db.py", line 121, in bootstrap_index
    raise OperationFailure("some forced OperationFailure")
pymongo.errors.OperationFailure: some forced OperationFailure

Task was destroyed but it is pending!
task: <Task pending name='Task-1' coro=<start_shell_async() running at /home/viniarck/repos/kytos/kytos/core/kytosd.py:120>>
sys:1: RuntimeWarning: coroutine 'start_shell_async' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
  • Simulated a generic error, undefined variable on flow_manager, noticed that the traeback was chained correctly just so you can trace it completely:
Web update - Web UI was not updated
2023-09-26 14:35:31,877 - INFO [kytos.core.db] (MainThread) Starting DB connection
2023-09-26 14:35:31,878 - INFO [kytos.core.db] (MainThread) Trying to run 'hello' command on MongoDB...
2023-09-26 14:35:31,889 - INFO [kytos.core.db] (MainThread) Ran 'hello' command on MongoDB successfully. It's ready!
2023-09-26 14:35:31,901 - INFO [kytos.core.api_server] (MainThread) Started /api/kytos/core/auth/login/ - GET
2023-09-26 14:35:31,901 - INFO [kytos.core.api_server] (MainThread) Started /api/kytos/core/auth/users/ - GET
2023-09-26 14:35:31,901 - INFO [kytos.core.api_server] (MainThread) Started /api/kytos/core/auth/users/{username} - GET
2023-09-26 14:35:31,901 - INFO [kytos.core.api_server] (MainThread) Started /api/kytos/core/auth/users/ - POST
2023-09-26 14:35:31,901 - INFO [kytos.core.api_server] (MainThread) Started /api/kytos/core/auth/users/{username} - DELETE
2023-09-26 14:35:31,901 - INFO [kytos.core.api_server] (MainThread) Started /api/kytos/core/auth/users/{username} - PATCH
2023-09-26 14:35:31,901 - INFO [kytos.core.controller] (MainThread) /home/viniarck/repos/kytos/.direnv/python-3.9/var/run/kytos
2023-09-26 14:35:31,901 - INFO [kytos.core.controller] (MainThread) Starting Kytos - Kytos Controller
2023-09-26 14:35:31,903 - INFO [kytos.core.controller] (MainThread) Starting TCP server: <kytos.core.atcp_server.KytosServer object at 0x7f540d51d490>
2023-09-26 14:35:31,903 - INFO [kytos.core.atcp_server] (MainThread) Kytos listening at 0.0.0.0:6653
2023-09-26 14:35:31,903 - INFO [kytos.core.controller] (MainThread) Loading Kytos NApps...
2023-09-26 14:35:31,905 - INFO [kytos.core.napps.napp_dir_listener] (MainThread) NAppDirListener Started...
2023-09-26 14:35:31,907 - INFO [kytos.core.controller] (MainThread) Loading NApp kytos/flow_manager
Kytos couldn't start because of KytosNAppSetupException: NApp kytos/flow_manager exception name 'undef_var' is not defined  Traceback (most recent call last):
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 863, in load_napp
    napp = napp_module.Main(controller=self)
  File "/home/viniarck/repos/kytos/kytos/core/napps/base.py", line 194, in __init__
    self.setup()
  File "/home/viniarck/repos/napps/napps/kytos/flow_manager/main.py", line 90, in setup
    undef_var
NameError: name 'undef_var' is not defined

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 265, in start
    self.start_controller()
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 359, in start_controller
    self.load_napps()
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 898, in load_napps
    self.load_napp(napp.username, napp.name)
  File "/home/viniarck/repos/kytos/kytos/core/controller.py", line 866, in load_napp
    raise KytosNAppSetupException(msg) from exc
kytos.core.exceptions.KytosNAppSetupException: KytosNAppSetupException: NApp kytos/flow_manager exception name 'undef_var' is not defined 

Task was destroyed but it is pending!
task: <Task pending name='Task-1' coro=<start_shell_async() running at /home/viniarck/repos/kytos/kytos/core/kytosd.py:120>>
2023-09-26 14:35:32,041 - ERROR [kytos.core.controller] (MainThread) 1
2023-09-26 14:35:32,041 - INFO [kytos.core.controller] (MainThread) Shutting down Kytos...
/home/viniarck/repos/kytos/kytos/core/kytosd.py:98: RuntimeWarning: coroutine 'start_shell_async' was never awaited
  async_main(config)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

End-to-End Tests

I'll dispatch an exec on GitLab, I'll post the results here later:

============================= test session starts ==============================
platform linux -- Python 3.9.2, pytest-7.2.0, pluggy-1.3.0
rootdir: /builds/amlight/kytos-end-to-end-tester/kytos-end-to-end-tests
plugins: rerunfailures-10.2, timeout-2.1.0, anyio-3.6.2
collected 242 items
tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
tests/test_e2e_05_topology.py ..................                         [  8%]
tests/test_e2e_10_mef_eline.py ..........ss.....x.....x................  [ 24%]
tests/test_e2e_11_mef_eline.py ......                                    [ 27%]
tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 30%]
tests/test_e2e_13_mef_eline.py .....xs.s......xs.s.XXxX.xxxx..X......... [ 47%]
...                                                                      [ 48%]
tests/test_e2e_14_mef_eline.py x                                         [ 49%]
tests/test_e2e_15_mef_eline.py ..                                        [ 50%]
tests/test_e2e_20_flow_manager.py .....................                  [ 58%]
tests/test_e2e_21_flow_manager.py ...                                    [ 59%]
tests/test_e2e_22_flow_manager.py ...............                        [ 66%]
tests/test_e2e_23_flow_manager.py ..............                         [ 71%]
tests/test_e2e_30_of_lldp.py ....                                        [ 73%]
tests/test_e2e_31_of_lldp.py ...                                         [ 74%]
tests/test_e2e_32_of_lldp.py ...                                         [ 76%]
tests/test_e2e_40_sdntrace.py .............                              [ 81%]
tests/test_e2e_41_kytos_auth.py ........                                 [ 84%]
tests/test_e2e_50_maintenance.py ........................                [ 94%]
tests/test_e2e_60_of_multi_table.py .....                                [ 96%]
tests/test_e2e_70_kytos_stats.py ........                                [100%]
=============================== warnings summary ===============================
= 220 passed, 6 skipped, 11 xfailed, 5 xpassed, 867 warnings in 11472.15s (3:11:12) =

@viniarck viniarck requested a review from a team as a code owner September 26, 2023 17:43
Copy link

@italovalcy italovalcy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just left a comment regarding print (not sure it was on purpose). Once the end-to-end tests finish, hopefully with good results, you have my approval.

except Exception as exc:
exc_fmt = traceback.format_exc(chain=True)
message = f"Kytos couldn't start because of {str(exc)} {exc_fmt}"
print(message)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to double check if this should be print instead of self.log.critical

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @italovalcy, yes, initially it was self.log.exception (relatively similar to self.log.critical), but I had to revert it on f4432f9. In summary, at this point here, since we're using QueueListener by default and it's threaded, it might end up not ready yet, so even a self.log.listener.stop() doesn't guarantee that it'd print out correctly (so it might lose the message before emiting the event or processing it). So, print ensure it's deterministic here, similarly to other parts of the code were either the log isn't ready or setup yet https://github.com/kytos-ng/kytos/blob/master/kytos/core/controller.py#L175

# pylint: disable=broad-except
try:
if self.options.database:
self.db_conn_or_core_shutdown()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's room to also allow both db_conn_or_core_shutdown and init_apm_or_core_shutdown to reraise as mentioned on #415, but I'll only address in the feature since it's just a chore, and that way it also minimizes the chunks of code that need to be backported.

@viniarck
Copy link
Member Author

Thanks for your review, Italo. e2e tests have passed with this branch. I'll merge it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

placeholder to simulate MongoDB RS node failures before index creation
2 participants