feat: configurable queue monitors for event buffers and thread pools #450

Merged
merged 19 commits into from
Feb 14, 2024

@viniarck viniarck commented Feb 8, 2024

Closes #439

Summary

  • See updated changelog file and kytos.conf.template for more specific information
  • The default config should work well out of the box for our current kytos-ng NApps and for a network at AmLight's scale (I simulated one with 300 EVCs plus a few link flaps)
  • Note that the main goal of the queue monitors is to detect sustained high queue usage over a window of delta t seconds, sampled once per second. We're not trying to provide extremely granular, telemetry-like visibility, just to detect, on a per-second scale, when the queues of event buffers or the max workers of thread pools need to be increased, or when a NApp might be misbehaving and sending far too many events.
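
To make the sampling idea above concrete, here is a minimal sketch. It is not the code in kytos.core.queue_monitor: the class name `QueueMonitorSketch`, the `sample` method, and the fixed window that resets every `delta_secs` samples are assumptions based only on the log output shown in this PR, and the sizes fed in are made-up values.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class QueueMonitorSketch:
    """Hypothetical illustration only; not the kytos implementation."""

    name: str
    min_hits: int
    min_size: int
    delta_secs: int
    _ticks: int = 0
    _hits: List[int] = field(default_factory=list)

    def sample(self, size: int) -> Optional[str]:
        """Record one per-second sample; return a summary when a window ends."""
        self._ticks += 1
        if size >= self.min_size:
            self._hits.append(size)
        if self._ticks < self.delta_secs:
            return None  # window still open
        # Window closed: take the hits and reset the state for the next window
        hits, self._hits, self._ticks = self._hits, [], 0
        if len(hits) < self.min_hits:
            return None  # not enough high-usage samples to warn
        return (
            f"{self.name}, counted: {len(hits)}, min/avg/max size: "
            f"{min(hits)}/{mean(hits):.1f}/{max(hits)}"
        )


monitor = QueueMonitorSketch("threadpool_app", min_hits=5, min_size=512, delta_secs=5)
# Illustrative queue sizes, one per second
results = [monitor.sample(size) for size in (600, 700, 800, 900, 1000)]
print(results[-1])
```

Only the last sample of the window produces a summary; with fewer than `min_hits` samples at or above `min_size`, the window closes silently.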

Local Tests

  • I tested the default config with 300 EVCs and some link flaps; as expected, no warnings showed up
  • I also explored the three configs described below while injecting hundreds of concurrent events targeting a slow-ish handler, to simulate a case where a thread pool's queue keeps growing significantly

Config a (default)

2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(msg_in, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(msg_out, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(raw, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(app, min_hits=5, min_size=1024, delta_secs=5)...
2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(threadpool_sb, min_hits=5, min_size=256, delta_secs=5)...
2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(threadpool_app, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:27:48,058 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(threadpool_db, min_hits=5, min_size=256, delta_secs=5)...



... after injecting too many events ... 


2024-02-08 10:29:34,182 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 5, min/avg/max size: 4608/10336.6/12958, first at: 2024-02-08 13:29:29.802628+00:00, last at: 2024-02-08 13:29:34.182469+00:00, delta secs: 5, min_hits: 5, min_size_threshold: 512
2024-02-08 10:29:39,188 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 5, min/avg/max size: 9392/10410.0/11428, first at: 2024-02-08 13:29:35.184075+00:00, last at: 2024-02-08 13:29:39.188669+00:00, delta secs: 5, min_hits: 5, min_size_threshold: 512
2024-02-08 10:29:44,195 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 5, min/avg/max size: 6838/7859.6/8880, first at: 2024-02-08 13:29:40.189934+00:00, last at: 2024-02-08 13:29:44.195025+00:00, delta secs: 5, min_hits: 5, min_size_threshold: 512

Config b

thread_pool_queue_monitors =
  [
    {
      "min_hits": 5,
      "delta_secs": 10,
      "min_queue_full_percent": 100,
      "log_at_most_n": 3,
      "queues": ["sb", "app", "db"]
    }
  ]


2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(msg_in, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(msg_out, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(raw, min_hits=5, min_size=512, delta_secs=5)...
2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(app, min_hits=5, min_size=1024, delta_secs=5)...
2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(threadpool_sb, min_hits=5, min_size=256, delta_secs=10)...
2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(threadpool_app, min_hits=5, min_size=512, delta_secs=10)...
2024-02-08 10:26:03,160 - INFO [kytos.core.controller] (MainThread) Starting QueueMonitor(threadpool_db, min_hits=5, min_size=256, delta_secs=10)...
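
The startup lines above report an absolute `min_size` even though the config specifies `min_queue_full_percent`. One mapping consistent with these logs (and with the `min_size_threshold: 1` reported for 0 percent in Config c) is a percentage of the queue capacity with a floor of 1. This is a sketch under that assumption: the function name and the capacities of 256 and 512 are inferred from the log output, not taken from the kytos source.

```python
def min_size_threshold(queue_capacity: int, min_queue_full_percent: float) -> int:
    """Hypothetical mapping from a fullness percentage to an absolute size."""
    return max(1, int(queue_capacity * min_queue_full_percent / 100))


# Capacities are assumptions inferred from the startup log lines
assumed_capacity = {"sb": 256, "app": 512, "db": 256}
thresholds = {
    name: min_size_threshold(capacity, 100)
    for name, capacity in assumed_capacity.items()
}
print(thresholds)
```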


... after injecting too many events ... 

2024-02-08 10:26:40,911 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 5, min/avg/max size: 2560/10130.0/13470, first at: 2024-02-08 13:26:36.561493+00:00, last at: 2024-02-08 13:26:40.911422+00:00, delta secs: 10, min_hits: 5, min_size_threshold: 512
2024-02-08 10:26:40,911 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, record[0]/[5]: size: 2560, at: 2024-02-08 13:26:36.561493+00:00
2024-02-08 10:26:40,911 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, record[2]/[5]: size: 13470, at: 2024-02-08 13:26:38.909429+00:00
2024-02-08 10:26:40,911 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, record[4]/[5]: size: 12446, at: 2024-02-08 13:26:40.911422+00:00

2024-02-08 10:26:50,927 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 10, min/avg/max size: 7350/9643.2/11940, first at: 2024-02-08 13:26:41.912263+00:00, last at: 2024-02-08 13:26:50.926982+00:00, delta secs: 10, min_hits: 5, min_size_threshold: 512
2024-02-08 10:26:50,927 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, record[0]/[10]: size: 11940, at: 2024-02-08 13:26:41.912263+00:00
2024-02-08 10:26:50,927 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, record[4]/[10]: size: 9898, at: 2024-02-08 13:26:45.919824+00:00
2024-02-08 10:26:50,927 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, record[8]/[10]: size: 7856, at: 2024-02-08 13:26:49.925610+00:00
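
With `log_at_most_n` set to 3, the records logged above are indexes 0, 2, 4 out of 5 and 0, 4, 8 out of 10. One stride rule that reproduces both cases is stepping by the ceiling of `total / log_at_most_n`. The helper below is a hypothetical sketch of that rule; the actual selection logic in kytos may differ.

```python
from math import ceil
from typing import List


def pick_record_indexes(total: int, log_at_most_n: int) -> List[int]:
    """Return at most log_at_most_n evenly spaced indexes out of total records."""
    if total <= 0 or log_at_most_n <= 0:
        return []  # nothing to log, e.g. log_at_most_n = 0
    step = ceil(total / log_at_most_n)
    return list(range(0, total, step))


print(pick_record_indexes(5, 3))
print(pick_record_indexes(10, 3))
```

Because the step is never smaller than `total / log_at_most_n`, the helper cannot return more than `log_at_most_n` indexes.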

Config c

This (temporary) config is useful when you just want to see, every second, whether at least one event is queued. It gives you an idea of how busy the queues are, for instance in a local stress test, which can help you identify baseline usage and/or spiky queue loads in a particular case:

thread_pool_queue_monitors =
  [
    {
      "min_hits": 1,
      "delta_secs": 1,
      "min_queue_full_percent": 0,
      "log_at_most_n": 0,
      "queues": ["sb", "app", "db"]
    }
  ]


2024-02-08 10:34:10,705 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 1, min/avg/max size: 2488/2488.0/2488, first at: 2024-02-08 13:34:10.705708+00:00, last at: 2024-02-08 13:34:10.705708+00:00, delta secs: 1, min_hits: 1, min_size_threshold: 1
2024-02-08 10:34:11,707 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 1, min/avg/max size: 1976/1976.0/1976, first at: 2024-02-08 13:34:11.707078+00:00, last at: 2024-02-08 13:34:11.707078+00:00, delta secs: 1, min_hits: 1, min_size_threshold: 1
2024-02-08 10:34:12,710 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 1, min/avg/max size: 1464/1464.0/1464, first at: 2024-02-08 13:34:12.709970+00:00, last at: 2024-02-08 13:34:12.709970+00:00, delta secs: 1, min_hits: 1, min_size_threshold: 1
2024-02-08 10:34:13,711 - WARNING [kytos.core.queue_monitor] (MainThread) threadpool_app, counted: 1, min/avg/max size: 958/958.0/958, first at: 2024-02-08 13:34:13.711444+00:00, last at: 2024-02-08 13:34:13.711444+00:00, delta secs: 1, min_hits: 1, min_size_threshold: 1
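
When running a stress test with a config like this, it can help to extract the sizes from the summary lines and look at them as a series. The parser below is a small sketch that assumes the line format shown in the logs above; the regular expression and the `parse_summary` helper are illustrative and not part of kytos.

```python
import re
from typing import Dict, Optional

# Matches the summary part of a queue monitor WARNING line (assumed format)
SUMMARY_RE = re.compile(
    r"(?P<queue>\w+), counted: (?P<counted>\d+), "
    r"min/avg/max size: (?P<min>\d+)/(?P<avg>[\d.]+)/(?P<max>\d+)"
)


def parse_summary(line: str) -> Optional[Dict[str, object]]:
    """Return the queue name and size stats, or None for non-summary lines."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "queue": match["queue"],
        "counted": int(match["counted"]),
        "min": int(match["min"]),
        "avg": float(match["avg"]),
        "max": int(match["max"]),
    }


# First Config c line from above, with its tail omitted for brevity
line = (
    "2024-02-08 10:34:10,705 - WARNING [kytos.core.queue_monitor] (MainThread) "
    "threadpool_app, counted: 1, min/avg/max size: 2488/2488.0/2488"
)
parsed = parse_summary(line)
print(parsed)
```

Lines that only carry a single record (the `record[i]/[n]` lines) do not match and return `None`.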

End-to-End Tests

============================= test session starts ==============================
platform linux -- Python 3.9.2, pytest-7.2.0, pluggy-1.4.0
rootdir: /builds/amlight/kytos-end-to-end-tester/kytos-end-to-end-tests
plugins: rerunfailures-10.2, timeout-2.1.0, anyio-3.6.2
collected 257 items
tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
tests/test_e2e_05_topology.py ....................                       [  8%]
tests/test_e2e_10_mef_eline.py ..........ss.....x.....x................  [ 24%]
tests/test_e2e_11_mef_eline.py ......                                    [ 26%]
tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 29%]
tests/test_e2e_13_mef_eline.py ....Xs.s.....Xs.s.XXxX.xxxx..X........... [ 45%]
.                                                                        [ 45%]
tests/test_e2e_14_mef_eline.py x                                         [ 46%]
tests/test_e2e_15_mef_eline.py .....                                     [ 48%]
tests/test_e2e_16_mef_eline.py .                                         [ 48%]
tests/test_e2e_20_flow_manager.py .....................                  [ 56%]
tests/test_e2e_21_flow_manager.py ...                                    [ 57%]
tests/test_e2e_22_flow_manager.py ...............                        [ 63%]
tests/test_e2e_23_flow_manager.py ..............                         [ 69%]
tests/test_e2e_30_of_lldp.py ....                                        [ 70%]
tests/test_e2e_31_of_lldp.py ...                                         [ 71%]
tests/test_e2e_32_of_lldp.py ...                                         [ 73%]
tests/test_e2e_40_sdntrace.py ..............                             [ 78%]
tests/test_e2e_41_kytos_auth.py ........                                 [ 81%]
tests/test_e2e_42_sdntrace.py ..                                         [ 82%]
tests/test_e2e_50_maintenance.py ........................                [ 91%]
tests/test_e2e_60_of_multi_table.py .....                                [ 93%]
tests/test_e2e_70_kytos_stats.py ........                                [ 96%]
tests/test_e2e_80_pathfinder.py ss......                                 [100%]
=============================== warnings summary ===============================
------------------------------- start/stop times -------------------------------
= 233 passed, 8 skipped, 9 xfailed, 7 xpassed, 1143 warnings in 12325.92s (3:25:25) =

@viniarck viniarck requested a review from a team as a code owner February 8, 2024 14:33
@viniarck viniarck merged commit 3147d4a into master Feb 14, 2024
2 checks passed
@viniarck viniarck deleted the feat/queue_mon branch February 14, 2024 12:36
Development

Successfully merging this pull request may close these issues.

feat: core queue and thread pool queue capacity (utilization) task monitor