Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[eventd] Add incremental polling when waiting for capture service to start #18138

Merged
merged 6 commits into from
Mar 15, 2024

Conversation

zbud-msft
Copy link
Contributor

@zbud-msft zbud-msft commented Feb 20, 2024

Why I did it

Addresses #17350

Work item tracking
  • Microsoft ADO (number only):27048859

How I did it

Instead of a 1 second delay, we poll to check that the thread is available and after each poll increment the delay. There were situations where if there was less memory available, fixed polling would not be effective for starting zmq capture service. Add an incremental delay such that eventd can wait longer to start up capture service if system is too busy or overloaded, but still keep a max duration/retry limit so that we do not wait forever.

How to verify it

UT

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@dgsudharsan
Copy link
Collaborator

@zbud-msft Can you please check if we can cherrypick this fix to 202305? If not can you pelase create a separate PR?

@@ -547,8 +548,9 @@ capture_service::set_control(capture_control_t ctrl, event_serialized_lst_t *lst
case INIT_CAPTURE:
m_thr = thread(&capture_service::do_capture, this);
for(int i=0; !m_cap_run && (i < CAPTURE_SERVICE_POLLING_RETRIES); ++i) {
/* Wait max a second for thread to init */
this_thread::sleep_for(chrono::milliseconds(CAPTURE_SERVICE_POLLING_DURATION));
/* Poll to see if thread has been init, if so exit early. Add delay on every attempt */
Copy link
Collaborator

@qiluo-msft qiluo-msft Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poll

It reduces the possibility of the issue happening, and it is not a complete fix. Then we need to handle the issue as a warning, not as an error. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added logic that eventd will not exit when capture service fails to initialize; warning logs are added, and main proxy/zmq service will continue to run.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closed.

@StormLiangMS
Copy link
Contributor

hi @qiluo-msft @zbud-msft could you update the ADO? Is this a must to have for 202305?

@dgsudharsan
Copy link
Collaborator

@zbud-msft This PR has conflict with 202305. Can you please check and if required create a separate PR for 202305?

@zbud-msft
Copy link
Contributor Author

@zbud-msft This PR has conflict with 202305. Can you please check and if required create a separate PR for 202305?

@dgsudharsan Currently depending on #18249

@qiluo-msft qiluo-msft merged commit a53cbdc into sonic-net:master Mar 15, 2024
19 checks passed
@dgsudharsan
Copy link
Collaborator

@yxieca Can you please help to cherry-pick this fix to 202311?

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Apr 3, 2024
…start (sonic-net#18138)

### Why I did it

Addresses sonic-net#17350

### How I did it

Instead of a 1 second delay, we poll to check that the thread is available and after each poll increment the delay. There were situations where if there was less memory available, fixed polling would not be effective for starting zmq capture service. Add an incremental delay such that eventd can wait longer to start up capture service if system is too busy or overloaded, but still keep a max duration/retry limit so that we do not wait forever.

#### How to verify it
UT
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202311: #18552

mssonicbld pushed a commit that referenced this pull request Apr 4, 2024
…start (#18138)

### Why I did it

Addresses #17350

### How I did it

Instead of a 1 second delay, we poll to check that the thread is available and after each poll increment the delay. There were situations where if there was less memory available, fixed polling would not be effective for starting zmq capture service. Add an incremental delay such that eventd can wait longer to start up capture service if system is too busy or overloaded, but still keep a max duration/retry limit so that we do not wait forever.

#### How to verify it
UT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants