-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[System ready] Extend sysmonitor functionality to wait for host daemons #18817
base: master
Are you sure you want to change the base?
Changes from 9 commits
6e975c3
376a4d1
e50e33c
bcd3883
592dcb7
981243a
85ca8a9
39d1e8d
de110cf
c210287
544dfa2
7f21c07
8dc3b19
a9b865e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -79,11 +79,17 @@ | |
"auto_restart": "{{autorestart}}", | ||
"support_syslog_rate_limit" : "true", | ||
{# Set check_up_status to true here when app readiness will be marked in state db #} | ||
{%- if feature in ["swss", "syncd", "pmon"] %} | ||
"check_up_status" : "true", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please do not change image config generation behavior. We will rely on external service to generate golden config, and ask sonic to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @wen587 to review. |
||
{%- endif %} | ||
{# For now, to support the infrastrucure, setting the check_up_status to false for bgp,swss,pmon #} | ||
{# Once apps like bgp,synd supports app readiness, then bgp,syncd can set check_up_status to true #} | ||
{%- if feature in ["bgp", "swss", "pmon"] %} | ||
{%- if feature in ["bgp"] %} | ||
"check_up_status" : "false", | ||
{%- endif %} | ||
{%- if feature in ["snmp"] %} | ||
"irrel_for_sysready" : "true", | ||
fastiuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||
{%- endif %} | ||
{%- if include_kubernetes == "y" %} | ||
{%- if feature in ["lldp", "pmon", "radv", "eventd", "snmp", "telemetry", "gnmi"] %} | ||
"set_owner": "kube", {% else %} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -44,6 +44,8 @@ function startplatform() { | |
/usr/bin/mlnx-fw-upgrade.sh -v | ||
if [[ "$?" -ne "${EXIT_SUCCESS}" ]]; then | ||
debug "Failed to upgrade fw. " "$?" "Restart syncd" | ||
sonic-db-cli STATE_DB HSET "FEATURE|$DEV_SRV" fail_reason \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@dgsudharsan not only, we have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My question is syncd up_status is just set base on startplatform completion. Is it possible that in the actual syncd flow, switch_create might fail and we will still see the up_status as true. Can we include the switch create in the definition of syncd up_status? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it can happen I think. Set system ready condition you see here is just an internal NV requirement, but at the same time it is very generic, so won't conflict with any other platform. I prefer to leave it as it is. |
||
"ASIC FW update failed" up_status false | ||
exit 1 | ||
fi | ||
/etc/init.d/sxdkernel restart | ||
|
@@ -90,6 +92,9 @@ function startplatform() { | |
if [[ x"$sonic_asic_platform" == x"nvidia-bluefield" ]]; then | ||
/usr/bin/bfnet.sh start | ||
fi | ||
|
||
sonic-db-cli STATE_DB HDEL "FEATURE|$DEV_SRV" fail_reason | ||
sonic-db-cli STATE_DB HSET "FEATURE|$DEV_SRV" up_status true | ||
} | ||
|
||
function waitplatform() { | ||
|
@@ -175,9 +180,11 @@ LOCKFILE="/tmp/swss-syncd-lock$DEV" | |
NAMESPACE_PREFIX="asic" | ||
if [ "$DEV" ]; then | ||
NET_NS="$NAMESPACE_PREFIX$DEV" #name of the network namespace | ||
DEV_SRV="$SERVICE@$DEV" | ||
SONIC_DB_CLI="sonic-db-cli -n $NET_NS" | ||
else | ||
NET_NS="" | ||
DEV_SRV="$SERVICE" | ||
SONIC_DB_CLI="sonic-db-cli" | ||
fi | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgsudharsan It is disabled for snmp by default: see
files/build_templates/init_cfg.json.j2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is different. In the init_cfg.j2 you are considering snmp as irrelevent for system ready but using system ready as a criteria to spawn snmp. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you consider handling this logic through featured? Today we have a similar flow in featured where certain daemons are spawned after portinitdone. In this case the critera can be systemd. It can be much easy to extend this to multiple daemons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is indeed correct. Irrelevant for system ready means that
system ready
feature won't just collect readiness status of snmp systemd service. But SNMP must be started after system is ready. The logic actually does't break each other.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the code in
featured
and I don't want to break anything there by adding wait for system ready. Most of thedelayed
services there depend on portinit done, and I don't want to add another wait point there as it will just make the logic there more complex and less clear.