[chassis][nokia][202405.17]: Chassisd crashes when updating moduledb #21131

Open
liamkearney-msft opened this issue Dec 11, 2024 · 2 comments · May be fixed by sonic-net/sonic-platform-daemons#573

Comments

@liamkearney-msft
Contributor

Description

Chassisd crashes on the supervisor of a Nokia 7250, which prevents syncd and swss from starting. The backtrace seen in syslog is as follows:

2024 Dec 10 01:57:04.137945 svcstr-7250-sup-1 INFO pmon#supervisord 2024-12-10 01:57:04,137 INFO success: chassisd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2024 Dec 10 01:57:09.544727 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd Traceback (most recent call last):
2024 Dec 10 01:57:09.544727 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 710, in <module>
2024 Dec 10 01:57:09.545539 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     main()
2024 Dec 10 01:57:09.545753 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 705, in main
2024 Dec 10 01:57:09.545938 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     chassisd.run()
2024 Dec 10 01:57:09.545949 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 683, in run
2024 Dec 10 01:57:09.546348 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     self.module_updater.module_db_update()
2024 Dec 10 01:57:09.546558 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 279, in module_db_update
2024 Dec 10 01:57:09.546731 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     if self.my_slot == int(module_info_dict['slot']):
2024 Dec 10 01:57:09.546891 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024 Dec 10 01:57:09.546902 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd ValueError: invalid literal for int() with base 10: 'A'

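This is caused by PR sonic-net/sonic-platform-daemons#560, combined with the fact that Nokia uses letters (e.g. A, B, C) rather than numbers for its slot names, so the int(module_info_dict['slot']) conversion in module_db_update raises a ValueError.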
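
For illustration only, here is a minimal sketch of the failure mode and one way a slot comparison could tolerate non-numeric slot names. It is not the change proposed in sonic-net/sonic-platform-daemons#573, and it is not taken from chassisd itself: the helper name `slots_match` and the sample `module_info_dict` contents are hypothetical, and the type of `my_slot` on a real platform is an assumption.

```python
def slots_match(my_slot, module_slot):
    """Compare two slot identifiers as strings.

    Hypothetical helper: avoids int() so that letter-based slot names
    such as 'A' do not raise ValueError.
    """
    return str(my_slot).strip() == str(module_slot).strip()


# Hypothetical module entry resembling what a letter-slotted chassis reports.
module_info_dict = {"slot": "A"}

# Reproduce the exception from the traceback: int() cannot parse 'A'.
try:
    int(module_info_dict["slot"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The string-based comparison works for both letter and numeric slots.
print(slots_match("A", module_info_dict["slot"]))
print(slots_match(1, "1"))
```

One trade-off of comparing as strings is that differently formatted numeric slots (for example 1 and "01") would no longer be treated as equal, so a real fix may need platform-specific normalization.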

Steps to reproduce the issue:

  1. Boot a Nokia 7250 running 202405.17
  2. Observe that syncd and swss do not start

Describe the results you received:

No syncd containers running

Describe the results you expected:

syncd starts as expected

Output of show version:

20240510.17


Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@arlakshm
Contributor

@mlok-nokia for viz...

@Javier-Tan

Javier-Tan commented Dec 11, 2024

Just for documentation purposes: I believe this causes iBGP neighbours not to come up, so tests generally fail (when .17 is on the sup card), along with a whole host of other issues.
