Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chassisd] Address the chassisd crash issue and add UT for it #573

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions sonic-chassisd/scripts/chassisd
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,7 @@ class ModuleUpdater(logger.Logger):

for module_index in range(0, self.num_modules):
module_info_dict = self._get_module_info(module_index)
if self.my_slot == int(module_info_dict['slot']):
if self.my_slot == module_info_dict['slot']:
my_index = module_index

if module_info_dict is not None:
Expand All @@ -300,7 +300,7 @@ class ModuleUpdater(logger.Logger):

fvs = swsscommon.FieldValuePairs([(CHASSIS_MODULE_INFO_DESC_FIELD, module_info_dict[CHASSIS_MODULE_INFO_DESC_FIELD]),
(CHASSIS_MODULE_INFO_SLOT_FIELD,
module_info_dict[CHASSIS_MODULE_INFO_SLOT_FIELD]),
str(module_info_dict[CHASSIS_MODULE_INFO_SLOT_FIELD])),
(CHASSIS_MODULE_INFO_OPERSTATUS_FIELD, module_info_dict[CHASSIS_MODULE_INFO_OPERSTATUS_FIELD]),
(CHASSIS_MODULE_INFO_NUM_ASICS_FIELD, str(len(module_info_dict[CHASSIS_MODULE_INFO_ASICS]))),
(CHASSIS_MODULE_INFO_SERIAL_FIELD, module_info_dict[CHASSIS_MODULE_INFO_SERIAL_FIELD])])
Expand Down Expand Up @@ -357,8 +357,8 @@ class ModuleUpdater(logger.Logger):

# In line card push the hostname of the module and num_asics to the chassis state db.
# The hostname is used as key to access chassis app db entries
module_info_dict = self._get_module_info(my_index)
if not self._is_supervisor():
module_info_dict = self._get_module_info(my_index)
hostname_key = "{}{}".format(ModuleBase.MODULE_TYPE_LINE, int(self.my_slot) - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation here? This makes an assumption that line card slots must be numbers, but should that be a requirement? Should this be using my_index instead? Or should there be some logic to handle if the LC slot was non-numeric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the self._get_module_info() call inside if condition since my_slot is only use inside. Not need to call self._get_module_info(0 if it not required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is still requiring that line cards must be named as numbers, only supervisors are allowed to be alphanumeric. Is that expected and required? Why can't alphanumeric line card names also be supported since this was already found to be a limitation of supervisor naming?

Copy link
Contributor Author

@mlok-nokia mlok-nokia Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Linecard slot must be number string or number. It is required. Otherwise, we will not able construct the linecard name. With the existing implementation, Linecard slot is a number or a number string, it will work. BTW, this change just moved the _get_module_info() call into "if" section since we only use it inside the "if".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line card name could be constructed if the slot wasn't numeric (like the supervisor) as well, the name is still just a string. I'm not sure what else relies on line card slots being numeric though, so this should be fine for now. It may be worth looking into what changes would be required to support alphanumeric slots for line cards as well, but not required for this change.

hostname = try_get(device_info.get_hostname, default="None")
hostname_fvs = swsscommon.FieldValuePairs([(CHASSIS_MODULE_INFO_SLOT_FIELD, str(self.my_slot)),
Expand Down Expand Up @@ -393,7 +393,7 @@ class ModuleUpdater(logger.Logger):

module_info_dict[CHASSIS_MODULE_INFO_NAME_FIELD] = name
module_info_dict[CHASSIS_MODULE_INFO_DESC_FIELD] = str(desc)
module_info_dict[CHASSIS_MODULE_INFO_SLOT_FIELD] = str(slot)
module_info_dict[CHASSIS_MODULE_INFO_SLOT_FIELD] = slot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be an unexpected behavior change. slot can still be an integer, so module_info_dict[CHASSIS_MODULE_INFO_SLOT_FIELD] can now have varying types. Previously when slot was always int, all logic was expecting this field to be str` since it was converted here (though I'm not sure any logic relies on that type). It may be best to revert this change to maintain a consistent (and existing) type requirement.

I do see there is a conversion for one of the usages of this dict entry to str to account for this, but it may just be best to maintain the expectation that this dict entry is always str from this point, as it was previously.

Copy link
Contributor Author

@mlok-nokia mlok-nokia Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental issue is that a slot value on a chassis may not be a number, it could be a string -- such as Nokia IXRE7250 Supervisor card which is slot "A".
It is not necessary to convert slot value to a string in the module_info_dict[] here and then convert it back to a number when use it later. We can directly use it as what function get_slot() returned. For consistency, the only place is needed to be converted to a string when it is stored into a STATE_DB. This change simplfy it and no need to convert it back and forth (StringToNumber or NumberToString) when use the slot info in module_info_dict[]. This change will address this issue forever. Otherwise, another solution could be a big change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. My concern was mostly that it may be unclear to developers that the type stored in this dict field can vary. Functionally it seems fine. The documentation of APIs such as get_slot and get_my_slot should be updated as @judyjoseph previously mentioned; they not need not return only an int, they may also return str.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. My concern was mostly that it may be unclear to developers that the type stored in this dict field can vary. Functionally it seems fine. The documentation of APIs such as get_slot and get_my_slot should be updated as @judyjoseph previously mentioned; they not need not return only an int, they may also return str.

Agree. For the long run, this needs to be addressed.

module_info_dict[CHASSIS_MODULE_INFO_OPERSTATUS_FIELD] = str(status)
module_info_dict[CHASSIS_MODULE_INFO_ASICS] = asics
module_info_dict[CHASSIS_MODULE_INFO_SERIAL_FIELD] = str(serial)
Expand Down
48 changes: 48 additions & 0 deletions sonic-chassisd/tests/test_chassisd.py
Original file line number Diff line number Diff line change
Expand Up @@ -280,6 +280,54 @@ def test_configupdater_check_num_modules():
fvs = module_updater.chassis_table.get(CHASSIS_INFO_KEY_TEMPLATE.format(1))
assert fvs == None

def test_moduleupdater_check_string_slot():
chassis = MockChassis()

#Supervisor
index = 0
name = "SUPERVISOR0"
desc = "Supervisor card"
slot = "A"
serial = "RP1000101"
module_type = ModuleBase.MODULE_TYPE_SUPERVISOR
supervisor = MockModule(index, name, desc, module_type, slot, serial)
supervisor.set_midplane_ip()
chassis.module_list.append(supervisor)

#Linecard
index = 1
name = "LINE-CARD0"
desc = "36 port 400G card"
slot = "1"
serial = "LC1000101"
module_type = ModuleBase.MODULE_TYPE_LINE
module = MockModule(index, name, desc, module_type, slot, serial)
module.set_midplane_ip()
chassis.module_list.append(module)

#Fabric-card
index = 1
name = "FABRIC-CARD0"
desc = "Switch fabric card"
slot = "17"
serial = "FC1000101"
module_type = ModuleBase.MODULE_TYPE_FABRIC
fabric = MockModule(index, name, desc, module_type, slot, serial)
chassis.module_list.append(fabric)

#Run on supervisor
module_updater = ModuleUpdater(SYSLOG_IDENTIFIER, chassis, slot,
module.supervisor_slot)
module_updater.supervisor_slot = supervisor.get_slot()
module_updater.my_slot = supervisor.get_slot()
module_updater.modules_num_update()
module_updater.module_db_update()
module_updater.check_midplane_reachability()

midplane_table = module_updater.midplane_table
#Check only one entry in database
assert 1 == midplane_table.size()

def test_midplane_presence_modules():
chassis = MockChassis()

Expand Down
Loading