[xcvrd] DPB support on platforms with CmisManagerTask enabled #500
base: master
@ishidawataru Can you please fix the code coverage check failure?
@mihirpat1 I'm still communicating with the original reporter of this issue to check that the fix actually works. After that I'll fix the code coverage failure.
Force-pushed from 83ad867 to a5dd3e8.
@mihirpat1 Confirmed that the fix resolved the issue. Also, I fixed the coverage test failure. Please review.
@ishidawataru There is not enough information for me to understand the problem. As long as CONFIG_DB port table has the right number of lanes and Speed specified, Xcvrd must be able to select the best matching application from the list of advertised application by the module. Do you see any issue in DPB port deletion path or DPB port creation path?
@prgeor This PR fixes sonic-net/sonic-buildimage#18893. It is not about the application selection. The current |
Force-pushed from 3b94fdb to 6c839d3.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
@@ -896,7 +897,13 @@ def on_port_update_event(self, port_change_event):
        self.force_cmis_reinit(lport, 0)
    else:
        # PORT_DEL event for the same lport happens 3 times because
        # we are subscribing to CONFIG_DB, STATE_DB|TRANSCEIVER_INFO, and STATE_DB|PORT_TABLE.
        # We only handle the first one and ignore the rest.
Instead of introducing a new attribute `deleted_ports` and its corresponding update logic, you may be able to simply rely on `port_change_event`'s existing `db_name` (/`table_name`) and only proceed with `update_port_transceiver_status_table_sw_cmis_state` for the case of STATE_DB|TRANSCEIVER_INFO (since `update_port_transceiver_status_table_sw_cmis_state` updates the SW state of the transceiver in the case of transceiver removal)?
sonic-platform-daemons/sonic-xcvrd/xcvrd/xcvrd_utilities/port_event_helper.py
Lines 30 to 31 in bf865c6
    self.db_name = db_name
    self.table_name = table_name
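The suggestion above can be sketched as a small filter. This is only an illustration under assumptions: `PortEvent` is a hypothetical stand-in for xcvrd's port change event carrying just the fields discussed here, the `"DEL"` string is a placeholder for the real event type constant, and `should_mark_removed` is a made-up helper name.

```python
from dataclasses import dataclass


@dataclass
class PortEvent:
    """Hypothetical stand-in for a port change event (not the xcvrd class)."""
    port_name: str
    event_type: str   # "DEL" is a placeholder value used only in this sketch
    db_name: str
    table_name: str


def should_mark_removed(event):
    """Return True only for the delete event that signals transceiver removal."""
    return (event.event_type == "DEL"
            and event.db_name == "STATE_DB"
            and event.table_name == "TRANSCEIVER_INFO")


# The same logical port is reported as deleted once per subscribed table.
events = [
    PortEvent("Ethernet0", "DEL", "CONFIG_DB", "PORT"),
    PortEvent("Ethernet0", "DEL", "STATE_DB", "TRANSCEIVER_INFO"),
    PortEvent("Ethernet0", "DEL", "STATE_DB", "PORT_TABLE"),
]

# Only one of the three events would trigger the SW CMIS state update.
handled = [e for e in events if should_mark_removed(e)]
```

Filtering on the table that originated the event avoids keeping extra bookkeeping such as `deleted_ports` just to de-duplicate the three delete notifications.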
@longhuan-cisco Thank you for the suggestion. I was able to fix the issue without introducing `deleted_ports`.
I initially added `deleted_ports` because the CmisManagerTask handles port events in two stages.
The first stage is done in `on_port_update_event` via `port_change_observer.handle_port_update_event()`, and the second stage is done after returning from `port_change_observer.handle_port_update_event()` by iterating over `port_dict`.
If we delete the port info from `port_dict` in the first stage, the second stage cannot handle the port.
With `deleted_ports`, the original code deleted the port info from `port_dict` only after the second stage had finished.
However, it looks like the second stage currently does nothing for a removed port, so deleting the port info in the first stage is safe.
Though this works, I think the current two-stage design is bug-prone, because we might want to add some handling (e.g. de-initialization) for the removed port in the second stage.
If you agree, I'd like to refactor the current design in a different PR.
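A minimal sketch of the two-stage flow described above may make the trade-off easier to see. `TwoStageSketch`, `stage_one`, `stage_two`, and the `"ADD"`/`"DEL"` strings are hypothetical names for illustration; this is not the real CmisManagerTask code.

```python
class TwoStageSketch:
    """Hypothetical reduction of the two-stage port event handling."""

    def __init__(self, defer_delete):
        self.port_dict = {}
        self.deleted_ports = {}          # used only when defer_delete is True
        self.defer_delete = defer_delete
        self.seen_in_stage_two = []

    def stage_one(self, port_name, event_type):
        # Stage 1: apply the port event to port_dict.
        if event_type == "ADD":
            self.port_dict[port_name] = {}
        elif self.defer_delete:
            # Remember the deletion; the entry stays visible to stage 2.
            self.deleted_ports[port_name] = event_type
        else:
            # Delete immediately; stage 2 will never see this port.
            self.port_dict.pop(port_name, None)

    def stage_two(self):
        # Stage 2: iterate over port_dict, then apply deferred deletions.
        self.seen_in_stage_two = list(self.port_dict)
        for port_name in self.deleted_ports:
            self.port_dict.pop(port_name, None)
        self.deleted_ports.clear()


eager = TwoStageSketch(defer_delete=False)
eager.stage_one("Ethernet0", "ADD")
eager.stage_one("Ethernet0", "DEL")
eager.stage_two()

deferred = TwoStageSketch(defer_delete=True)
deferred.stage_one("Ethernet0", "ADD")
deferred.stage_one("Ethernet0", "DEL")
deferred.stage_two()
```

With immediate deletion, the removed port is invisible to the second stage, which is harmless only as long as that stage has nothing to do for removed ports.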
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
    self.handle_cmis_state_machine(lport, info, is_fast_reboot)

for event in self.deleted_ports.values():
    self.port_dict.pop(event.port_name)
Would it make the flow/logic simpler to just directly delete the `self.port_dict[lport]` entry at the PORT_DEL event of CONFIG_DB in `on_port_update_event()`?
Done.
I explained the background of this change here
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
        self.port_dict[lport]['appl'] = 0
        self.port_dict[lport]['host_lanes_mask'] = 0
        continue
def handle_cmis_state_machine(self, lport, info, is_fast_reboot):
The change here seems to refactor today's cmis_mgr `task_worker` function by splitting it into sub-functions. I'm assuming the flow/logic of cmis_mgr's `task_worker` doesn't change and that it's mostly just splitting functions. But this makes the number of changed lines very large. I feel it's a little difficult for other people to distinguish the actual DPB fix from the entire change set of this PR.
Thus, I feel it might be better to put this refactoring change set in a separate PR. It will also be less work for the release manager to revert in case anything breaks. Just my 2 cents, not a must-have.
Not sure how @mihirpat1 or Prince feel about this.
@longhuan-cisco Agreed that splitting the current changeset into 2 separate PRs will be more helpful.
@longhuan-cisco @mihirpat1 I'm sorry for the late response. Thanks for the suggestion.
I'll move the refactoring commit to a separate PR and only include the DPB fix in this PR.
Force-pushed from 6c839d3 to 7c1b475.
@longhuan-cisco @mihirpat1 I updated the PR based on @longhuan-cisco's suggestion. Please review. I used force-push and removed the original commits. I backed up the original branch here.
sonic-xcvrd/xcvrd/xcvrd.py
Outdated
self.update_port_transceiver_status_table_sw_cmis_state(lport, CMIS_STATE_REMOVED)
self.port_dict.pop(lport)
If a transceiver is removed (not doing port breakout), the port would be removed from port_dict. When the transceiver is reattached to the port, there would be no port-related information in port_dict. I think the port should only be removed from port_dict after receiving a delete CONFIG_DB|PORT event and update cmis state after receiving a delete STATE_DB|TRANSCEIVER_INFO event.
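The split proposed in this comment can be sketched roughly as follows. `handle_del` and the plain `status` dict are hypothetical stand-ins for the real handler and the transceiver status table, and the value of `CMIS_STATE_REMOVED` is a placeholder.

```python
CMIS_STATE_REMOVED = "REMOVED"  # placeholder value for this sketch


def handle_del(port_dict, status, lport, db_name, table_name):
    """Route a delete event by the table it came from (illustrative only)."""
    if db_name == "CONFIG_DB" and table_name == "PORT":
        # The port itself is gone (e.g. DPB): forget its port info.
        port_dict.pop(lport, None)
    elif db_name == "STATE_DB" and table_name == "TRANSCEIVER_INFO":
        # Only the transceiver is gone: record the state, keep the port info.
        status[lport] = CMIS_STATE_REMOVED


# Case 1: the transceiver is pulled, no breakout happens.
pulled_ports = {"Ethernet0": {"asic_id": 0}}
pulled_status = {}
handle_del(pulled_ports, pulled_status, "Ethernet0", "STATE_DB", "TRANSCEIVER_INFO")

# Case 2: the port is deleted from CONFIG_DB as part of a breakout.
breakout_ports = {"Ethernet0": {"asic_id": 0}}
breakout_status = {}
handle_del(breakout_ports, breakout_status, "Ethernet0", "CONFIG_DB", "PORT")
```

In the first case the port info survives, so a reinserted transceiver still finds its entry in `port_dict`.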
Thank you for the comment. You're right. I'll update the code as you suggested.
@chiourung I fixed the issue with this commit. 954e9cb
Could you take a look?
Force-pushed from d186e7f to 954e9cb.
@prgeor Hi, just a gentle ping to check if I need to do something to get this PR merged.
CmisManagerTask's `port_dict` and `port_mapping` must be updated according to the port add/remove events. Before this commit, `port_mapping` was only initialized when CmisManagerTask was initialized and was not updated after that, which caused a KeyError exception when DPB is used (sonic-net/sonic-buildimage#18893).
This commit removes the `port_mapping` field from CmisManagerTask, as `port_mapping` was used just for storing `asic_id` information, and that can simply be done by `port_dict` instead. Also, this commit updates `port_dict` according to the port add/remove events to support DPB.
Signed-off-by: Wataru Ishida <[email protected]>
it is not used. Signed-off-by: Wataru Ishida <[email protected]>
Force-pushed from 954e9cb to 2992bdf.
Description
This commit fixes DPB support with CMIS transceivers.
CmisManagerTask's `port_dict` must be updated according to the port add/remove events.
This commit removes the `port_mapping` field from CmisManagerTask, as `port_mapping` was mostly used just for storing `asic_id` information, and that can simply be done by `port_dict` instead.
Added a helper method `get_asic_id()` to CmisManagerTask for getting `asic_id` from `logical_port`.
Motivation and Context
fixes sonic-net/sonic-buildimage#18893
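A rough sketch of the failure mode and the fix, under stated assumptions: `StaleMappingSketch` is a hypothetical class, the port names used for the breakout are made up, and the `-1` fallback in `get_asic_id()` is an assumption of this sketch rather than something confirmed by the description.

```python
class StaleMappingSketch:
    """Illustrative only: a one-time snapshot versus an event-driven dict."""

    def __init__(self, initial_ports):
        # Snapshot taken once at start-up and never refreshed (the old behaviour).
        self.port_mapping = dict(initial_ports)
        # Kept in sync with port add/remove events.
        self.port_dict = {}

    def on_port_add(self, lport, asic_id):
        self.port_dict[lport] = {"asic_id": asic_id}

    def on_port_del(self, lport):
        self.port_dict.pop(lport, None)

    def get_asic_id(self, lport):
        # Look up asic_id from port_dict; -1 for unknown ports is an assumption.
        return self.port_dict.get(lport, {}).get("asic_id", -1)


task = StaleMappingSketch({"Ethernet0": 0})
task.on_port_add("Ethernet0", 0)

# A breakout deletes the original port and creates new ones (names are made up).
task.on_port_del("Ethernet0")
task.on_port_add("Ethernet0", 0)
task.on_port_add("Ethernet2", 0)

try:
    task.port_mapping["Ethernet2"]      # the snapshot has never heard of this port
    stale_lookup_failed = False
except KeyError:
    stale_lookup_failed = True
```

Because `port_dict` follows the add/remove events, a lookup through `get_asic_id()` still works for ports created after start-up.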
How Has This Been Tested?
I tested on the VS environment.
Additional Information (Optional)