Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing a path could lead to leftover flows. #476

Open
Alopalao opened this issue Jun 20, 2024 · 7 comments
Open

Removing a path could lead to leftover flows. #476

Alopalao opened this issue Jun 20, 2024 · 7 comments
Assignees
Labels
2024.2 Kytos-ng 2024.2 bug Something isn't working epic_mef_eline_flows_error_handling future_release Planned for the next release priority_medium Medium priority

Comments

@Alopalao
Copy link

Alopalao commented Jun 20, 2024

While exploring from issue #45:

i. Allocate and deallocate many individual vlans on an interface

The issue in failover_path (current_path is similar):

  • At the start of a failover_path installation, every needed VLAN is removed from the interfaces.
  • The installation could fail in the middle of sending flows.
    E.g:
failover_path = {
"00:00:00:00:00:00:00:01:1",
"00:00:00:00:00:00:00:01:4", "00:00:00:00:00:00:00:03:3",    # Using VLAN 87, correctly installed
"00:00:00:00:00:00:00:03:2", "00:00:00:00:00:00:00:02:3",    # Using VLAN 87, failed installation
"00:00:00:00:00:00:00:02:1"
}
  • In this case the error was Service Unavailable which also is the error when trying to delete the flows from the paths.
  • All the VLANs used by the path are freed, code.
  • Notice that we have 4 switches which VLANs do not match their installed flows:
"00:00:00:00:00:00:00:01:4"     # Freed VLAN 87, Flow was not removed
"00:00:00:00:00:00:00:03:3"     # Freed VLAN 87, Flow was not removed
"00:00:00:00:00:00:00:03:2"     # Freed VLAN 87, Flow was not installed
"00:00:00:00:00:00:00:02:3",    # Freed VLAN 87, Flow was not installed
  • The next EVC will try its luck with the same VLAN 87 if it successfully installs its failover_path. It will install a flow in the same switch, port and VLAN.

  • TLDR, when it comes to paths, we create/delete flows by switch but add/delete VLANs by paths. Deleting VLANs by confirmed deleted switch flow could solve this issue 🤔.

Things to keep in mind:

  • This issue was discovered when sending 2245 requests to create EVCs in 7 threads(turned out I created 2000 threads, thus the Service Unavailable error. Still this error is real.). This issue can be replicated by modifying flow_manager to detect a flow (a second or later addition flow) that is from a first EVC so a leftover installed flow is untouched. Then allowing every flow from a second EVC
  • The test reproduced Service Unavailable errors from both flow_manager and pathfinder. But, only error from flow_manager is important for this issue.
  • This issue does not seem critical. In my tests it only produced duplicated flows that would trigger constant consistency checks about a missing flow in a certain switch (related to the fail to remove path).
  • This error can occur in remove_path_flows and remove_current_flows although it was much more likely in the first in remove_path_flows. In practice I have not noticed an issue in practice with remove_current_flows but the steps for dealing with errors is the same.

Possible solutions:

  • Closing issue Error handling when installing and deleting flows #495 if ensures deletion of leftover flows at some point.
  • Another solution would be to modify VLANs availability for confirmed flows installation and deletion only. Currently kytos modifies VLANs by path when these are not installed/deleted entirely.

Update

  • Duplicated flows is not longer a problem from this scenario because issue from of_core was solved.
@Alopalao
Copy link
Author

Alopalao commented Jun 21, 2024

Created a not ideal solution. I solves duplicated flows for this case only. Presents another issue with VLANs. VLANs are used before sending flows, with flow mods additions and deletions errors, these VLANs are not freed but they should be.

@viniarck viniarck added bug Something isn't working priority_low Low priority priority_medium Medium priority and removed priority_low Low priority labels Jul 1, 2024
@viniarck
Copy link
Member

viniarck commented Jul 1, 2024

@Alopalao good finding, let's keep this in the backlog, since the service unavailable was due to a very high rate of requests that we don't expect in prod atm. As mef_eline consistency check gets implemented/enhanced it can also aid this case, and this one is already in progress, so let's avoid any other adjacent implementation in the meantime here. In the future, we can reassess if we'll augment the error handling part here and consider other adjacent efforts too like replacing requests or not.

@viniarck
Copy link
Member

viniarck commented Jul 1, 2024

Now regarding the rest of the stress tests, I'll recommend that you try to use the methods without a request to conclude that task since you've already found out this one here, so basically calling the vlan allocation deallocation method from a NApp.

@viniarck
Copy link
Member

viniarck commented Sep 9, 2024

@Alopalao can this be closed or was there something remaining to still be mapped and addressed?

@Alopalao
Copy link
Author

Alopalao commented Sep 9, 2024

This issue can still be present. The solution for this issue is ensure flow installation without errors or allocate/deallocate VLANS by confirmed flow installation/deletion only.

@viniarck viniarck added the future_release Planned for the next release label Sep 9, 2024
@viniarck
Copy link
Member

viniarck commented Sep 9, 2024

This issue can still be present. The solution for this issue is ensure flow installation without errors or allocate/deallocate VLANS by confirmed flow installation/deletion only.

Right. When you have the chance, let's clarify and map the remaining potential errors that you mean here. Are they the ones related to this issue here #495? If there's anything else let's document in the issue body just so we get back to it later when gets prioritized

@Alopalao
Copy link
Author

Alopalao commented Sep 9, 2024

Added Possible solutions so it is documented when to close this issue.

@viniarck viniarck added the 2024.2 Kytos-ng 2024.2 label Sep 16, 2024
@Alopalao Alopalao self-assigned this Sep 17, 2024
@Alopalao Alopalao changed the title Removing a path could lead to duplicated flows. Removing a path could lead to leftover flows. Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024.2 Kytos-ng 2024.2 bug Something isn't working epic_mef_eline_flows_error_handling future_release Planned for the next release priority_medium Medium priority
Projects
None yet
Development

No branches or pull requests

2 participants