Added lock for EVC deletion #480

Merged
merged 5 commits into from
Jul 22, 2024

@Alopalao commented on Jul 10, 2024

Closes #478

Summary

Duplicate EVCs could occur. Here is how:

These are the log messages for an EVC that was deployed by the consistency check:

(mef_eline) Got circuits for check number: 3
(AnyIO worker thread) EVC(4b3cd9f35d6243, evc1) was deployed.
(AnyIO worker thread) delete_circuit /v2/evc/4b3cd9f35d6243 last CONSIS: 3          # EVC removed from memory (.pop())
(AnyIO worker thread) Removing EVC(4b3cd9f35d6243, evc1)
(mef_eline) Got circuits for check number: 4                                        # EVCs copied from memory and DB
(thread_pool_app_0) Failover path for EVC(4b3cd9f35d6243, evc1) was deployed.
(AnyIO worker thread) EVC removed. EVC(4b3cd9f35d6243, evc1)                        # Confirms EVC was removed from DB
127.0.0.1:55234 - "DELETE /api/kytos/mef_eline/v2/evc/4b3cd9f35d6243/ HTTP/1.1" 200
(mef_eline) EVC found in mongodb but unloaded 4b3cd9f35d6243 from check N: 4        # EVC restored
(mef_eline) Failover path for EVC(4b3cd9f35d6243, evc1) was deployed.

When deleting an EVC, we first pop the EVC from memory and then prepare its data for sync (deletion from the DB). While the EVC was being prepared for deletion, the consistency check copied the EVCs from memory (without the popped EVC) and from the DB (still containing the EVC yet to be deleted). Finding the EVC in the DB but not in memory, the consistency check loaded it again, which restored it.
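The interleaving above can be replayed deterministically with a toy model. This is a minimal sketch, not mef_eline's code: `circuits`, `db`, `delete_pop`, `delete_sync` and `consistency_check` are hypothetical names, and DB deletion is modeled as marking the EVC archived. Running the steps in the order seen in the log leaves the deleted EVC back in memory.

```python
# Hypothetical model: in-memory EVCs and their persisted copies.
circuits: dict[str, dict] = {"4b3cd9f35d6243": {"name": "evc1"}}
db: dict[str, dict] = {"4b3cd9f35d6243": {"name": "evc1", "archived": False}}


def delete_pop(evc_id: str) -> dict:
    # Deletion step 1: remove the EVC from memory.
    return circuits.pop(evc_id)


def delete_sync(evc_id: str) -> None:
    # Deletion step 2: persist the removal (modeled as archiving in the DB).
    db[evc_id]["archived"] = True


def consistency_check() -> list[str]:
    # Snapshot memory and DB, then load any non-archived EVC missing from memory.
    in_memory = dict(circuits)
    stored = {evc_id: dict(data) for evc_id, data in db.items()}
    restored = []
    for evc_id, data in stored.items():
        if not data["archived"] and evc_id not in in_memory:
            circuits[evc_id] = {"name": data["name"]}
            restored.append(evc_id)
    return restored


# Order observed in the log: pop, then the check runs, then the DB sync.
delete_pop("4b3cd9f35d6243")
restored = consistency_check()
delete_sync("4b3cd9f35d6243")
print(restored)                      # ['4b3cd9f35d6243']
print("4b3cd9f35d6243" in circuits)  # True: the deleted EVC is back in memory
```

The EVC ends up archived in the DB yet present in memory, which matches the "EVC found in mongodb but unloaded" line followed by a new failover deployment.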

Local Tests

Ran the scripts again, and no duplicate EVCs were found.

End-to-End Tests

tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
tests/test_e2e_05_topology.py ..................                         [  7%]
tests/test_e2e_06_topology.py ....                                       [  9%]
tests/test_e2e_10_mef_eline.py ..........ss.....x.....x................  [ 24%]
tests/test_e2e_11_mef_eline.py ......                                    [ 26%]
tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 29%]
tests/test_e2e_13_mef_eline.py ....Xs.s.....Xs.s.XXxX.xxxx..X........... [ 44%]
.                                                                        [ 45%]
tests/test_e2e_14_mef_eline.py x                                         [ 45%]
tests/test_e2e_15_mef_eline.py .....                                     [ 47%]
tests/test_e2e_16_mef_eline.py .                                         [ 47%]
tests/test_e2e_20_flow_manager.py ......................                 [ 56%]
tests/test_e2e_21_flow_manager.py ...                                    [ 57%]
tests/test_e2e_22_flow_manager.py ...............                        [ 62%]
tests/test_e2e_23_flow_manager.py ..............                         [ 68%]
tests/test_e2e_30_of_lldp.py .R...                                       [ 69%]
tests/test_e2e_31_of_lldp.py ...                                         [ 70%]
tests/test_e2e_32_of_lldp.py ...                                         [ 71%]
tests/test_e2e_40_sdntrace.py ................                           [ 77%]
tests/test_e2e_41_kytos_auth.py ........                                 [ 80%]
tests/test_e2e_42_sdntrace.py ..                                         [ 81%]
tests/test_e2e_50_maintenance.py ............................            [ 92%]
tests/test_e2e_60_of_multi_table.py .....                                [ 93%]
tests/test_e2e_70_kytos_stats.py .....RR...                              [ 96%]
tests/test_e2e_80_pathfinder.py ss......                                 [100%]

@Alopalao requested a review from a team as a code owner on July 10, 2024 18:14

@viniarck left a comment

@Alopalao, nicely done finding and fixing this issue.

Overall, the fix/approach is OK, but notice there's a thread-safety issue that can still happen. I explained it below; check it out, since we'll need to cover that too.

main.py (outdated, resolved)
@Alopalao changed the title from "Moved pop() circuit to after sync()" to "Added lock for EVC deletion" on Jul 15, 2024
@viniarck self-requested a review on July 16, 2024 14:10
@viniarck left a comment

@Alopalao, acquiring self._lock (the global consistency lock) can solve it, but notice that this entirely serializes EVC deletion, which is an I/O-bound operation, so concurrent deletions will have their performance significantly decreased. That isn't desirable here. EVC deletion should be safe per ID; anything that isn't safe per ID is a problem we need to solve. Let's go step by step and brainstorm it together:
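The cost of a global lock can be illustrated with a minimal sketch. The names are hypothetical, `time.sleep` stands in for the I/O of one deletion, and a dict of locks models the per-EVC lock (the review below refers to it as `evc.lock`). It measures how many deletions of different EVCs overlap under one global lock versus one lock per EVC.

```python
import threading
import time
from collections import defaultdict

IO_DELAY = 0.1  # assumed stand-in for the DB round trip of one deletion

global_lock = threading.Lock()               # one lock shared by every deletion
per_evc_locks = defaultdict(threading.Lock)  # hypothetical: one lock per EVC id
locks_guard = threading.Lock()               # guards lazy creation of per-EVC locks

state_lock = threading.Lock()
active = 0       # deletions currently inside their I/O section
max_active = 0   # highest overlap observed during a run


def fake_io():
    """Simulate deletion I/O while tracking how many deletions overlap."""
    global active, max_active
    with state_lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(IO_DELAY)
    with state_lock:
        active -= 1


def delete_with_global_lock(evc_id):
    with global_lock:  # every deletion waits for every other deletion
        fake_io()


def delete_with_evc_lock(evc_id):
    with locks_guard:
        lock = per_evc_locks[evc_id]
    with lock:  # only deletions of the same EVC wait for each other
        fake_io()


def run(delete_fn, evc_ids):
    """Delete all ids concurrently; return (max overlap, elapsed seconds)."""
    global active, max_active
    active = max_active = 0
    threads = [threading.Thread(target=delete_fn, args=(i,)) for i in evc_ids]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active, time.monotonic() - start


evc_ids = ["evc1", "evc2", "evc3", "evc4"]
global_result = run(delete_with_global_lock, evc_ids)
per_evc_result = run(delete_with_evc_lock, evc_ids)
print("global lock:", global_result)     # overlap is always 1
print("per-EVC locks:", per_evc_result)  # deletions of different EVCs overlap
```

With the global lock the four deletions run one after another (roughly 4 × `IO_DELAY`); with per-EVC locks they run concurrently and take roughly one `IO_DELAY`.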

1 - The self._load_evc(dict) call in the consistency check has never been thread-safe per EVC (we could make it so, but it's not worth it; I'll elaborate), as you've found with this bug. The easiest fix would be to remove the last for loop, which tries to load EVCs during the consistency check. EVCs should only be loaded during setup(); this loop used to be there to aid certain EVC activations back when EVCs didn't persist their active state. I think we can safely remove it (in fact, we could remove even more of the consistency check, since it will be refactored, but let's only remove the minimum). Can you try removing that last for loop and reassess whether it fixes the issue?
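Point 1 can be sketched with a hypothetical, stripped-down controller (not mef_eline's actual class; `Controller`, `delete_pop` and `delete_sync` are illustrative names). EVCs are loaded only in `setup()`, and the consistency check iterates in-memory EVCs without any loop that loads EVCs from the DB. Replaying the interleaving from the log then leaves the deleted EVC gone.

```python
class Controller:
    """Hypothetical, stripped-down model of the EVC bookkeeping."""

    def __init__(self, db):
        self.db = db          # persisted EVCs: id -> {"name", "archived"}
        self.circuits = {}    # in-memory EVCs

    def setup(self):
        # The only place where EVCs are loaded from the DB into memory.
        for evc_id, data in self.db.items():
            if not data["archived"]:
                self.circuits[evc_id] = dict(data)

    def consistency_check(self):
        # Iterates a snapshot of in-memory EVCs only. There is no final loop
        # that loads EVCs found in the DB but missing from memory.
        checked = []
        for evc_id in list(self.circuits):
            checked.append(evc_id)  # placeholder for the per-EVC checks
        return checked

    def delete_pop(self, evc_id):
        return self.circuits.pop(evc_id)

    def delete_sync(self, evc_id):
        self.db[evc_id]["archived"] = True


ctl = Controller({"4b3cd9f35d6243": {"name": "evc1", "archived": False}})
ctl.setup()
# Same interleaving as in the log: pop, consistency check, then DB sync.
ctl.delete_pop("4b3cd9f35d6243")
checked = ctl.consistency_check()
ctl.delete_sync("4b3cd9f35d6243")
print(checked, ctl.circuits)  # [] {}
```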

2 - Since you'll need to remove self._lock from the EVC deletion, notice that the problem of potential duplicated events might come back. I recommend handling the case where the EVC was already popped while a request was waiting to acquire evc.lock: right after acquiring the lock, if the EVC has been archived, another concurrent thread has already deleted it, so you can return early with 200. See what I mean?
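Point 2 can be sketched as follows. It is a minimal, hypothetical model: the `EVC` dataclass, `circuits`, `deleted_events` and `delete_circuit` are illustrative, and returning 404 for an id that is no longer in memory is an assumption. After acquiring `evc.lock`, a request that finds the EVC already archived returns 200 early, so five concurrent deletions of the same EVC perform the deletion, and emit its event, only once.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class EVC:
    """Hypothetical stand-in for an EVC: only the fields used here."""
    id: str
    archived: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock)


circuits = {"evc1": EVC("evc1")}
deleted_events = []            # each entry stands for one emitted "deleted" event
events_lock = threading.Lock()


def delete_circuit(circuit_id):
    """Return an HTTP-like status; safe to call concurrently for one id."""
    evc = circuits.get(circuit_id)
    if evc is None:
        return 404  # unknown id, or already popped before this lookup
    with evc.lock:
        if evc.archived:
            # Another request deleted it while we waited for the lock.
            return 200
        evc.archived = True             # stands in for remove/archive/sync to DB
        circuits.pop(circuit_id, None)  # no KeyError if already popped
        with events_lock:
            deleted_events.append(circuit_id)
    return 200


statuses = []
statuses_lock = threading.Lock()
barrier = threading.Barrier(5)  # release all five requests at once


def request():
    barrier.wait()
    status = delete_circuit("evc1")
    with statuses_lock:
        statuses.append(status)


threads = [threading.Thread(target=request) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(deleted_events)  # ['evc1']: exactly one request performed the deletion
```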

@viniarck self-requested a review on July 19, 2024 13:09
setup.py (resolved)
@viniarck left a comment

@Alopalao, great updates. It's almost there; just one minor detail remains, so check out my comment. Also, don't forget to update the changelog and dispatch a final e2e run. Once it's passing, it'll get approved and merged.

setup.py (resolved)
main.py (resolved)
@viniarck self-requested a review on July 22, 2024 14:42
@viniarck left a comment

LGTM. Leaving it pre-approved. Feel free to merge when e2e is passing.

@Alopalao merged commit 495f75f into master on Jul 22, 2024
2 checks passed
@Alopalao deleted the fix/double_evc branch on July 22, 2024 17:24
Development

Successfully merging this pull request may close these issues.

It is possible to have two activated EVCs with the same VLAN used
2 participants