Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ip/test_mgmt_ipv6_only]: Refactor to properly cleanup on failure #16442

Merged
merged 2 commits into from
Feb 6, 2025

Conversation

liamkearney-msft
Copy link
Contributor

@liamkearney-msft liamkearney-msft commented Jan 10, 2025

Description of PR

ipv6 mgmt only tests can fail in setup, after config has been modified, leading to the teardown to not be run, and the duts to be left in an unreachable state. Refactor the test to split backing up and restoring the config into an independent fixture, such that failures in setting the mgmt IPs won't nuke the testbed. Also refactor / cleanup some other aspects of the tests to enhance clarity.

Summary:
Fixes #16441

Type of change

  • [x ] Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
    • Add ownership here(Microsft required only)
  • [x ] Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

Fix failures in test setup leaving the testbed in a wedged state, due to removed ipv4 mgmt IPs

How did you do it?

Refactor the test to have config backup and restore in a separate fixture

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Sorry, something went wrong.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rlhui rlhui added the Test gap label Jan 10, 2025
ipv6 mgmt only tests can fail in setup, after config has been modified,
leading to the teardown to not be run. Refactor the test to split
backing up and restoring the config into another fixture, such that
failures in setting the mgmt IPs wont nuke the testbed.
Also refactor some other aspects to remove linter bypasses

Signed-off-by: Liam Kearney <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Liam Kearney <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -55,10 +53,8 @@ def check_ntp_status(host):
return True


def run_ntp(duthosts, rand_one_dut_hostname, setup_ntp):
def run_ntp(duthost):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm we're removing setup_ntp here because the fixtures are/should be called locally and not used in this function?

Copy link
Contributor Author

@liamkearney-msft liamkearney-msft Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, in this case its redundant - the fixture is not called / used here as this is not a test function. The fixtures are only invoked by the test_xxx functions / or other fixtures

Copy link
Contributor

@Javier-Tan Javier-Tan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liamkearney-msft
Copy link
Contributor Author

@arlakshm @yejianquan pls review when you get the chance, thanks

@liamkearney-msft
Copy link
Contributor Author

@arlakshm @yejianquan bump again, thanks

@yejianquan
Copy link
Collaborator

@arlakshm @yejianquan bump again, thanks

Hi @liamkearney-msft overall it looks good to me, and thanks for simplify the more and more complicated logic.
for #16441 and 202405.msft, can we simply increase waiting time for voq chassis?
The reason is we're very closely to 100% passing rate, and plan to reach 100% this weekend.
This kind of refactoring at this moment is a little bit dangerous to our plan at this moment.

We can do it via 2 ways:

  1. Merge this PR and only to master/202411, for 202405.msft, we increase waiting time.
  2. Run the test with refactored test module on both voq and Cisco chassis, then merge to master/202411/202405.msft.

I personally prefer way 1, but also OK with way 2 with full qualified test result, @arlakshm what do you think?

@liamkearney-msft
Copy link
Contributor Author

liamkearney-msft commented Feb 5, 2025

hey @yejianquan, I believe that option 1 is what we have currently been doing - we have increased the timeout for our branches in 202405 to avoid this. This was more of the follow up to fix the root cause that would nuke the testbeds.
I get the hesitation with merging so close to finalised qual, im ok with not merging this to 202405 if you consider it a risky change (but ill let Arvind weigh in). With that being said though, we have run this on voq chassis and it passes.

Copy link
Collaborator

@yejianquan yejianquan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yejianquan
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan
Copy link
Collaborator

hey @yejianquan, I believe that option 1 is what we have currently been doing - we have increased the timeout for our branches in 202405 to avoid this. This was more of the follow up to fix the root cause that would nuke the testbeds. I get the hesitation with merging so close to finalised qual, im ok with not merging this to 202405 if you consider it a risky change (but ill let Arvind weigh in). With that being said though, we have run this on voq chassis and it passes.

anyway, let's merge it into master first

@yejianquan yejianquan merged commit bbef584 into sonic-net:master Feb 6, 2025
28 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Feb 6, 2025
…nic-net#16442)

Description of PR
ipv6 mgmt only tests can fail in setup, after config has been modified, leading to the teardown to not be run, and the duts to be left in an unreachable state. Refactor the test to split backing up and restoring the config into an independent fixture, such that failures in setting the mgmt IPs won't nuke the testbed. Also refactor / cleanup some other aspects of the tests to enhance clarity.

Summary:
Fixes sonic-net#16441

Approach
What is the motivation for this PR?
Fix failures in test setup leaving the testbed in a wedged state, due to removed ipv4 mgmt IPs

How did you do it?
Refactor the test to have config backup and restore in a separate fixture

Signed-off-by: Liam Kearney <[email protected]>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202411: #16812

mssonicbld pushed a commit that referenced this pull request Feb 8, 2025
…6442)

Description of PR
ipv6 mgmt only tests can fail in setup, after config has been modified, leading to the teardown to not be run, and the duts to be left in an unreachable state. Refactor the test to split backing up and restoring the config into an independent fixture, such that failures in setting the mgmt IPs won't nuke the testbed. Also refactor / cleanup some other aspects of the tests to enhance clarity.

Summary:
Fixes #16441

Approach
What is the motivation for this PR?
Fix failures in test setup leaving the testbed in a wedged state, due to removed ipv4 mgmt IPs

How did you do it?
Refactor the test to have config backup and restore in a separate fixture

Signed-off-by: Liam Kearney <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Failures in IPv6 mgmt test setup lead to lost mgmt IPs
5 participants