[release-4.17] OCPBUGS-43664: Add vendor and architecture specific tuning options #1191

Open, wants to merge 3 commits into base: release-4.17


@MarSik MarSik commented Oct 22, 2024

  • CNF-14090: Add vendor and architecture specific tuning options
      - Performance tuning support for three platforms (amd/x86, arm/aarch64, intel/x86) is added in this change
      - When a valid platform is detected, the additional platform-specific tuning is imported alongside the default tuning
      - This makes use of a new helper function added to tuned to detect the system name and architecture
      - Update unit tests to account for the various changes
      - Add new unit tests to cover the platform-specific tuning
  • CNF-14090: Re-sync e2e test yaml for tuning changes
  • CNF-14090: Use variable composition for idle_poll
      - idle=poll is only supported on x86
      - Update tests to account for changes
      - Add explanation comments to empty values in openshift-node-performance
  • CNF-14090: Re-sync e2e test yaml for tuning changes
  • CNF-14090: Fix active/passive pstates
  • CNF-14090: Re-sync e2e test yaml for tuning changes
  • OCPBUGS-43665: Drop amd_iommu=on from amd tuning
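
The platform selection and the idle_poll composition described in the list above can be pictured with a minimal Python sketch. It is only an illustration, not the operator's or tuned's actual code: the profile names, the `select_profiles` and `compose_cmdline` helpers, the `nohz=on` base argument, and the architecture strings `x86_64`/`aarch64` are all assumptions made for the example.

```python
# Hypothetical mapping from (vendor, architecture) to an extra tuning profile.
# The three entries mirror the platforms named in the description; the profile
# names themselves are invented for this sketch.
PLATFORM_PROFILES = {
    ("amd", "x86_64"): "platform-amd-x86",
    ("arm", "aarch64"): "platform-arm-aarch64",
    ("intel", "x86_64"): "platform-intel-x86",
}


def select_profiles(vendor, arch, default="default-tuning"):
    """Return the default profile plus the platform profile, if one matches."""
    profiles = [default]
    extra = PLATFORM_PROFILES.get((vendor.lower(), arch.lower()))
    if extra is not None:
        profiles.append(extra)
    return profiles


def idle_poll_argument(arch):
    """idle=poll is only meaningful on x86, so other architectures get ''."""
    return "idle=poll" if arch.lower() == "x86_64" else ""


def compose_cmdline(base_args, arch):
    """Compose the kernel command line from variables, skipping empty ones."""
    parts = list(base_args) + [idle_poll_argument(arch)]
    return " ".join(part for part in parts if part)
```

With this shape, an unrecognized platform simply falls back to the default tuning, and an empty variable contributes nothing to the composed command line.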

OCPBUGS-43666: Fix kernel arguments ordering on Intel

An upgrade from a previous version causes one extra reboot
due to differently ordered kernel arguments. This is
a side effect of the platform-specific tuned profile split
we merged in #1083.

This fix updates the Intel-specific tuned profile to
follow the same ordering that was used in the past.

It does so by exploiting a specific behavior of the tuned
bootloader plugin: it orders the kernel argument
cmdline_suffix keys by order of first appearance.
Any additional appearance only changes the value,
not the ordering.

The change is only needed for Intel, because we have
never supported other platforms before, so upgrades
are not an issue there.
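
The "order of first appearance" behavior described above can be modeled with a short Python sketch. This is not tuned's implementation; it only imitates the described semantics with an insertion-ordered dict, and the key names `cmdline_realtime` and `cmdline_hugepages` and the `default_hugepagesz=1G` value are hypothetical.

```python
def merge_cmdline_keys(assignments):
    """Merge (key, value) pairs in processing order.

    A Python dict keeps a key at the position where it was first inserted,
    so assigning to an existing key changes its value but not its position.
    That is the same property the description attributes to the plugin.
    """
    merged = {}
    for key, value in assignments:
        merged[key] = value
    return merged


def render_cmdline(merged):
    """Join the non-empty values in key order into one command line."""
    return " ".join(value for value in merged.values() if value)


# An early, empty placeholder reserves the legacy position of a key;
# a later assignment supplies the real value without moving it.
example = merge_cmdline_keys([
    ("cmdline_realtime", ""),
    ("cmdline_hugepages", "default_hugepagesz=1G"),
    ("cmdline_realtime", "tsc=reliable nmi_watchdog=0 mce=off"),
])
```

Rendering `example` puts the realtime arguments ahead of the hugepages argument even though their value was set last, which is how an early placeholder can preserve the old ordering.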

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 22, 2024

openshift-ci-robot commented Oct 22, 2024

@MarSik: This pull request references CNF-14090 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.17." or "openshift-4.17.", but it targets "openshift-4.18" instead.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from jmencak and Tal-or October 22, 2024 09:19

openshift-ci bot commented Oct 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2024

MarSik commented Oct 22, 2024

/jira cherry-pick CNF-14090

openshift-ci-robot commented Oct 22, 2024

@MarSik: Ignoring requests to cherry-pick non-bug issues: CNF-14090

MarSik commented Oct 22, 2024

/jira cherry-pick OCPBUGS-43660

openshift-ci-robot commented Oct 22, 2024

@MarSik: Jira Issue OCPBUGS-43660 has been cloned as Jira Issue OCPBUGS-43664. Will retitle bug to link to clone.
/retitle OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083)


@openshift-ci openshift-ci bot changed the title CNF-14090: Add vendor and architecture specific tuning options (#1083) OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083) Oct 22, 2024
@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43664, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-43665 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead
  • expected dependent Jira Issue OCPBUGS-43660 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 22, 2024

MarSik commented Oct 22, 2024

/jira cherry-pick OCPBUGS-43665

openshift-ci-robot commented Oct 22, 2024

@MarSik: Jira Issue OCPBUGS-43665 has been cloned as Jira Issue OCPBUGS-43666. Will retitle bug to link to clone.
/retitle OCPBUGS-43666: OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083)


@openshift-ci openshift-ci bot changed the title OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083) OCPBUGS-43666: OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083) Oct 22, 2024
@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43666, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-43665 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead
  • expected dependent Jira Issue OCPBUGS-43665 to target a version in 4.18.0, but it targets "4.17.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

MarSik commented Oct 22, 2024

/jira refresh

@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43666, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-43665 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead
  • expected dependent Jira Issue OCPBUGS-43665 to target a version in 4.18.0, but it targets "4.17.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


@MarSik MarSik changed the title OCPBUGS-43666: OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083) OCPBUGS-43666: CNF-14090: Add vendor and architecture specific tuning options (#1083) Oct 22, 2024
@MarSik MarSik changed the title OCPBUGS-43666: CNF-14090: Add vendor and architecture specific tuning options (#1083) OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083) Oct 22, 2024
@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43664, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is Closed (Obsolete) instead
  • expected dependent Jira Issue OCPBUGS-43660 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead
  • expected dependent Jira Issue OCPBUGS-43665 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is Closed (Obsolete) instead
  • expected dependent Jira Issue OCPBUGS-43665 to target a version in 4.18.0, but it targets "4.17.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


@MarSik MarSik changed the title OCPBUGS-43664: CNF-14090: Add vendor and architecture specific tuning options (#1083) OCPBUGS-43664: Add vendor and architecture specific tuning options Oct 22, 2024

MarSik commented Oct 22, 2024

/jira refresh

@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43664, which is invalid:

  • expected the bug to be open, but it isn't
  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is Closed (Obsolete) instead
  • expected dependent Jira Issue OCPBUGS-43660 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead
  • expected dependent Jira Issue OCPBUGS-43665 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is Closed (Obsolete) instead
  • expected dependent Jira Issue OCPBUGS-43665 to target a version in 4.18.0, but it targets "4.17.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

MarSik commented Oct 22, 2024

/jira refresh

@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43664, which is invalid:

  • expected dependent Jira Issue OCPBUGS-43660 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.


cmdline_realtime_intel=tsc=reliable nmi_watchdog=0 mce=off

Contributor:

Nit: why so many empty lines? Similarly elsewhere. Was this in the original PR too?

Contributor Author:

Yes.

Contributor Author:

The templating generates it like this.

Contributor:

And there is no way to fix it? It is pretty ugly.

Contributor Author:

It is the same for the old profiles and this is the generated content. Yes, there is a way to fix it by using {{- xxx -}} properly in all the templates. But not in this PR as it is a wider issue.

@bartwensley

@MarSik - are you sure you want to cherry-pick this now? I think it would make sense to wait for at least some degree of QE to happen for this on OCP 4.18 before the cherry-pick - given that it changes kernel parameters and such, there is potential for significant breakage if something is wrong.

MarSik commented Oct 22, 2024

@bartwensley QE approval of 4.18 is needed to merge this indeed. But I want to be ready.

MarSik commented Oct 22, 2024

@bartwensley Btw, the idea is that this does not change any kernel arguments for Intel. Just for the other new platforms. And it is easier to verify with the PR posted (clusterbot can build a PR, yesterday it gave me an AMD VM though..)
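
One way to check the claim above on a test node is to compare the kernel command line before and after the change. The following is a minimal Python sketch under that assumption; `classify_cmdline_change` is a hypothetical helper, not part of the operator or its test suite, and it treats each whitespace-separated token as one argument.

```python
def classify_cmdline_change(old, new):
    """Classify how two kernel command lines differ.

    Returns "unchanged" when the arguments match in order, "reordered" when
    the same arguments appear in a different order (the case that would cause
    the extra reboot discussed in this PR), and "changed" otherwise.
    """
    old_args, new_args = old.split(), new.split()
    if old_args == new_args:
        return "unchanged"
    if sorted(old_args) == sorted(new_args):
        return "reordered"
    return "changed"
```

For Intel nodes the expectation stated in this thread corresponds to the "unchanged" result; a "reordered" result would point at the ordering problem that the Intel-specific fix addresses.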

@fontivan

/retest

@fontivan

4.17 still needs backports to fix CI failures, will need to be rebased once both of these PRs are merged

@fontivan

/retitle [release-4.17] OCPBUGS-43664: Add vendor and architecture specific tuning options

@openshift-ci openshift-ci bot changed the title OCPBUGS-43664: Add vendor and architecture specific tuning options [release-4.17] OCPBUGS-43664: Add vendor and architecture specific tuning options Oct 23, 2024
fontivan and others added 3 commits October 31, 2024 10:46
…hift#1083)

* CNF-14090: Add vendor and architecture specific tuning options
- Performance tuning support for 3 platforms (amd/x86,arm/aarch64,intel/x86) is added in this change
- When a valid platform is detected the additional platform specific tuning will be imported alongside the default tuning
- This makes use of a new helper function added to tuned to detect the system name and architecture
- Update unit tests to account for the various changes
- Add new unit tests to cover the platform specific tuning

* CNF-14090: Re-sync e2e test yaml for tuning changes

* CNF-14090: Use variable composition for idle_poll
- idle=poll is only supported on x86
- Update tests to account for changes
- Add explanation comments to empty values in openshift-node-performance

* CNF-14090: Re-sync e2e test yaml for tuning changes

* CNF-14090: Fix active/passive pstates

* CNF-14090: Re-sync e2e test yaml for tuning changes
* OCPBUGS-43665: Drop amd_iommu=on from amd tuning
- "=on" is not a valid value for amd_iommu
- amd_iommu is enabled by default unless you specify "amd_iommu=off", unlike intel
- See kernel docs for more information (https://docs.kernel.org/admin-guide/kernel-parameters.html)

* OCPBUGS-43665: Update render-sync for performance profile change
* Fix kernel arguments ordering on Intel

An upgrade from previous version causes one extra reboot
due to differently ordered kernel arguments. This is
a side effect of platform specific tuned profile split
we merged in openshift#1083

This fix updates the Intel specific tuned profile to
follow the same ordering that was used in the past.

It does so by exploiting a specific tuned behavior
of the bootloader plugin. It orders the kernel argument
cmdline_suffix keys based on the order of first appearance.
Any additional appearance just changes the value, but
not the ordering.

The change is only needed for Intel, because we have
never supported other platforms before and so upgrade
is not an issue.

* Sync rendered manifests

MarSik commented Oct 31, 2024

/retest-required

MarSik commented Oct 31, 2024

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Oct 31, 2024

MarSik commented Oct 31, 2024

/retest-required

openshift-ci bot commented Oct 31, 2024

@MarSik: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot

@MarSik: This pull request references Jira Issue OCPBUGS-43664, which is invalid:

  • expected dependent Jira Issue OCPBUGS-43660 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

5 participants