e_smi_tool crashed when SMT disabled #15

Open
HenryHuang2004 opened this issue Aug 27, 2024 · 5 comments
HenryHuang2004 commented Aug 27, 2024

Hello!
I'm using RockyLinux 9.4 with kernel version 5.14.0-427.26.1.el9_4.x86_64, with AMD EPYC 9654 Processors.

When SMT is enabled, everything works well, but for some reasons I have to disable SMT.

With SMT disabled, I encounter a problem when running e_smi_tool to query the CPUs' current frequency and frequency limit. The issue appears to occur randomly; a sample output is as follows:

-----------------------------------------------------------------------------------------------------------------
| CPU boostlimit in MHz:                                                                                     |
| cpu [  0] : 3700  3700  3700  3700  3700  3700  3700  3700  3700  3700  3700  3700  3700  3700  3700  3700    |
| cpu [ 16] : NA    NA    NA    NA    NA    NA    NA    NA    3700  3700  3700  3700  3700  3700  3700  3700    |
| cpu [ 32] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 48] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 64] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 80] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 96] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [112] : 3700  3700  3700  3700  3700  3700  3700  3700  NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [128] : NA    NA    NA    NA    NA    NA    NA    NA    3700  3700  3700  3700  3700  3700  3700  3700    |
| cpu [144] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [160] : 3700  3700  3700  3700  3700  3700  3700  3700  NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [176] : NA    NA    NA    NA    NA    NA    NA    NA    3700  3700  3700  3700  3700  3700  3700  3700    |
-----------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU core clock current frequency limit in MHz:                                                             |
| cpu [  0] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 16] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 32] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 48] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 64] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 80] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 96] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [112] : 3700  3700  3700  3700  3700  3700  3700  3700  NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [128] : NA    NA    NA    NA    NA    NA    NA    NA    3700  3700  3700  3700  3700  3700  3700  3700    |
| cpu [144] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [160] : 3700  3700  3700  3700  3700  3700  3700  3700  NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [176] : NA    NA    NA    NA    NA    NA    NA    NA    3700  3700  3700  3700  3700  3700  3700  3700    |
-----------------------------------------------------------------------------------------------------------------
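To make the gaps in tables like the two above easier to compare between runs, the rows can be reduced to a list of CPU numbers that reported NA. The following is only a minimal sketch: it assumes the row layout shown in this report (a `cpu [ N]` label followed by whitespace-separated values for consecutive CPUs starting at N), and `na_cpus` is a hypothetical helper name, not part of e_smi_tool.

```python
import re

# Matches data rows such as "| cpu [ 16] : NA  3700 ... |".
# Border lines and the title row ("| CPU boostlimit in MHz: |") do not match.
ROW = re.compile(r"^\|\s*cpu\s*\[\s*(\d+)\]\s*:\s*(.*?)\s*\|\s*$")


def na_cpus(table_text):
    """Return the CPU numbers whose value is 'NA' in a per-CPU table."""
    missing = []
    for line in table_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip borders, titles and blank lines
        base = int(match.group(1))  # CPU number of the first column
        for offset, value in enumerate(match.group(2).split()):
            if value == "NA":
                missing.append(base + offset)
    return missing


# Shortened, made-up sample in the same layout (4 columns instead of 16).
sample = """\
| cpu [  0] : 3700  3700  NA    NA    |
| cpu [ 16] : NA    3700  3700  3700  |
"""

if __name__ == "__main__":
    print(na_cpus(sample))
```

Saving the tool's output from several runs and comparing the resulting lists would show whether the same CPUs fail each time or whether the set really changes at random.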

Additionally, when I try to run e_smi_tool repeatedly, I get the following error message:

Error in initialising HSMP version specific info, Only energy data can be obtained...
Err[3]: HSMP driver not present

Even after reloading the amd_hsmp module, the error persists. Fortunately, querying energy data still works correctly.
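When reporting a failure like this, it can help to capture the kernel's view of SMT and of the HSMP device alongside the tool output. The sketch below is an illustration only: it reads the standard Linux sysfs files `smt/control`, `smt/active` and `online` under `/sys/devices/system/cpu`, and checks for the `/dev/hsmp` node created by the amd_hsmp driver. Whether a missing node is what triggers `Err[3]` on this system is an assumption, and `hsmp_preflight` is a hypothetical helper name.

```python
import os
from pathlib import Path


def read_text(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def hsmp_preflight(root="/"):
    """Collect SMT state and HSMP device presence under the given root.

    The root parameter exists only so the function can be pointed at a
    copy of the tree for testing; on a real system leave it as "/".
    """
    cpu = Path(root) / "sys/devices/system/cpu"
    return {
        "smt_control": read_text(cpu / "smt/control"),  # e.g. on/off/forceoff
        "smt_active": read_text(cpu / "smt/active"),    # "1" or "0"
        "online_cpus": read_text(cpu / "online"),       # e.g. "0-191"
        "hsmp_device_present": os.path.exists(Path(root) / "dev/hsmp"),
    }


if __name__ == "__main__":
    for key, value in hsmp_preflight().items():
        print(f"{key}: {value}")
```

Running it once before and once after reloading the module would show whether the device node actually comes back.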

Could you please look into this issue? Any help would be greatly appreciated.

Thank you!

@HenryHuang2004 changed the title from "e_smi_tool crashed when" to "e_smi_tool crashed when SMT disabled" Aug 27, 2024
@sumachidanand
Contributor

Hi,

We will look into this issue and get back to you.

@HenryHuang2004
Author

HenryHuang2004 commented Aug 30, 2024

Additionally, when IOMMU is disabled, e_smi_tool can get the boost limit and current frequency only for the cores in socket 0.
When I try to query or set anything in socket 1, it reports Err[18]: Input value is invalid. It seems that the function hsmp_xfer doesn't work.
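To confirm which CPU numbers the kernel places in socket 1 before querying them, the topology can be read from sysfs. This is a rough sketch rather than anything e_smi_tool itself does: it relies on the standard Linux file `topology/physical_package_id` under each `/sys/devices/system/cpu/cpuN` directory, and `cpus_by_socket` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from pathlib import Path


def cpus_by_socket(root="/"):
    """Map each physical package (socket) id to its sorted CPU numbers.

    The root parameter exists only so the function can be pointed at a
    copy of the tree for testing; on a real system leave it as "/".
    """
    cpu_dir = Path(root) / "sys/devices/system/cpu"
    sockets = defaultdict(list)
    for entry in cpu_dir.glob("cpu[0-9]*"):
        match = re.fullmatch(r"cpu(\d+)", entry.name)
        if not match:
            continue
        try:
            package = int((entry / "topology/physical_package_id").read_text())
        except (OSError, ValueError):
            # CPUs without a readable topology entry (for example
            # offline ones) are skipped.
            continue
        sockets[package].append(int(match.group(1)))
    return {pkg: sorted(cpus) for pkg, cpus in sorted(sockets.items())}


if __name__ == "__main__":
    for package, cpus in cpus_by_socket().items():
        print(f"socket {package}: {len(cpus)} CPUs, first {cpus[0]}, last {cpus[-1]}")
```

If socket 1 is missing from the result, or its CPU range differs from what the tool expects, that would be worth including in the report.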

Hope the information helps.

@sumachidanand
Contributor

Hi Henry,

We are working on this issue. We will update you.

@sumachidanand
Contributor

We are not able to reproduce the "Err[18]: Input value is invalid" error in e_smi with IOMMU disabled.
When IOMMU was disabled in your setup, was SMT also disabled?

@sumachidanand
Contributor

Hi Henry,

Can you try with the latest BIOS? The issue is no longer seen with the latest BIOS.
