Allow blacklisting sensors #25

Open
rudolf81 opened this issue Jan 24, 2024 · 13 comments
Labels
enhancement New feature or request

Comments

@rudolf81

rudolf81 commented Jan 24, 2024

I noticed that my SSD temperature (around 67 °C) gets picked up as the max temp and used for the fan rules, etc.

The SSD temp lives under /sys/class/hwmon/hwmon1/temp*_input.
temp1 is the "nvme composite" temp, and temp2 is "nvme sensor 2".
Sensor 2 reads around 67 most of the time.

The SSD is not covered by the ThinkPad cooler (on the T16 at least):
https://laptopmedia.com/wp-content/uploads/2022/08/internals-1000x711.jpg

So controlling the fan based on this max temp is not very useful.
The TEMP_FILES_GLOB pattern "/sys/class/hwmon/hwmon*/temp*_input" is too broad.
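For illustration, here is a minimal Python sketch of how taking the maximum over such a glob behaves. It is not zcfan's actual C implementation, and the function name get_max_temp_c is hypothetical. It relies only on the hwmon convention that temp*_input files hold millidegrees Celsius, so a raw value of 67850 means 67.85 °C.

```python
import glob

# The pattern quoted above: it matches every temperature input of every
# hwmon device, including NVMe, Wi-Fi and battery sensors.
TEMP_FILES_GLOB = "/sys/class/hwmon/hwmon*/temp*_input"


def get_max_temp_c(pattern=TEMP_FILES_GLOB):
    """Return the highest reading (in °C) among files matching pattern.

    Hypothetical helper. hwmon temp*_input files contain millidegrees
    Celsius, e.g. "67850". Unreadable or malformed files are skipped.
    Returns None if nothing could be read.
    """
    best = None
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor vanished or returned garbage; skip it
        celsius = millideg / 1000
        if best is None or celsius > best:
            best = celsius
    return best
```

Because every matching file contributes to the maximum, a single warm (or stuck) sensor anywhere under hwmon* is enough to decide the fan level.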

I think a solution could be to allow an override path for TEMP_FILES_GLOB to be specified in /etc/zcfan.conf.
On my system, /sys/class/hwmon/hwmon6/ contains all the CPU and GPU temps.

(I don't know whether a single path override would be feasible for some hardware configurations, e.g. with dedicated graphics chips, which might be reported under a different /sys/class/hwmon/hwmon*/...)

@cdown
Owner

cdown commented Jan 26, 2024

This is by design: on laptops, the entire machine is affected by airflow, and we only have one fan knob. It's not supposed to be CPU/GPU specific. 67 °C is very close to the recommended maximum operating temperature for basically any NVMe drive, so it's normal to pump a bunch of air in there.

It sounds like zcfan is doing the right thing here, that's way too close to max operating temp.

@rudolf81
Author

rudolf81 commented Jan 26, 2024

Thanks for the reply.
/sys/class/hwmon/hwmon1/name is definitely "nvme", and the temps in /sys/class/hwmon/hwmon1/temp*_input show 31850 and 67850 (millidegrees, i.e. 31.85 °C and 67.85 °C).

As far as I can tell, they never change.

My T16 is idle: nothing going on and no load on storage.
The zcfan default for low_temp is 70, so if my NVMe is sitting at 67, the fan won't even turn on.

Yes, I see you are right: close to 70 °C is the danger zone for NVMe SSDs.

OK, let's test:
I edited zcfan.conf to set max_temp to 66, to force the fan to run at max.
It's been going full tilt for a few minutes now, with no load, and /sys/class/hwmon/hwmon1/temp1_input dropped from 31850 to 29850.
Interesting. I guess there is some airflow over the SSD, even though it is not covered by the heat pipes.

The other temp sensor, /sys/class/hwmon/hwmon1/temp3_input, is STILL at 67850.
(There is no temp2 sensor.)

So maybe the actual SSD temp is temp1_input, at around 30 °C, and temp3_input is either not working or measuring something else? It never changes.
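A quick way to check whether a sensor is frozen is to sample it several times and see whether the value ever changes. The Python sketch below is hypothetical (find_stuck_sensors is not part of zcfan or any tool mentioned in this thread). A constant reading is only a hint, since a genuinely stable temperature can also read the same for a while.

```python
import time


def _read_text(path):
    """Return the stripped contents of a sysfs file."""
    with open(path) as f:
        return f.read().strip()


def find_stuck_sensors(paths, samples=5, interval_s=2.0, read=_read_text):
    """Return the paths whose reading was identical in every sample.

    Hypothetical helper: each path is read `samples` times, `interval_s`
    seconds apart. A constant value hints at a frozen sensor but does
    not prove it.
    """
    if samples < 2:
        raise ValueError("need at least two samples to detect change")
    seen = {path: set() for path in paths}
    for i in range(samples):
        for path in paths:
            seen[path].add(read(path))
        if i < samples - 1:  # no need to wait after the last round
            time.sleep(interval_s)
    return [path for path in paths if len(seen[path]) == 1]
```

For example, find_stuck_sensors(["/sys/class/hwmon/hwmon1/temp1_input", "/sys/class/hwmon/hwmon1/temp3_input"], samples=10, interval_s=5) samples both NVMe inputs over about 45 seconds, ideally while the fan is forced to full speed as in the test above.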

@cdown
Owner

cdown commented Feb 29, 2024

I'm willing to add a mechanism to disable faulty sensors, but it would need to be robust and not too cumbersome. One of the problems is that hwmon ordering is not deterministic.

So what we would probably want instead is the ability to select the sensor(s) to use by name. Alternatively, we could just look for coretemp/k10temp and only use those sensors.
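Selecting sensors by name rather than by hwmonN index could look roughly like the Python sketch below. The helper temp_inputs_by_name and its root parameter are hypothetical; the sketch only assumes the standard hwmon layout, where each hwmonN directory has a name file and temp*_input files.

```python
import glob
import os


def temp_inputs_by_name(wanted, root="/sys/class/hwmon"):
    """Return temp*_input paths of hwmon devices whose name is in wanted.

    Hypothetical helper. The hwmonN index may differ between boots, but
    each device's `name` file (e.g. "coretemp", "k10temp", "nvme") is
    set by its driver, so matching on it avoids hard-coding an index.
    """
    selected = []
    for dev in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(dev, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue  # not a readable hwmon device; skip it
        if name in wanted:
            selected.extend(sorted(glob.glob(os.path.join(dev, "temp*_input"))))
    return selected
```

Calling temp_inputs_by_name({"coretemp", "k10temp"}) would implement the "only use coretemp/k10temp" alternative, whichever hwmonN number those drivers get on a given boot.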

I don't know what the right answer is yet, it requires some thinking in terms of ergonomics and complexity.

@cdown cdown added the enhancement New feature or request label Feb 29, 2024
@cdown cdown changed the title "zcfan pics up SSD temp in get_max_temp() instead of CPU/GPU temp" to "Allow blacklisting sensors" Feb 29, 2024
@rudolf81
Author

Thanks again for the reply.

Yes, you are right, the hwmon numbers might change.
Blocking sensors by name sounds like the right way to do it.
Listing the sensors to skip is probably better than naming the single sensor to measure, since it copes better with multiple temperature sensors for the GPU, etc.

Alternatively, make it work with selected sensors only, but pass them in as optional runtime arguments (not from the config), and leave it to the user to write a script that hunts for and filters the needed sensor paths and passes them to the application at startup.
(Hmm, not sure how that would work if you still wanted to run it as a service.)

@Pyntux
Copy link

Pyntux commented Apr 17, 2024

I have the same problem with the Wi-Fi chipset temperature.

@rudolf81
Author

@Pyntux, I've cobbled a crude hard-coded filter into zcfan to exclude the sensor I don't want. It works, but I usually end up with a hard crash if I leave the system running for an extended period. I think if the system enters standby or something, it's gone, with no coming back.

Probably my own bad code somewhere, though I didn't expect a bad array pointer or similar in a user-space fan driver to bring down the whole system.

I'm contemplating writing my own version of this in something like Python, just for fun, and then maybe later in Rust.

Though I'm pondering the use of /proc/acpi/ibm/fan, as it seems to allow only some 3 preset fan levels.
It's not PWM or anything like that.
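For reference, the thinkpad_acpi driver's /proc/acpi/ibm/fan interface accepts "level" commands: numeric levels 0 to 7 plus auto, full-speed and disengaged, and fan level changes are only accepted when the module is loaded with fan_control=1. The minimal Python sketch below writes such a command. set_fan_level is a hypothetical helper, and the path is a parameter so it can be pointed at an ordinary file for testing.

```python
# Levels accepted by thinkpad_acpi's /proc/acpi/ibm/fan "level" command.
VALID_LEVELS = {str(n) for n in range(8)} | {"auto", "full-speed", "disengaged"}


def set_fan_level(level, path="/proc/acpi/ibm/fan"):
    """Write "level <level>" to the ThinkPad fan control file.

    Hypothetical helper. On a real system this needs root and
    thinkpad_acpi loaded with fan_control=1, otherwise the write
    is rejected.
    """
    level = str(level)
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported fan level: {level!r}")
    with open(path, "w") as f:
        f.write(f"level {level}\n")
```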

The ThinkPad's own thermal management, which I guess lives in the BIOS, controls the fan much more smoothly (not in bracketed speeds). However, it sometimes bombs out (for me at least) and then spins the fan up and down constantly with no hysteresis.
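Hysteresis is the usual remedy for that kind of up/down cycling: the fan steps up as soon as a threshold is reached, but only steps down once the temperature has fallen some margin below it. The Python sketch below illustrates the idea only. The thresholds, levels and 3 °C margin are hypothetical, loosely modelled on the low_temp/med_temp/max_temp settings discussed in this thread, and it does not describe how zcfan or the BIOS actually behave.

```python
# Hypothetical (threshold °C, fan level) pairs, highest first.
THRESHOLDS = [(90, "full-speed"), (80, "4"), (70, "1")]
OFF_LEVEL = "0"


def pick_level(temp_c, current, hysteresis_c=3.0):
    """Choose a fan level for temp_c, with hysteresis on the way down.

    Rising: the first threshold that temp_c reaches wins immediately.
    Falling: the current band is only left once temp_c drops more than
    hysteresis_c below its threshold, so the fan does not flap around
    a single boundary.
    """
    for threshold, level in THRESHOLDS:
        if temp_c >= threshold:
            return level
        if level == current and temp_c > threshold - hysteresis_c:
            return current  # inside the hysteresis band: hold the level
    return OFF_LEVEL
```

With these values the fan goes to full speed at 90 °C but stays there until the reading drops below 87 °C, rather than toggling every time it crosses 90 °C.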

@Reutertu3
Copy link

I know it's against the philosophy of zero configuration, but I'd also like to chime in to support the notion of blacklisting sensors.

The only things that truly get hot in my laptop are the SSDs whenever I'm running VMs. That will of course still trigger the fans to run at the set threshold speeds.

@rudolf81
Author

CC: @Reutertu3, @Pyntux
You can give my Python implementation a try:
https://github.com/rudolf81/ibm-fan-con-py
You can block specific sensors.

@cdown
Owner

cdown commented Mar 14, 2025

Since there's clearly enough demand and I have some time, 7183cc0 has a version which can do this.

You add:

ignore_sensor [name]

to the config file, for example:

ignore_sensor iwlwifi_1
ignore_sensor BAT0

It seems to work well, but I'd appreciate it if affected users could confirm that it works nicely (especially since the sensor retrieval logic has changed quite a bit).
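To make the config format concrete, here is a hypothetical Python sketch of parsing ignore_sensor lines and dropping matching sensors. It is not zcfan's parser; it assumes only one "ignore_sensor <name>" entry per line, with the name compared against each sensor's hwmon name.

```python
def parse_ignored_sensors(config_text):
    """Collect the names from `ignore_sensor <name>` lines.

    Hypothetical helper. Other keys (e.g. max_temp) and malformed
    lines are skipped.
    """
    ignored = set()
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "ignore_sensor":
            ignored.add(parts[1])
    return ignored


def filter_sensors(sensors, ignored):
    """Keep only (hwmon_name, path) pairs whose name is not ignored."""
    return [(name, path) for name, path in sensors if name not in ignored]
```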

@Reutertu3

Hi!
Sorry for the late reply; it fell off my radar.

I tried the commit, but I'm apparently struggling to point the config at the right sensor. I tried putting the output of ls /sys/class/hwmon/hwmon*/device | grep nvme (nvme0n1), the chip name from sensors (nvme-pci-0400), and just nvme into the config, but zcfan reports this on startup:

[CFG] Ignored 0 present sensors based on config

which suggests I'm just failing to identify the sensor name.

@cdown
Owner

cdown commented Mar 20, 2025

It's the contents of the "name" file in the hwmon directory, so one of these:

cat /sys/class/hwmon/*/name
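A small Python sketch can list those names alongside how many temp*_input files each device exposes, and flag config entries that match none of them. hwmon_names and unknown_ignore_names are hypothetical helpers. Neither a block device name such as nvme0n1 nor an lm-sensors chip name such as nvme-pci-0400 is the contents of a hwmon name file, so both would be flagged.

```python
import glob
import os


def hwmon_names(root="/sys/class/hwmon"):
    """Map each hwmon `name` to how many temp*_input files it exposes."""
    names = {}
    for dev in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(dev, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue  # not a readable hwmon device
        inputs = glob.glob(os.path.join(dev, "temp*_input"))
        # Several devices can share a name, so their counts are summed.
        names[name] = names.get(name, 0) + len(inputs)
    return names


def unknown_ignore_names(config_names, root="/sys/class/hwmon"):
    """Return configured names that match no hwmon `name` file."""
    return sorted(set(config_names) - set(hwmon_names(root)))
```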

@OE1FEU-DF5JT

How do we know which sensor zcfan reads, given that it differs across ThinkPad models?

My X1 Gen 11 has one distinct hot spot on the upper left-hand side, right by the two USB ports and the Esc key. It's not a good design decision by Lenovo, but nothing up there is critical or will damage the laptop, even at high temperatures. By high I mean hot to the touch, but not unbearably so; it won't burn my fingers. That's definitely the hottest part of the laptop.

cat /sys/class/hwmon/*/name
AC
acpitz
BAT0
nvme
ucsi_source_psy_USBC000:001
ucsi_source_psy_USBC000:002
thinkpad
coretemp
iwlwifi_1
I already have really high thresholds in /etc/zcfan.conf, yet the fan keeps cycling between off, on and medium, which drives me nuts. I got this laptop less than a year ago, and there was a time when it was basically fanless, except for heavy stuff like video rendering.

There is obviously one sensor that constantly shows a value between 77° and over 90°, and that's the one that triggers the fan. That sensor is not critical for the laptop in terms of damage from high temperatures, yet it constantly triggers the fan, even with really high temperature thresholds.

I am tempted to set low_temp, the lowest trigger value, to 100 °C, but that would defeat the purpose of using zcfan at all.

I am a little at a loss.
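One way to see which sensor is driving the fan is to list every temperature reading with its device name and, where the driver provides one, its temp*_label, then pick the hottest. This Python sketch assumes, as the get_max_temp() name in the original issue title suggests, that the fan decision follows the highest reading; readings and hottest are hypothetical helpers, not zcfan functions.

```python
import glob
import os


def _read(path):
    with open(path) as f:
        return f.read().strip()


def readings(root="/sys/class/hwmon"):
    """Yield (name, label, celsius, path) for each readable temp*_input."""
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        try:
            name = _read(os.path.join(os.path.dirname(path), "name"))
            celsius = int(_read(path)) / 1000  # hwmon reports millidegrees
        except (OSError, ValueError):
            continue  # unreadable or malformed; skip this input
        try:
            label = _read(path[: -len("_input")] + "_label")
        except OSError:
            label = ""  # many drivers expose no label
        yield name, label, celsius, path


def hottest(root="/sys/class/hwmon"):
    """Return the hottest reading as a tuple, or None if there are none."""
    return max(readings(root), key=lambda r: r[2], default=None)
```

Running print(hottest()) while the fan is cycling should show which device name (and label, if any) is behind the 77 to 90 °C reading. If ignore_sensor matches on the hwmon name as described above, ignoring that device would presumably drop all of its temperature inputs, not just the hottest one.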

@Reutertu3

> It's the name from "name" in the hwmon dir. So one of these:
>
> cat /sys/class/hwmon/*/name

In that case I should have set it up correctly, right?



5 participants