-
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFC003 fan off test case #635
base: develop
Are you sure you want to change the base?
Conversation
f9378d9
to
219a99d
Compare
bc433f9
to
024d98f
Compare
platform-configs/include/sensors/protectli-vpxxxx-fan-curve-config.yaml
Outdated
Show resolved
Hide resolved
platform-configs/include/sensors/protectli-vpxxxx-fan-curve-config.yaml
Outdated
Show resolved
Hide resolved
767f887
to
805bbff
Compare
Signed-off-by: Filip Gołaś <[email protected]>
e440dc1
to
ca0a718
Compare
8ea3cc9
to
2617c88
Compare
@philipandag we have all discussion resolved now. I would say it is ready to go in. But I would like to see the logs fro the tests run from the latest commit from this PR. |
custom-fan-curve.robot_log.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the test failed because the fan speed was decreasing slower than the temperature. Temperature can get very low quickly on the sensor, but the fan often needs couple of seconds to slow down.
The test may be failing all the time because of that. I wonder if it is really necessary to measure the fall time of fan speed based o temperature. If anything we may only check if the fan slows down to the expected temperature after 30 seconds for example. Right now I see the stress tests lasts 3m, and measurements are made with 0.05 minute intervals (3 seconds, 60 iterations). That is more than the stress test length, because there are delays caused by execution of test instructions.
lib/sensors.robot
Outdated
|
||
IF ${lm_sensors_used} == ${TRUE} | ||
Detect Or Install Package lm-sensors | ||
Execute Command In Terminal sudo sensors-detect --auto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detecting sensors must happen after modules are loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to have been working fine though? At least in this one case it didn't matter much.
https://github.com/Dasharo/open-source-firmware-validation/compare/43a0eac2910e734464997f08d5335986f8499f24..0ab2b98ce42a75edd4ab18f839b33c6f3d9ae213
Signed-off-by: Filip Gołaś <[email protected]> lib/sensors.robot: Aesthetical refactor Signed-off-by: Filip Gołaś <[email protected]> Make use of lib/sensors platform config vars Signed-off-by: Filip Gołaś <[email protected]> lib/sensors/robot: Use triple quotes for string comparisons Signed-off-by: Filip Gołaś <[email protected]> lib/sensors: import sensor config variables Signed-off-by: Filip Gołaś <[email protected]> lib/sensors.robot: Use hwmon path from sensors config file directly Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]> cpu-fan-speed-measure.robot: Use lib/sensors Signed-off-by: Filip Gołaś <[email protected]> custom-fan-curve.robot: Use lib/sensors Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]> platform-configs: Move sensor configs to yaml files Signed-off-by: Filip Gołaś <[email protected]> lib/sensors: Use new yaml sensor configs Signed-off-by: Filip Gołaś <[email protected]> sensors configs: Remove unnecessary lists Signed-off-by: Filip Gołaś <[email protected]> platform-configs/sensors curves: Add tolerances Signed-off-by: Filip Gołaś <[email protected]> platform-configs/include/sensors: Set default RPMS to always valid Signed-off-by: Filip Gołaś <[email protected]> configs sensors rename vp66xx -> vpxxxx Signed-off-by: Filip Gołaś <[email protected]> vpxxxx-fan-curve-config: More accurate RPM curves Signed-off-by: Filip Gołaś <[email protected]>
To work using PWM or RPM depending on measurement of which is available Signed-off-by: Filip Gołaś <[email protected]> custom-fan-curve.robot: Use tolerances & continue if exceeded Temperatures out of tolerances might still be useful or even acceptable as a pass with manual verification Signed-off-by: Filip Gołaś <[email protected]> custom-fan-curve.robot: Move common test loop to local KW Signed-off-by: Filip Gołaś <[email protected]> custom-fan-curve.robot: More logs Signed-off-by: Filip Gołaś <[email protected]> custom-fan-curve.robot: Fail only on multiple invalid values in row Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]>
Co-authored-by: Michał Żygowski <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]>
Weren't tested on any other device Signed-off-by: Filip Gołaś <[email protected]>
The results can still be verified manually and used to adjust the curves Signed-off-by: Filip Gołaś <[email protected]>
Signed-off-by: Filip Gołaś <[email protected]>
43a0eac
to
0ab2b98
Compare
At first I thought it is caused by the fan just being a physical object with a mass, but then the RPM would lag by a couple seconds at most. The tests show that the fan stays at ~1600 RPM for 15 seconds. It is definitely something caused by the software controlling it, not the fan itself.
If the code controlling the fans introduces some additional lag when slowing down, then wouldn't that just mean that the fan curves are wrong? That the fan behavior is more complex than we assume it is in the test suite? |
Signed-off-by: Filip Gołaś <[email protected]>
@@ -33,7 +34,8 @@ CFN001.001 CPU temperature and fan speed can be read (Debian) | |||
Boot System Or From Connected Disk ubuntu | |||
Login To Linux | |||
Switch To Root User | |||
${rpm} ${temperature}= Get CPU Temperature And CPU Fan Speed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miczyg1
We have decided that the slowing down of the fans should not be included in the test.
In that case I would make the stress-ng timeout very long and kill the process once the measurements are done.
It will be more reliable than guessing how by much executing the test instructions will affect the time span across which the measurements are performed.
It seemed to work for a short 1m test. I'll rerun the whole suite to make sure everything works.
https://github.com/Dasharo/open-source-firmware-valida
custom-fan-curve.robot_log.zip
https://github.com/Dasharo/open-source-firmware-validation/compare/91df194%5E..91df194
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not measuring the fan when it slows down would make the tests useless. It will only measure how the fans behave when the CPU is at it's highest temperature, while the point is to cover the whole temperature range.
Too bad I believed that the tests were making sense before, because they were not.
The duration and measurement interval are given in minutes, while even one minute is very long for such test.
The tests are waiting for 5 seconds after starting the stress, which effectively skips the temperature rising.
Originally the test duration was around 30 minutes, both the fan speed and CPU temperature must have been at their peaks for the whole time.
I don't know what was the original intent behind those tests, but trying to add a new test case was a bad idea, as they definitely had a completely different purpose than what we would like to achieve here.
To make the PASSes actually usefull we should try to either increase the temperature of the CPU slowly, for example by increasing the number of workers as time passes, or make the measurements as quickly and often and possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the fans are in fact not slowing down below ~1600 RPM when on performance curve.
I've left it be for a couple minutes and the RPMs don't fall
root@3mdeb:/home/ubuntu# sensors | grep -E "fan|C"
Adapter: PCI adapter
Adapter: PCI adapter
Composite: +27.9°C (low = -273.1°C, high = +81.8°C)
(crit = +84.8°C)
Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +28.9°C (low = -273.1°C, high = +65261.8°C)
fan1: 1679 RPM (min = 28 RPM)
fan2: 1704 RPM (min = 10 RPM)
temp1: +29.0°C (low = +0.0°C, high = +127.0°C) sensor = Intel PECI
temp2: -128.0°C (low = +89.0°C, high = -126.0°C)
temp3: -128.0°C (low = -47.0°C, high = +30.0°C)
Package id 0: +26.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +21.0°C (high = +100.0°C, crit = +100.0°C)
Core 4: +24.0°C (high = +100.0°C, crit = +100.0°C)
Core 8: +22.0°C (high = +100.0°C, crit = +100.0°C)
Core 9: +22.0°C (high = +100.0°C, crit = +100.0°C)
Core 10: +22.0°C (high = +100.0°C, crit = +100.0°C)
Core 11: +22.0°C (high = +100.0°C, crit = +100.0°C)
Core 12: +23.0°C (high = +100.0°C, crit = +100.0°C)
Core 13: +23.0°C (high = +100.0°C, crit = +100.0°C)
Core 14: +23.0°C (high = +100.0°C, crit = +100.0°C)
Core 15: +23.0°C (high = +100.0°C, crit = +100.0°C)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time it takes for the CPU to jump from ~25C to ~80C is less than 0.1s. We can not test this feature without some kind of gradation of the load.
Unless we want to test the curve when the cpu cools down. It should be fine provided the fans will actually slow down on the performance curve. There is either a bug in the code controlling the fans or we cannot physically achieve the CPU temperatures required to test the performance curve.
If no sensor is able to read the actual CPU temperature used to control the fans, then where the code controlling the fans gets the temperature from?
No description provided.