-
Notifications
You must be signed in to change notification settings - Fork 38
/
Copy pathle9i.mypatch
285 lines (271 loc) · 11.7 KB
/
le9i.mypatch
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
le9i.patch
revision 4 from le9h.patch: removed CONFIG_IKCONFIG_PROC because I remembered that IKCONFIG_PROC was initially used by le9e.patch as a personal preference(so I'd know whether CONFIG_LE9D_PATCH was part of the kernel ie. was patch applied to the running kernel?), but for everyone else it's not needed because the presence of the sysctl values gives away that the patch was applied or not!
le9i.patch
trying to keep inactive ones above threshold too, to see if no freezes with memtest3
these freezes seem to be caused by writing dmesg/journald, but in fact are possibly caused by both!
a better patch would be keeping executable data unevictable - so todo: see memlockd
le9h.patch
revision 2: fixed 1 compilation warning
warning: there's a regression that is fixed by reverting commit aa56a292ce623734ddd30f52d73f527d1f3529b5 which is required to use this patch successfully(even tho this regression happens without this patch too!) see https://bugzilla.kernel.org/show_bug.cgi?id=203317#c4
this is licensed under all/any of:
Apache License, Version 2.0
MIT license
0BSD
CC0
UNLICENSE
Interesting, a phoronix user here https://www.phoronix.com/forums/forum/phoronix/general-discussion/1118164-yes-linux-does-bad-in-low-ram-memory-pressure-situations-on-the-desktop/page17#post1120291 linked to two older patches(that I was completely unaware of) which are supposedly doing this same thing (9yrs ago for the first, Oct 23, 2018 or earlier for the second):
https://lore.kernel.org/patchwork/patch/222042/
https://gitlab.freedesktop.org/seanpaul/dpu-staging/commit/0b992f2dbb044896c3584e10bd5b97cf41e2ec6d
they seem way more complicated though, at first glance.
but hey, they were at least aware of the issue and how to fix it!
see also: https://gitlab.freedesktop.org/hadess/low-memory-monitor/
earlyoom
etc.
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009ca..d30f0a967375 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -68,6 +68,8 @@ Currently, these files are in /proc/sys/vm:
- numa_stat
- swappiness
- unprivileged_userfaultfd
+- unevictable_activefile_kbytes
+- unevictable_inactivefile_kbytes
- user_reserve_kbytes
- vfs_cache_pressure
- watermark_boost_factor
@@ -848,6 +850,75 @@ privileged users (with SYS_CAP_PTRACE capability).
The default value is 1.
+unevictable_activefile_kbytes
+=============================
+
+How many kilobytes of `Active(file)` to never evict during high-pressure
+low-memory situations. ie. never evict active file pages if under this value.
+This will help prevent disk thrashing caused by Active(file) being close to zero
+in such situations, especially when no swap is used.
+
+As 'nivedita' (phoronix user) put it:
+"Executables and shared libraries are paged into memory, and can be paged out
+even with no swap. [...] The kernel is dumping those pages and [...] immediately
+reading them back in when execution continues."
+^ and that's what's causing the disk thrashing during memory pressure.
+
+unevictable_activefile_kbytes=X will prevent X kbytes of those most used pages
+from being evicted.
+
+The default value is 65536. That's 64 MiB.
+
+Set it to 0 to keep the default behaviour, as if this option was never
+implemented, so you can see the disk thrashing as usual.
+
+To get an idea what value to use here for your workload(eg. xfce4 with idle
+terminals) to not disk thrash at all, run this::
+
+ $ echo 1 | sudo tee /proc/sys/vm/drop_caches; grep -F 'Active(file)' /proc/meminfo
+ 1
+ Active(file): 203444 kB
+
+so, using vm.unevictable_activefile_kbytes=203444 would be a good idea here.
+(you can even add a `sleep` before the grep to get a slightly increased value,
+which might be useful if something is compiling in the background and you want
+to account for that too)
+
+But you can probably go with the default value of just 65536 (aka 64 MiB)
+as this will eliminate most disk thrashing anyway, unless you're not using
+an SSD, in which case it might still be noticeable (I'm guessing?).
+
+Note that `echo 1 | sudo tee /proc/sys/vm/drop_caches` can still cause
+Active(file) to go a under the vm.unevictable_activefile_kbytes value.
+It's not an issue and this is how you know how much the value for
+vm.unevictable_activefile_kbytes should be, at the time/workload when you ran it.
+
+The value of `Active(file)` can be gotten in two ways::
+
+ $ grep -F 'Active(file)' /proc/meminfo
+ Active(file): 2712004 kB
+
+and::
+
+ $ grep nr_active_file /proc/vmstat
+ nr_active_file 678001
+
+and multiply that with MAX_NR_ZONES (which is 4), ie. `nr_active_file * MAX_NR_ZONES`
+so 678001*4=2712004 kB
+
+MAX_NR_ZONES is 4 as per:
+`include/generated/bounds.h:10:#define MAX_NR_ZONES 4 /* __MAX_NR_ZONES */`
+and is unlikely the change in the future.
+
+The hub of disk thrashing tests/explanations is here:
+https://gist.github.com/constantoverride/84eba764f487049ed642eb2111a20830
+
+unevictable_inactivefile_kbytes
+===============================
+
+same thing as vm.unevictable_activefile_kbytes but for `Inactive(file):`
+
+
user_reserve_kbytes
===================
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 078950d9605b..57a379b89d4c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -110,6 +110,20 @@ extern int core_uses_pid;
extern char core_pattern[];
extern unsigned int core_pipe_limit;
#endif
+#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
+#if CONFIG_RESERVE_ACTIVEFILE_KBYTES >= 0
+unsigned long sysctl_unevictable_activefile_kbytes __read_mostly = CONFIG_RESERVE_ACTIVEFILE_KBYTES;
+#else
+#error "CONFIG_RESERVE_ACTIVEFILE_KBYTES should be >= 0"
+#endif
+#endif
+#if defined(CONFIG_RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING)
+#if CONFIG_RESERVE_INACTIVEFILE_KBYTES >= 0
+unsigned long sysctl_unevictable_inactivefile_kbytes __read_mostly = CONFIG_RESERVE_INACTIVEFILE_KBYTES;
+#else
+#error "CONFIG_RESERVE_INACTIVEFILE_KBYTES should be >= 0"
+#endif
+#endif
extern int pid_max;
extern int pid_max_min, pid_max_max;
extern int percpu_pagelist_fraction;
@@ -1693,6 +1707,24 @@ static struct ctl_table vm_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
+#endif
+#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
+ {
+ .procname = "unevictable_activefile_kbytes",
+ .data = &sysctl_unevictable_activefile_kbytes,
+ .maxlen = sizeof(sysctl_unevictable_activefile_kbytes),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+#endif
+#if defined(CONFIG_RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING)
+ {
+ .procname = "unevictable_inactivefile_kbytes",
+ .data = &sysctl_unevictable_inactivefile_kbytes,
+ .maxlen = sizeof(sysctl_unevictable_inactivefile_kbytes),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
#endif
{
.procname = "user_reserve_kbytes",
diff --git a/mm/Kconfig b/mm/Kconfig
index 56cec636a1fc..6d5c34b11990 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -63,6 +63,74 @@ config SPARSEMEM_MANUAL
endchoice
+config RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
+ bool "Reserve some `Active(file)` to prevent disk thrashing"
+ depends on SYSCTL
+ def_bool y
+ help
+ Keep `Active(file)`(/proc/meminfo) pages in RAM so as to avoid system freeze
+ due to the disk thrashing(disk reading only) that occurrs because the running
+ executables's code is being evicted during low-mem conditions which is
+ why it also prevents oom-killer from triggering until 10s of minutes later
+ on some systems.
+
+ Please see the value of CONFIG_RESERVE_ACTIVEFILE_KBYTES to set how many
+ KiloBytes of Active(file) to keep by default in the sysctl setting
+ vm.unevictable_activefile_kbytes
+ see Documentation/admin-guide/sysctl/vm.rst for more info
+
+config RESERVE_ACTIVEFILE_KBYTES
+ int "Set default value for vm.unevictable_activefile_kbytes"
+ depends on RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
+ default "65536"
+ help
+ This is the default value(in KiB) that vm.unevictable_activefile_kbytes gets.
+ A value of at least 65536 or at most 262144 is recommended for users
+ of xfce4 to avoid disk thrashing on low-memory/memory-pressure conditions,
+ ie. mouse freeze with constant disk activity (but you can still sysrq+f to
+ trigger oom-killer though, even without this mitigation)
+
+ You can still sysctl set vm.unevictable_activefile_kbytes to a value of 0
+ to disable this whole feature at runtime.
+
+ see Documentation/admin-guide/sysctl/vm.rst for more info
+ see also CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING
+
+config RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING
+ bool "Reserve some `Inactive(file)` to prevent disk thrashing"
+ depends on SYSCTL
+ def_bool y
+ help
+ Keep `Inactive(file)`(/proc/meminfo) pages in RAM so as to avoid system freeze
+ due to the disk thrashing(disk reading only) that occurrs because the running
+ executables's code is being evicted during low-mem conditions which is
+ why it also prevents oom-killer from triggering until 10s of minutes later
+ on some systems.
+
+ This happens with memfreeze3 script
+
+ Please see the value of CONFIG_RESERVE_INACTIVEFILE_KBYTES to set how many
+ KiloBytes of Inactive(file) to keep by default in the sysctl setting
+ vm.unevictable_inactivefile_kbytes
+ see Documentation/admin-guide/sysctl/vm.rst for more info
+
+config RESERVE_INACTIVEFILE_KBYTES
+ int "Set default value for vm.unevictable_inactivefile_kbytes"
+ depends on RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING
+ default "65536"
+ help
+ This is the default value(in KiB) that vm.unevictable_inactivefile_kbytes gets.
+ A value of at least 65536 or at most 262144 is recommended for users
+ of xfce4 to avoid disk thrashing on low-memory/memory-pressure conditions,
+ ie. mouse freeze with constant disk activity (but you can still sysrq+f to
+ trigger oom-killer though, even without this mitigation)
+
+ You can still sysctl set vm.unevictable_inactivefile_kbytes to a value of 0
+ to disable this whole feature at runtime.
+
+ see Documentation/admin-guide/sysctl/vm.rst for more info
+ see also CONFIG_RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING
+
config DISCONTIGMEM
def_bool y
depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || DISCONTIGMEM_MANUAL
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dbdc46a84f63..c177225bd4cd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2445,6 +2445,31 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
BUG();
}
+#if defined(CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING)
+ if (LRU_ACTIVE_FILE == lru) {
+ extern unsigned int sysctl_unevictable_activefile_kbytes;
+ unsigned long kib_active_file_now = global_node_page_state(NR_ACTIVE_FILE) * MAX_NR_ZONES;
+ if (kib_active_file_now <= sysctl_unevictable_activefile_kbytes) {
+ // Don't reclaim any Active(file) (see /proc/meminfo) if they are under sysctl_unevictable_activefile_kbytes.
+ // See Documentation/admin-guide/sysctl/vm.rst and CONFIG_RESERVE_ACTIVEFILE_TO_PREVENT_DISK_THRASHING and CONFIG_RESERVE_ACTIVEFILE_KBYTES
+ nr[lru] = 0;
+ continue;
+ }
+ }
+#endif
+#if defined(CONFIG_RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING)
+ if (LRU_INACTIVE_FILE == lru) {
+ extern unsigned int sysctl_unevictable_inactivefile_kbytes;
+ unsigned long kib_inactive_file_now = global_node_page_state(NR_INACTIVE_FILE) * MAX_NR_ZONES;
+ if (kib_inactive_file_now <= sysctl_unevictable_inactivefile_kbytes) {
+ // Don't reclaim any Inactive(file) (see /proc/meminfo) if they are under sysctl_unevictable_inactivefile_kbytes.
+ // See Documentation/admin-guide/sysctl/vm.rst and CONFIG_RESERVE_INACTIVEFILE_TO_PREVENT_DISK_THRASHING and CONFIG_RESERVE_INACTIVEFILE_KBYTES
+ nr[lru] = 0;
+ continue;
+ }
+ }
+#endif
+
*lru_pages += lruvec_size;
nr[lru] = scan;
}