Addison-Wesley. Linux Kernel Development. Robert Love.
Chapter 3, Process Management
=============================
page 32:
-------
When an application executes a system call, we say that the kernel is
executing on behalf of the application. Furthermore, the application is said to be executing a
system call in kernel-space, and the kernel is running in process context.
In Linux, interrupt handlers do not run in a process context. Instead, they run in a special
interrupt context that is not associated with any process. This special context exists solely to
let an interrupt handler quickly respond to an interrupt, and then exit.
page 47:
-------
Kernel memory is not pageable. Therefore, every byte of memory you
consume is one less byte of available physical memory.
page 50:
-------
Each thread includes a unique program counter, process stack, and set of processor
registers. The kernel schedules individual threads, not processes.
Processes provide two virtualizations: a virtualized processor and virtual
memory. The virtual processor gives the process the illusion that it
alone monopolizes the system, despite possibly sharing the processor among hundreds of
other processes. Virtual memory lets the process allocate and manage memory as if it
alone owned all the memory in the system.
page 51:
-------
The process that calls fork() is the parent, whereas the new process is the child. The
parent resumes execution and the child starts execution at the same place: where the call
to fork() returns. fork() is actually implemented via the clone() system call.
A program exits via the exit() system call. This function terminates the process
and frees all its resources. A parent process can inquire about the status of a terminated
child via the wait4() system call, which enables a process to wait for the termination of
a specific process. When a process exits, it is placed into a special zombie state that
represents terminated processes until the parent calls wait() or waitpid().
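A minimal user-space sketch of this fork/exec/wait lifecycle (my own example,
not from the book):

    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {               /* child: resumes where fork() returned */
            execl("/bin/true", "true", (char *)NULL);
            _exit(127);               /* reached only if exec fails */
        } else if (pid > 0) {         /* parent: reap the child, clearing the zombie */
            int status;
            waitpid(pid, &status, 0);
        }
        return 0;
    }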
The kernel stores the list of processes in a circular doubly linked list called the task list.
Each element in the task list is a process descriptor of the type struct task_struct, which
is defined in <linux/sched.h>. The process descriptor contains all the information about
a specific process (open files, the process’s address space,
pending signals, the process’s state, and much more).
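In kernel code the task list can be walked with the for_each_process() macro; a
sketch (a real walk should hold the tasklist lock or an RCU read lock):

    #include <linux/kernel.h>
    #include <linux/sched.h>

    static void dump_task_list(void)
    {
        struct task_struct *task;

        /* visit every process descriptor on the circular task list */
        for_each_process(task) {
            /* comm is the executable name, pid the process ID */
            printk(KERN_INFO "%s[%d]\n", task->comm, task->pid);
        }
    }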
page 52:
-------
Each task’s thread_info structure is allocated at the end of its stack. The task element
of the structure is a pointer to the task’s actual task_struct.
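Roughly what x86's thread_info looked like in the kernels the book covers (an
abbreviated sketch; the exact fields vary by architecture and version):

    struct thread_info {
        struct task_struct *task;          /* pointer to the real task_struct */
        unsigned long       flags;         /* low-level flags, e.g. TIF_NEED_RESCHED */
        int                 preempt_count; /* 0 => kernel is preemptible */
        /* ... further architecture-specific fields ... */
    };

Because thread_info lives at the end of the stack, the kernel can locate it (and
hence current) just by masking the stack pointer.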
page 54:
-------
TASK_RUNNING—The process is runnable; it is either currently running or on a
runqueue waiting to run. This is the only possible state for a process executing in user-space.
TASK_INTERRUPTIBLE—The process is sleeping (that is, it is blocked), waiting for
some condition to exist. When this condition exists, the kernel sets the process’s
state to TASK_RUNNING. The process also awakes prematurely and becomes runnable
if it receives a signal.
page 55:
-------
TASK_UNINTERRUPTIBLE—This state is identical to TASK_INTERRUPTIBLE except
that it does not wake up and become runnable if it receives a signal.
__TASK_STOPPED—Process execution has stopped; the task is not running nor is it
eligible to run. This occurs if the task receives the SIGSTOP, SIGTSTP, SIGTTIN, or
SIGTTOU signal or if it receives any signal while it is being debugged.
QUESTION: to which queue is a process moved when it receives SIGSTOP?
page 56:
-------
All processes are descendants of the init process, whose PID is one. The kernel starts
init in the last step of the boot process. The init process, in turn, reads the system
initscripts and executes more programs, eventually completing the boot process.
The relationship between processes is stored in the process descriptor. Each task_struct
has a pointer to the parent’s task_struct, named parent, and a list of children, named
children.
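A sketch in the style of the book's chapter 3 examples, walking up to the parent
and over the children (each child is linked into its parent's children list
through its sibling member):

    #include <linux/kernel.h>
    #include <linux/list.h>
    #include <linux/sched.h>

    static void show_relationships(void)
    {
        struct task_struct *my_parent = current->parent;
        struct task_struct *task;
        struct list_head *list;

        list_for_each(list, &current->children) {
            task = list_entry(list, struct task_struct, sibling);
            printk(KERN_INFO "child: %s[%d]\n", task->comm, task->pid);
        }
        printk(KERN_INFO "parent: %s[%d]\n", my_parent->comm, my_parent->pid);
    }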
page 59:
-------
The fork(), vfork(), and __clone() library calls all invoke the clone() system call with the
requisite flags. The clone() system call, in turn, calls do_fork(), and do_fork() calls
copy_process(). Back in do_fork(), if copy_process() returns successfully, the new child
is woken up and run. Deliberately, the kernel runs the child process first. In the common
case of the child simply calling exec() immediately, this eliminates any copy-on-write
overhead that would occur if the parent ran first and began writing to the address space.
READ: forking, do_fork() and copy_process().
page 60:
-------
The vfork() system call has the same effect as fork(), except that the page table entries
of the parent process are not copied. Instead, the child executes as the sole thread in the
parent’s address space, and the parent is blocked until the child either calls exec() or exits.
The child is not allowed to write to the address space.
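A minimal vfork() sketch (my own example): the child runs first in the borrowed
address space and may only exec or _exit; the parent stays blocked until then.

    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = vfork();
        if (pid == 0) {
            execl("/bin/echo", "echo", "hello", (char *)NULL);
            _exit(127);                /* reached only if exec fails */
        }
        waitpid(pid, NULL, 0);         /* parent resumes after the exec above */
        return 0;
    }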
page 61:
-------
Each thread has a unique task_struct and appears to the kernel as a normal process—
threads just happen to share resources, such as an address space, with other processes.
QUESTION: does each thread have a PID, or how is it managed in Linux?
Threads are created the same as normal tasks, with the exception that the clone() system
call is passed flags corresponding to the specific resources to be shared:
clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
In contrast, a normal fork() can be implemented as clone(SIGCHLD, 0);
And vfork() is implemented as clone(CLONE_VFORK | CLONE_VM | SIGCHLD, 0);
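A user-space sketch of the thread-style clone() call above; note that the glibc
wrapper additionally takes a function and a stack, unlike the flag-only in-kernel
form quoted from the book:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    static int thread_fn(void *arg)
    {
        printf("child shares the parent's address space\n");
        return 0;
    }

    int main(void)
    {
        const size_t stack_size = 64 * 1024;
        char *stack = malloc(stack_size);

        /* stacks grow down on most architectures, so pass the top */
        pid_t pid = clone(thread_fn, stack + stack_size,
                          CLONE_VM | CLONE_FS | CLONE_FILES |
                          CLONE_SIGHAND | SIGCHLD, NULL);
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
    }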
QUESTION: what is an idle task?
page 62:
-------
The significant difference between kernel threads and normal processes is that
kernel threads do not have an address space. (Their mm pointer, which points at their
address space, is NULL.) They operate only in kernel-space and do not context switch into
user-space. Kernel threads, however, are schedulable and preemptable, the same as normal
processes.
page 63:
-------
A kernel thread can be created only by another kernel
thread. The kernel handles this automatically by forking all new kernel threads off of the
kthreadd kernel process.
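A sketch of the usual in-kernel API for this (kthread_run() creates the thread
off kthreadd and wakes it up; my_thread_fn and my_worker are made-up names):

    #include <linux/err.h>
    #include <linux/delay.h>
    #include <linux/kthread.h>

    static struct task_struct *worker;

    static int my_thread_fn(void *data)
    {
        while (!kthread_should_stop()) {   /* run until kthread_stop() */
            /* ... periodic work ... */
            msleep(1000);
        }
        return 0;
    }

    static int start_worker(void)
    {
        worker = kthread_run(my_thread_fn, NULL, "my_worker");
        return IS_ERR(worker) ? PTR_ERR(worker) : 0;
    }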
When a process terminates, the kernel releases
the resources owned by the process and notifies the child’s parent of its demise.
page 64:
-------
READ: do_exit().
After do_exit(), all objects associated with the task (assuming the task was the sole user)
are freed. The task is not runnable (and no longer has an address space in which to run)
and is in the EXIT_ZOMBIE exit state. The only memory it occupies is its kernel stack, the
thread_info structure, and the task_struct structure. The task exists solely to provide
information to its parent. After the parent retrieves the information, or notifies the kernel
that it is uninterested, the remaining memory held by the process is freed and returned to
the system for use.
After do_exit() completes, the process descriptor for the terminated process still exists,
but the process is a zombie and is unable to run. This enables the system to
obtain information about a child process after it has terminated.
page 65:
-------
After the parent has obtained information on its terminated child, or signified to the kernel that it
does not care, the child’s task_struct is deallocated.
The wait() family of functions is implemented via a single (and complicated) system
call, wait4(). The standard behavior is to suspend execution of the calling task until one
of its children exits, at which time the function returns with the PID of the exited child.
Additionally, a pointer is provided to the function that on return holds the exit code of
the terminated child.
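A user-space sketch of that behavior (my own example):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0)
            _exit(42);                            /* child exits with code 42 */

        int status;
        pid_t reaped = waitpid(pid, &status, 0);  /* suspend until the child exits */
        if (reaped == pid && WIFEXITED(status))
            printf("child %d exited with %d\n", reaped, WEXITSTATUS(status));
        return 0;
    }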
READ: release_task()
QUESTION: how are process groups and exit() related, along with zombies?
If a parent exits before its children, some mechanism must exist to reparent any child tasks
to a new process, or else parentless terminated processes would forever remain zombies,
wasting system memory. The solution is to reparent a task’s children on exit to either
another process in the current thread group or, if that fails, the init process.
page 67:
-------
When a task is ptraced, it is temporarily reparented to the debugging
process. When the task’s parent exits, however, it must be reparented along with its
other siblings. The solution is simply to keep a separate list of a process’s children
being ptraced—reducing the search for one’s children from every process to just two
relatively small lists.
With the process successfully reparented, there is no risk of stray zombie processes. The
init process routinely calls wait() on its children, cleaning up any zombies assigned to it.
QUESTION: ptrace() and reparenting: why should the debugging process become the parent?
Chapter 4, Process Scheduling
=============================
page 68:
-------
Multitasking operating systems come in two flavors: cooperative multitasking and
preemptive multitasking.
In preemptive multitasking, the scheduler decides when a process is to
cease running and a new process is to begin running. Involuntarily
suspending a running process is called preemption.
In cooperative multitasking, a process does not stop running until it voluntarily
decides to do so. The act of a process voluntarily suspending itself is called yielding.
page 69:
-------
Although the O(1) scheduler was ideal for large server workloads—which
lack interactive processes—it performed below par on desktop systems.
page 70:
-------
A scheduler policy for processor-bound processes, therefore,
tends to run such processes less frequently but for longer durations.
page 71:
-------
The Linux kernel implements two separate priority ranges. The first is the nice value, a
number from –20 to +19 with a default of 0. Larger nice values correspond to a lower
priority—you are being “nice” to the other processes on the system.
The second range is the real-time priority. The values are configurable, but by default
range from 0 to 99, inclusive. Opposite from nice values, higher real-time priority values
correspond to a greater priority.
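Both ranges are reachable from user-space; a sketch (setting a real-time
priority needs privilege, so the second call may fail):

    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/resource.h>

    int main(void)
    {
        nice(10);   /* raise our nice value: 0 -> 10, lower priority */
        printf("nice value now %d\n", getpriority(PRIO_PROCESS, 0));

        struct sched_param sp = { .sched_priority = 50 };  /* RT range 0..99 */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
            perror("sched_setscheduler");
        return 0;
    }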
page 73:
-------
The Linux scheduler is modular, enabling different algorithms to schedule different types
of processes. This modularity is called scheduler classes. Scheduler classes enable different,
pluggable algorithms to coexist, scheduling their own types of processes. Each scheduler
class has a priority. The base scheduler code, which is defined in kernel/sched.c, iterates
over each scheduler class in order of priority. The highest priority scheduler class that has
a runnable process wins, selecting who runs next.
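A simplified sketch of that pick loop (modeled on pick_next_task() in
kernel/sched.c; classes are walked from the real-time classes down to CFS and
idle):

    static struct task_struct *pick_next_task(struct rq *rq)
    {
        const struct sched_class *class;
        struct task_struct *p;

        for_each_class(class) {
            p = class->pick_next_task(rq);
            if (p)
                return p;   /* highest-priority class with work wins */
        }
        BUG();  /* unreachable: the idle class always returns a task */
    }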
page 74:
-------
Process Scheduling in Unix Systems (whole section is interesting)
Indeed, given
that high nice value (low priority) processes tend to be background, processor-intensive
tasks, while normal priority processes tend to be foreground user tasks, this timeslice
allotment is exactly backward from ideal! ==> MEANS mapping nice values to absolute
timeslices gives the background, processor-intensive tasks (high nice) the smallest
timeslices and the foreground interactive tasks (normal nice) the largest, whereas
processor-bound tasks would benefit from long timeslices and interactive tasks only
need short, frequent ones.
page 75:
-------
In old Unix schedulers, assigning absolute timeslices to nice values yields
a constant switching rate but variable fairness.
CFS yields constant fairness but a variable switching rate.
page 76:
-------
CFS sets a target for its approximation of the “infinitely small” scheduling
duration in perfect multitasking. This target is called the targeted latency.
Instead of using the nice value to calculate a timeslice, CFS uses the nice value to weight
the proportion of processor a process is to receive: Higher valued (lower priority) processes
receive a fractional weight relative to the default nice value, whereas lower valued (higher
priority) processes receive a larger weight.
CFS imposes a floor on the timeslice assigned to each
process. This floor is called the minimum granularity.
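Worked example, assuming the defaults the book cites (targeted latency 20 ms,
minimum granularity 1 ms): two runnable tasks at the same nice value get 10 ms
each; 20 such tasks get 1 ms each; with 200 tasks the proportional share would
be 100 microseconds, so the 1 ms floor kicks in and CFS stops being perfectly
fair.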
page 77:
-------
The proportion of processor time that any process receives is determined
only by the relative difference in niceness between it and the other runnable processes.
The nice values, instead of yielding additive increases to timeslices, yield geometric
differences. The absolute timeslice allotted any nice value is not an absolute number, but a
given proportion of the processor. CFS is called a fair scheduler because it gives each
process a fair share—a proportion—of the processor’s time. As mentioned, note that CFS
isn’t perfectly fair, because it only approximates perfect multitasking, but it bounds the
unfairness, with latency scaling as n for n runnable processes.
page 78:
-------
The vruntime variable stores the virtual runtime of a process, which is the actual runtime
(the amount of time spent running) normalized (or weighted) by the number of runnable
processes. CFS uses vruntime to account for how long a process has run and thus how much
longer it ought to run (the function update_curr() does this).
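A conceptual sketch of the weighting (the real code lives in update_curr() and
its helpers; delta_exec is the wall-clock time the task just ran):

    curr->vruntime += delta_exec * NICE_0_LOAD / curr->load.weight;

    /* a nice-0 task (load.weight == NICE_0_LOAD) accrues vruntime at
       wall-clock rate; heavier (lower-nice) tasks accrue it more slowly,
       so they keep the smallest vruntime longer and run more */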
page 79:
-------
update_curr() is invoked periodically by the system timer and also whenever a
process becomes runnable or blocks, becoming unrunnable. In this manner, vruntime is
an accurate measure of the runtime of a given process and an indicator of what process
should run next.
page 80:
-------
When CFS is deciding what process to run next, it picks the process with the smallest vruntime.
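CFS keeps runnable entities in a red-black tree keyed on vruntime and caches
the leftmost (smallest-vruntime) node; roughly, from kernel/sched_fair.c of
that era:

    static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
    {
        struct rb_node *left = cfs_rq->rb_leftmost;   /* cached leftmost node */

        if (!left)
            return NULL;        /* tree empty: nothing runnable */

        return rb_entry(left, struct sched_entity, run_node);
    }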
page 84:
-------
schedule() is generic with
respect to scheduler classes. That is, it finds the highest priority scheduler class with a
runnable process and asks it what to run next.
page 89:
-------
Context switching, the switching from one runnable task to another, is handled by the
context_switch() function defined in kernel/sched.c. It is called by schedule()
when a new process has been selected to run. It does two basic jobs:
1. Calls switch_mm(), which is declared in <asm/mmu_context.h>, to switch the
virtual memory mapping from the previous process’s to that of the new process.
2. Calls switch_to(), declared in <asm/system.h>, to switch the processor state from
the previous process’s to the current’s. This involves saving and restoring stack
information and the processor registers and any other architecture-specific state that
must be managed and restored on a per-process basis.
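A skeletal view of those two jobs (heavily simplified; the real function also
handles kernel threads, which have no mm and borrow the previous task's
active_mm):

    static inline void context_switch(struct rq *rq,
                                      struct task_struct *prev,
                                      struct task_struct *next)
    {
        /* job 1: swap the virtual memory mapping */
        switch_mm(prev->active_mm, next->mm, next);

        /* job 2: swap the stack and processor registers */
        switch_to(prev, next, prev);
    }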
The kernel provides the need_resched flag to signify whether a reschedule should be
performed. This flag is set by scheduler_tick() when a process should be preempted, and
by try_to_wake_up() when a process that has a higher priority than the currently
running process is awakened. The kernel checks the flag, sees that it is set, and calls schedule()
to switch to a new process. The flag is a message to the kernel that the scheduler should be
invoked as soon as possible because another process deserves to run.
Upon returning to user-space or returning from an interrupt, the need_resched flag is
checked. If it is set, the kernel invokes the scheduler before continuing.
The flag is per-process, and not simply global, because it is faster to access a value in
the process descriptor (because of the speed of current and the high probability of its being
cache hot) than a global variable. (need_resched is a special flag variable inside the
thread_info structure.)
page 90:
-------
In short, user preemption can occur
1. When returning to user-space from a system call
2. When returning to user-space from an interrupt handler
So when is it safe to reschedule? The kernel can preempt a task running in the kernel
so long as it does not hold a lock. That is, locks are used as markers of regions of
nonpreemptibility. Because the kernel is SMP-safe, if a lock is not held, the current code
is reentrant and capable of being preempted.
The first change in supporting kernel preemption was the addition of a preemption
counter, preempt_count, to each process's thread_info. This counter begins at zero and
increments once for each lock that is acquired and decrements once for each lock that is
released. When the counter is zero, the kernel is preemptible. Upon return from interrupt,
if returning to kernel-space, the kernel checks the values of need_resched and
preempt_count. If need_resched is set and preempt_count is zero, then a more
important task is runnable, and it is safe to preempt. Thus, the scheduler is invoked. If
preempt_count is nonzero, a lock is held, and it is unsafe to reschedule. In that case, the
interrupt returns as usual to the currently executing task. When all the locks that the
current task is holding are released, preempt_count returns to zero. At that time, the unlock
code checks whether need_resched is set. If so, the scheduler is invoked. Kernel preemption can
also occur explicitly, when a task in the kernel blocks or explicitly calls schedule().
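A conceptual sketch of the unlock path (hypothetical my_* names; the real
preempt_enable() also involves barriers and helper macros):

    #define my_preempt_disable() (current_thread_info()->preempt_count++)

    static inline void my_preempt_enable(void)
    {
        /* releasing the last lock re-enables preemption; if a reschedule
           was requested in the meantime, honor it now */
        if (--current_thread_info()->preempt_count == 0 &&
            test_thread_flag(TIF_NEED_RESCHED))
            schedule();
    }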
page 91:
-------
Kernel preemption can occur
1. When an interrupt handler exits, before returning to kernel-space
2. When kernel code becomes preemptible again
3. If a task in the kernel explicitly calls schedule()
4. If a task in the kernel blocks (which results in a call to schedule())
Real-Time Scheduling Policies (READ WHOLE SECTION)
will "a + b" invoke any system calls?