-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathscheduling.tex
199 lines (172 loc) · 10.9 KB
/
scheduling.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
\chapter{Scheduling}
\section{Perf Sched}
Another very interesting {\em perf} command is {\tt sched}. These are predefined events
that are associated with scheduling. With {\tt sched} one can see how and when the
workload is scheduled across the CPU's, when it is woken up and led to sleep.
So e.g. to know if such a simple workload like {\tt lat_mem_rd} is CPU affine or it is
scheduled all over the CPU's, let's record a sample run.
\starttyping
perf sched record lat_mem_rd 1 2048
\stoptyping
\starttyping
# show the scheduler map
perf sched map | head -n 30
,*A0 5382.466840 secs A0 => perf:15767
A0 *. 5382.466878 secs . => swapper:0
A0 . *B0 5382.478644 secs B0 => rcu_sched:8
A0 . *. 5382.478647 secs
,*C0 A0 . . 5382.478680 secs C0 => ksoftirqd/4:29
,*B0 A0 . . 5382.478682 secs
,*. A0 . . 5382.478685 secs
,*B0 A0 . . 5382.498640 secs
,*. A0 . . 5382.498643 secs
. A0 *B0 . . 5382.508689 secs
. A0 *. . . 5382.508691 secs
. A0 *B0 . . 5382.528655 secs
. A0 *. . . 5382.528658 secs
. A0 . *D0 . 5382.538673 secs D0 => kworker/11:0:65
. A0 . *. . 5382.538678 secs
. A0 *E0 . . . 5382.658644 secs E0 => kworker/8:0:13068
. A0 *. . . . 5382.658646 secs
. A0 . . *D0 . 5382.668657 secs
. A0 . . *. . 5382.668663 secs
. A0 . . *D0 . 5382.798640 secs
. A0 . . *. . 5382.798647 secs
. *F0 . . . . 5382.870995 secs F0 => multipathd:261
. *A0 . . . . 5382.870997 secs
. A0 . . *D0 . 5382.928688 secs
. A0 . . *. . 5382.928697 secs
. A0 . . . *G0 . 5382.992699 secs G0 => lat_mem_rd:15768
. *. . . . G0 . 5382.992705 secs
. *A0 . . . G0 . 5382.992751 secs
. *. . . . G0 . 5382.992755 secs
. . . . *D0 G0 . 5383.058671 secs
\stoptyping
The first four columns represent the four CPUs in the system. $A0$,
$B0$, $C0$ are ids for each executable that is called or scheduled
during runtime of {\tt lat_mem_rd}.
Every asterisk means a scheudling event. What is evident here in the
first 30 rows, is that {\tt lat_mem_rd} first runs on $CPU0$ and is
scheduled after some time to $CPU1$ as indicated by the *D0 where
{\tt lat_mem_rd} gets a new id.
Examining the complete trace shows that the workload is scheduled
across all CPUs and not running exclusively on one or affine to one
CPU.
Another nice features of {\tt perf sched} is to report the latencies of
the scheduling events per task. These latency can be correlated with
the timestamp from the table below with the {\tt perf sched map} output
above.
\starttyping
# show scheduling latencies
# perf sched lat
Task | Runtime ms | Switches | Maximum delay at
cron:808 | 0.009 ms | 1 | max at: 3500.02 s
perf:4737 | 2.890 ms | 1 | max at: 3507.16 s
kworker/u64:0:4543 | 0.015 ms | 5 | max at: 3488.38 s
khungtaskd:30 | 0.004 ms | 1 | max at: 3506.07 s
kworker/2:0:2146 | 0.054 ms | 14 | max at: 3502.20 s
ksoftirqd/1:14 | 0.003 ms | 1 | max at: 3488.79 s
kworker/u64:1:4624 | 0.010 ms | 3 | max at: 3485.08 s
kworker/0:2:414 | 0.042 ms | 13 | max at: 3487.43 s
lat_mem_rd:(20) | 23709.517 ms | 145 | max at: 3483.96 s
rcu_sched:7 | 0.307 ms | 126 | max at: 3504.74 s
pkcsslotd:825 | 0.010 ms | 2 | max at: 3487.48 s
irqbalance:867 | 0.078 ms | 2 | max at: 3493.11 s
jbd2/dasda1-8:317 | 0.090 ms | 9 | max at: 3501.99 s
kworker/3:2:445 | 0.007 ms | 2 | max at: 3488.39 s
migration/3:23 | 0.000 ms | 1 | max at: 3483.45 s
multipathd:(3) | 0.000 ms | 30 | max at: 3490.08 s
gmain:837 | 0.027 ms | 6 | max at: 3484.02 s
ksoftirqd/2:19 | 0.002 ms | 1 | max at: 3488.77 s
ksoftirqd/0:3 | 0.007 ms | 4 | max at: 3495.80 s
kworker/1:0:2740 | 0.058 ms | 27 | max at: 3499.83 s
TOTAL: | 23713.129 ms | 394 |
\stoptyping
Additionally the number of context switches is reported for all task
running at the time of {\tt lat_mem_rd}.
To have even more info about the separate events call {\tt perf sched}
with the {\tt script} argument.
\starttyping
# show detailed info about scheduling
# perf sched script | head -n 30 | cut -c 1-70
perf 1921 [000] 12208.501300: sched:sched_wakeup: perf:1
swapper 0 [001] 12208.501303: sched:sched_switch: swappe
perf 1921 [000] 12208.501318: sched:sched_switch: perf:1
perf 1924 [001] 12208.501326: sched:sched_wakeup: migrat
perf 1924 [001] 12208.501327: sched:sched_switch: perf:1
migration/1 13 [001] 12208.501331: sched:sched_switch: migrat
swapper 0 [002] 12208.501332: sched:sched_switch: swappe
swapper 0 [000] 12208.502371: sched:sched_wakeup: rcu_sc
swapper 0 [000] 12208.502372: sched:sched_switch: swappe
rcu_sched 7 [000] 12208.502374: sched:sched_switch: rcu_sc
swapper 0 [000] 12208.510368: sched:sched_wakeup: rcu_sc
swapper 0 [000] 12208.510370: sched:sched_switch: swappe
rcu_sched 7 [000] 12208.510372: sched:sched_switch: rcu_sc
swapper 0 [000] 12208.518367: sched:sched_wakeup: rcu_sc
swapper 0 [000] 12208.518369: sched:sched_switch: swappe
rcu_sched 7 [000] 12208.518371: sched:sched_switch: rcu_sc
swapper 0 [000] 12208.526366: sched:sched_wakeup: rcu_sc
swapper 0 [000] 12208.526368: sched:sched_switch: swappe
rcu_sched 7 [000] 12208.526369: sched:sched_switch: rcu_sc
swapper 0 [003] 12209.027500: sched:sched_switch: swappe
lat_mem_rd 1924 [002] 12209.027506: sched:sched_switch: lat_me
lat_mem_rd 1925 [003] 12209.027544: sched:sched_wakeup: lat_me
swapper 0 [002] 12209.027546: sched:sched_switch: swappe
lat_mem_rd 1924 [002] 12209.027549: sched:sched_switch: lat_me
swapper 0 [000] 12209.364191: sched:sched_wakeup: multip
swapper 0 [000] 12209.364192: sched:sched_switch: swappe
multipathd 418 [000] 12209.364194: sched:sched_switch: multip
swapper 0 [002] 12209.580448: sched:sched_wakeup: multip
swapper 0 [002] 12209.580449: sched:sched_switch: swappe
multipathd 417 [002] 12209.580451: sched:sched_switch: multip
\stoptyping
In Linux 4.10 an additional command to {\tt perf sched} was added, namely
{\tt timehist}. It shows the scheduler latency by event. It includes the
time the task was waiting to be woken up and the scheduler latency
after wakeup to running.
\starttyping
# -M migration events
# -V CPU
# -w wakup events
# perf sched timehist -MVw | head -n 20
time cpu 012345678 task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ --------- -------------------- --------- --------- ---------
291.481867 [0001] i <idle> 0.000 0.000 0.000
291.481925 [0005] i <idle> 0.000 0.000 0.000
291.481963 [0005] s Xorg[1884] 0.000 0.000 0.037 schedule_hrtimeout_range_clock <- schedu
291.482019 [0001] s irq/31-nvidia[1980] 0.000 0.000 0.152 irq_thread <- kthread <- ret_from_fork
291.482136 [0006] perf[6319] awakened: perf[6320]
291.482140 [0001] i <idle> 0.152 0.000 0.121
291.482175 [0006] s perf[6319] 0.000 0.000 0.000 schedule_hrtimeout_range_clock <- schedu
291.482562 [0001] lat_mem_rd[6320] awakened: kworker/u16:3[227]
291.482564 [0006] i <idle> 0.000 0.000 0.389
291.482569 [0006] m kworker/u16:3[227] migrated: terminator[6030] cpu 1 => 2
291.482572 [0006] kworker/u16:3[227] awakened: terminator[6030]
291.482574 [0006] s kworker/u16:3[227] 0.000 0.002 0.010 worker_thread <- kthread <- ret_from_for
291.482576 [0002] i <idle> 0.000 0.000 0.000
291.482614 [0002] s terminator[6030] 0.000 0.004 0.037 schedule_hrtimeout_range_clock <- schedu
\stoptyping
For a graphical overview of scheduling events of a workload {\tt perf timechart} can
be used. First record the needed samples.
\starttyping
# generate diagram of scheduling events + cycles
# perf timechart record lat_mem_rd 1 2048
\stoptyping
Now create a SVG out of the data.
\starttyping
perf timechart
\stoptyping
A very fast SVG viewer on a Linux system is /rsvg-view-3/.
\starttyping
# rsvg-view-3 output.svg
\stoptyping
\setupcaption [figure][location={bottom}]
\startplacefigure[title={Timechart of {\tt lat_mem_rd}, which is scheduled on all available {\sc cpus}}]
\externalfigure[sched][fullwidth]
\stopplacefigure
%\noindent
%\begin{tikzpicture}[remember picture,overlay]
%\node[anchor =north west, inner sep=0pt,outer sep=0pt] at (current page.north west) {\includegraphics[width=\paperwidth,height=0.45\paperheight]{./sched2.png}};
%\end{tikzpicture}
%#+END_LaTeX