10/19 ideas
--------------------------------------------------------------------------------
multiple threads:
  epoll
  async IO
primitives:
  spawn "thread"
  interface to non-blocking calls
  interface to block until ready
  communicate buffer to other C code
assuming no thread can expect either file or network IO (at once)...
avoid synchronization between dueling event loops
let threads yield anyway?
web server thread:
  read (blocking)
  parse request
  sanity-check file name
  open file (blocking)
  compute response header
  write response header (blocking)
  exit or the following loop
  loop
    read from file (blocking)
    write to network (blocking)
    -- can we fill the read buffer while sending on the network?
web server top-level:
  set up socket
  open socket
  loop
    accept network (blocking)
    spawn thread (non-blocking)
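A rough sketch of the two outlines above, using ordinary OS threads as a stand-in for the lightweight "threads" these notes are designing. The names `make_listener`, `handle_connection`, `serve`, and the `max_connections` cutoff are hypothetical, and the single `recv` assumes the whole request arrives in one read, which a real server cannot rely on.

```python
import os
import socket
import threading


def make_listener(port=0):
    # set up socket / open socket; port 0 lets the kernel pick a free port
    return socket.create_server(("127.0.0.1", port))


def handle_connection(conn, root):
    # One "web server thread": every marked call may block.
    with conn:
        request = conn.recv(4096)                    # read (blocking)
        parts = request.split()                      # parse request
        if len(parts) < 2 or parts[0] != b"GET":
            conn.sendall(b"HTTP/1.0 400 Bad Request\r\n\r\n")
            return
        name = parts[1].decode("ascii", "replace").lstrip("/")
        base = os.path.realpath(root)
        path = os.path.realpath(os.path.join(base, name))
        # sanity-check file name: it must resolve to a file inside root
        if not path.startswith(base + os.sep) or not os.path.isfile(path):
            conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
            return
        with open(path, "rb") as f:                  # open file (blocking)
            size = os.fstat(f.fileno()).st_size      # compute response header
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % size)
            while True:
                chunk = f.read(65536)                # read from file (blocking)
                if not chunk:
                    break
                conn.sendall(chunk)                  # write to network (blocking)


def serve(listener, root, max_connections=None):
    # Top-level loop: accept blocks, spawning the handler does not.
    served = 0
    while max_connections is None or served < max_connections:
        conn, _addr = listener.accept()              # accept network (blocking)
        threading.Thread(target=handle_connection,
                         args=(conn, root), daemon=True).start()
        served += 1
```

In the design sketched in these notes, each "(blocking)" call would instead suspend the lightweight thread and hand control back to the event loop.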
-----------------------------------------------
epoll:
- create an epoll instance with epoll_create
edge triggered vs. level triggered:
- edge triggered returns if/when an event happens
  (when the fd *changes*):
  - use nonblocking file descriptors
  - wait for an event *only* if you got EAGAIN
  - hm; what about when you are reading *and* writing
- level triggered: return whenever there is data
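The "wait only if you got EAGAIN" rule for edge-triggered fds can be sketched as a drain loop. The helper name `drain` and its return shape are made up for illustration; Python reports EAGAIN as `BlockingIOError`.

```python
import os


def drain(fd, chunk_size=4096):
    """Read a non-blocking fd until EAGAIN or EOF.

    Returns (data, eof). When eof is False the read hit EAGAIN, which is
    the only point at which it is safe to go back and wait in epoll.
    """
    data = bytearray()
    while True:
        try:
            chunk = os.read(fd, chunk_size)
        except BlockingIOError:      # EAGAIN / EWOULDBLOCK: nothing left for now
            return bytes(data), False
        if not chunk:                # zero-length read means end of file
            return bytes(data), True
        data += chunk
```

Stopping before EAGAIN on an edge-triggered fd risks sleeping in epoll while unread data is still buffered, since no new edge will arrive for it.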
With level triggered, epoll can't block while there is data ready,
which means if there is an outstanding epoll event from some thread
but it is waiting on something else, we are screwed. If we use level
triggered, then we need to not have active epoll events that we are
not currently waiting on.
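A small experiment makes the difference concrete, assuming Linux and Python's `select.epoll` wrapper. The helper `poll_twice` is hypothetical: it writes one byte to a pipe and polls twice without reading it.

```python
import os
import select


def poll_twice(flags):
    """Return how many events two back-to-back non-blocking polls report."""
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, flags)
        os.write(w, b"x")            # the fd becomes readable exactly once
        first = len(ep.poll(0))      # timeout 0: do not block
        second = len(ep.poll(0))     # the byte is still unread here
        return first, second
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

With `select.EPOLLIN` alone (level triggered) both polls report the fd, so an epoll call can never block while that byte sits unread. With `select.EPOLLIN | select.EPOLLET` only the first poll reports it.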
It would be nice if we didn't need to call epoll_ctl every time a
thread blocks on an epoll, although it isn't necessarily critical.
(With EPOLLONESHOT, we would only need to add it before each epoll,
but not remove it after.)
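For reference, a sketch of how EPOLLONESHOT behaves, again through `select.epoll` on Linux. The function name `oneshot_demo` is made up; `ep.modify` corresponds to `epoll_ctl` with `EPOLL_CTL_MOD`.

```python
import os
import select


def oneshot_demo():
    """Return event counts: after the first write, while disarmed, after re-arming."""
    r, w = os.pipe()
    ep = select.epoll()
    flags = select.EPOLLIN | select.EPOLLONESHOT
    try:
        ep.register(r, flags)
        os.write(w, b"a")
        first = len(ep.poll(0))      # event delivered, fd is now disarmed
        os.write(w, b"b")
        disarmed = len(ep.poll(0))   # new data, but nothing is reported
        ep.modify(r, flags)          # re-arm before the next wait
        rearmed = len(ep.poll(0))    # pending data is reported again
        return first, disarmed, rearmed
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

The fd stays registered throughout, so there is one `epoll_ctl` call per wait (the re-arm) rather than an add and a remove.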
So I think we want to do edge triggered and leave in events we are not
currently waiting for. If an event comes in that nobody is blocked on,
ignore it.
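One possible shape for the "ignore events nobody is blocked on" rule, as a sketch. The `waiters` table (fd mapped to a resume callback) and the function `dispatch` are assumptions for illustration, not part of any existing code.

```python
def dispatch(events, waiters):
    """Resume threads blocked on ready fds and drop every other event.

    events  -- list of (fd, mask) pairs as returned by epoll
    waiters -- dict mapping fd -> callable that resumes the blocked thread
    Returns the list of fds whose waiter was resumed.
    """
    resumed = []
    for fd, mask in events:
        resume = waiters.pop(fd, None)
        if resume is None:
            continue                 # nobody is blocked on this fd: ignore it
        resume(mask)
        resumed.append(fd)
    return resumed
```

Dropping the event is safe only because a thread always retries the operation first and waits only after EAGAIN, so a missed edge is rediscovered by the retry.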
- need to increase file descriptor limit
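A minimal sketch of raising the per-process limit from inside the program, assuming the soft limit is what needs lifting and the hard limit is already high enough. The function name `raise_fd_limit` is hypothetical.

```python
import resource


def raise_fd_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit; return the resulting pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass                     # keep the current limit if the kernel refuses
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Going past the hard limit needs privileges or a change in system configuration, which is outside what this sketch covers.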
-----------------------------------------------
async IO may be a world of pain; one source I found indicates that
the glibc AIO library doesn't use special kernel support, but instead
spawns threads. Also, it isn't clear that there are any good ways to
wait on AIO completion that don't involve scanning through all of the
AIO events. I'm looking into whether there are other options.
OK, async IO should be just fine. There is a poorly documented
Linux-specific facility (with wrappers in libaio) that seems to
actually be what we want. You can register AIO requests. The function
to wait for them takes a timeout and returns a list of completed
events.
Furthermore, each request can be associated with an eventfd that will
be signaled when it is completed.
Can async IO things return short?
What about the max aio limit?
-----------------------------------------------
To merge the event loop, it looks like we want to use eventfd.
There are a bunch of different ways we could do this.
We could have one eventfd per fd we are doing async IO on, but that
seems higher overhead than necessary.
One important constraint in designing this interaction is that neither
normal epoll events nor aio events should starve the other. If there
is lots of activity on both, both should be getting handled.
An eventfd has a 64-bit integer value associated with it internally.
The kernel increments it for each completed aio event.
There are two modes for the eventfd:
- In semaphore mode, if the value is nonzero, read() returns 1 and
  decrements the value.
- In normal mode, if the value is nonzero, read() returns the value
and sets the value to zero.
With the aio fd, we actually need to worry about the edge triggered
vs. level triggered thing.
There are a couple of ways to do this that I can think of (avoiding
starvation is an important concern):
- We could put the eventfd in semaphore mode and keep it level
triggered. Then, whenever epoll returns the eventfd event,
read from it and then wait for an aio event (which shouldn't
block). One downside is that since there is just one aio event, it
might get returned less often when there are lots of active network
connections (since the buffer that gets filled with the events is
fixed size). Two improvements to this include trying to handle many
aio events each time the aio event triggers (by reading from the
eventfd until it returns EAGAIN or we get N events). Furthermore, we
can make it so that we always try a nonblocking read from the eventfd
each time through the loop, regardless of whether it has been
signaled.
- Put the eventfd in normal mode (don't know how much the triggering
matters). We then track whether there might be finished aio
requests. There might be finished aio requests if we have received an
eventfd notification since the last time aio indicated there were no
more completed requests. Note that as soon as the eventfd is read,
the value is set to zero, even though all the aio events haven't been
handled. Thus, if we think there might be unhandled aio events, we
need to make epoll not block. We probably want to try reading from
aio regardless of whether we think there might be finished events, so
that we don't need to wait for epoll to pick that to return.
I think I like the second option more. The first seems simpler at
first (you don't need to track whether to make nonblocking epoll
calls), but when it is developed fully it isn't really simpler and it
makes more system calls.
Then, pseudocode for the main loop is something like:
(MAX_EVENTS is probably something like 8? I dunno)
expect_aio = false
while true:
    aio_events = io_getevents(...stuff.., MAX_EVENTS)
    if (len(aio_events) < MAX_EVENTS):
        expect_aio = false
    for event in aio_events:
        handle_aio_event(event)
    epoll_timeout = if expect_aio then 0 else infinity
    epoll_events = epoll_wait(...stuff..., MAX_EVENTS, epoll_timeout)
    for event in epoll_events:
        if event.fd == eventfd:
            read(eventfd)
            expect_aio = true
        else:
            handle_epoll_event(event)