How to Port Open vSwitch to New Software or Hardware
====================================================
Open vSwitch (OVS) is intended to be easily ported to new software and
hardware platforms. This document describes the types of changes that
are most likely to be necessary in porting OVS to Unix-like platforms.
(Porting OVS to other kinds of platforms is likely to be more
difficult.)
Vocabulary
----------
For historical reasons, different words are used for essentially the
same concept in different areas of the Open vSwitch source tree. Here
is a concordance, indexed by the area of the source tree:
        datapath/       vport           ---
        vswitchd/       iface           port
        ofproto/        port            bundle
        ofproto/bond.c  slave           bond
        lib/lacp.c      slave           lacp
        lib/netdev.c    netdev          ---
        database        Interface       Port
Open vSwitch Architectural Overview
-----------------------------------
The following diagram shows the very high-level architecture of Open
vSwitch from a porter's perspective.
                   +-------------------+
                   |    ovs-vswitchd   |<-->ovsdb-server
                   +-------------------+
                   |      ofproto      |<-->OpenFlow controllers
                   +--------+-+--------+
                   | netdev | | ofproto|
                   +--------+ |provider|
                   | netdev | +--------+
                   |provider|
                   +--------+
Some of the components are generic. Modulo bugs or inadequacies,
these components should not need to be modified as part of a port:
    - "ovs-vswitchd" is the main Open vSwitch userspace program, in
      vswitchd/. It reads the desired Open vSwitch configuration from
      the ovsdb-server program over an IPC channel and passes this
      configuration down to the "ofproto" library. It also passes
      certain status and statistical information from ofproto back
      into the database.

    - "ofproto" is the Open vSwitch library, in ofproto/, that
      implements an OpenFlow switch. It talks to OpenFlow controllers
      over the network and to switch hardware or software through an
      "ofproto provider", explained further below.

    - "netdev" is the Open vSwitch library, in lib/netdev.c, that
      abstracts interacting with network devices, that is, Ethernet
      interfaces. The netdev library is a thin layer over "netdev
      provider" code, explained further below.
The other components may need attention during a port. You will
almost certainly have to implement a "netdev provider". Depending on
the type of port you are doing and the desired performance, you may
also have to implement an "ofproto provider" or a lower-level
component called a "dpif provider".
The following sections talk about these components in more detail.
Writing a netdev Provider
-------------------------
A "netdev provider" implements an operating system and hardware
specific interface to "network devices", e.g. eth0 on Linux. Open
vSwitch must be able to open each port on a switch as a netdev, so you
will need to implement a "netdev provider" that works with your switch
hardware and software.
struct netdev_class, in lib/netdev-provider.h, defines the interfaces
required to implement a netdev. That structure contains many function
pointers, each of which has a comment that is meant to describe its
behavior in detail. If the requirements are unclear, please report
this as a bug.
The netdev interface can be divided into a few rough categories:
    * Functions required to properly implement OpenFlow features. For
      example, OpenFlow requires the ability to report the Ethernet
      hardware address of a port. These functions must be implemented
      for minimally correct operation.

    * Functions required to implement optional Open vSwitch features.
      For example, the Open vSwitch support for in-band control
      requires netdev support for inspecting the TCP/IP stack's ARP
      table. These functions must be implemented if the corresponding
      OVS features are to work, but may be omitted initially.

    * Functions needed in some implementations but not in others. For
      example, most kinds of ports (see below) do not need
      functionality to receive packets from a network device.
The existing netdev implementations may serve as useful examples
during a port:
    * lib/netdev-linux.c implements netdev functionality for Linux
      network devices, using Linux kernel calls. It may be a good
      place to start for full-featured netdev implementations.

    * lib/netdev-vport.c provides support for "virtual ports"
      implemented by the Open vSwitch datapath module for the Linux
      kernel. This may serve as a model for minimal netdev
      implementations.

    * lib/netdev-dummy.c is a fake netdev implementation useful only
      for testing.
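
The provider pattern described in this section, a structure of function
pointers that each platform fills in, can be sketched in plain C. This
is only an illustration: struct toy_netdev_class, its members, and the
toy_* functions are hypothetical names invented here. They are far
smaller than the real struct netdev_class in lib/netdev-provider.h,
which remains the authoritative definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ADDR_LEN 6

/* Hypothetical, cut-down provider interface (not the OVS one). */
struct toy_netdev_class {
    const char *type;           /* Provider type name. */

    /* Required: copies the device's Ethernet address into 'mac'.
     * Returns 0 on success, otherwise a positive errno value. */
    int (*get_etheraddr)(const char *devname,
                         uint8_t mac[TOY_ETH_ADDR_LEN]);

    /* Optional: receives a packet.  NULL if unsupported. */
    int (*recv)(const char *devname, void *buf, size_t size);
};

/* Platform-specific implementation: returns a fixed, fake address. */
static int
toy_get_etheraddr(const char *devname, uint8_t mac[TOY_ETH_ADDR_LEN])
{
    static const uint8_t fake[TOY_ETH_ADDR_LEN] = {0x02, 0, 0, 0, 0, 0x01};

    (void) devname;
    memcpy(mac, fake, sizeof fake);
    return 0;
}

static const struct toy_netdev_class toy_class = {
    "toy",
    toy_get_etheraddr,
    NULL,                       /* No receive support in this sketch. */
};

/* Generic-layer wrappers: the only code that touches the pointers. */
static int
toy_netdev_get_etheraddr(const struct toy_netdev_class *nc,
                         const char *devname,
                         uint8_t mac[TOY_ETH_ADDR_LEN])
{
    assert(nc != NULL && nc->get_etheraddr != NULL);
    return nc->get_etheraddr(devname, mac);
}

static int
toy_netdev_recv(const struct toy_netdev_class *nc, const char *devname,
                void *buf, size_t size)
{
    assert(nc != NULL);
    return nc->recv ? nc->recv(devname, buf, size) : EOPNOTSUPP;
}
```

In this sketch a required function is asserted to be present, while an
optional one may be left NULL and the generic wrapper reports
EOPNOTSUPP. This mirrors the split between required and optional
functions described above.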
Porting Strategies
------------------
After a netdev provider has been implemented for a system's network
devices, you may choose among three basic porting strategies.
The lowest-effort strategy is to use the "userspace switch"
implementation built into Open vSwitch. This ought to work, without
writing any more code, as long as the netdev provider that you
implemented supports receiving packets. It yields poor performance,
however, because every packet passes through the ovs-vswitchd process.
See INSTALL.userspace for instructions on how to configure a userspace
switch.
If the userspace switch is not the right choice for your port, then
you will have to write more code. You may implement either an
"ofproto provider" or a "dpif provider". Which you should choose
depends on a few different factors:
    * Only an ofproto provider can take full advantage of hardware
      with built-in support for wildcards (e.g. an ACL table or a
      TCAM).

    * A dpif provider can take advantage of the Open vSwitch built-in
      implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and
      other features. An ofproto provider has to provide its own
      implementations, if the hardware can support them at all.

    * A dpif provider is usually easier to implement, but it is most
      appropriate for software switching. It "explodes" wildcard
      rules into exact-match entries (with an optional wildcard mask).
      This allows fast hash lookups in software, but makes
      inefficient use of TCAMs in hardware that supports wildcarding.
The following sections describe how to implement each kind of port.
ofproto Providers
-----------------
An "ofproto provider" is what ofproto uses to directly monitor and
control an OpenFlow-capable switch. struct ofproto_class, in
ofproto/ofproto-provider.h, defines the interfaces to implement an
ofproto provider for new hardware or software. That structure contains
many function pointers, each of which has a comment that is meant to
describe its behavior in detail. If the requirements are unclear,
please report this as a bug.
The ofproto provider interface is preliminary. Please let us know if
it seems unsuitable for your purpose. We will try to improve it.
Writing a dpif Provider
-----------------------
Open vSwitch has a built-in ofproto provider named "ofproto-dpif",
which is built on top of a library for manipulating datapaths, called
"dpif". A "datapath" is a simple flow table, one that is only required
to support exact-match flows, that is, flows without wildcards. When a
packet arrives on a network device, the datapath looks for it in this
table. If there is a match, then it performs the associated actions.
If there is no match, the datapath passes the packet up to ofproto-dpif,
which maintains the full OpenFlow flow table. If the packet matches in
this flow table, then ofproto-dpif executes its actions and inserts a
new entry into the dpif flow table. (Otherwise, ofproto-dpif passes the
packet up to ofproto to send the packet to the OpenFlow controller, if
one is configured.)
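
The miss-handling sequence described above can be sketched in a few
lines of C. Everything here is hypothetical and invented for
illustration (the toy_* names, the two-field key, the single "output
port" action); it is not the dpif or ofproto-dpif code. A linear scan
stands in for the hash lookup a real datapath would use, and one
hard-coded rule stands in for the OpenFlow flow table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: a real key covers many more fields. */
struct toy_key {
    uint16_t in_port;
    uint16_t dl_type;
};

struct toy_entry {
    struct toy_key key;
    int out_port;               /* The only "action" in this sketch. */
};

#define TOY_MAX_FLOWS 16

/* Stand-in for the datapath: exact-match entries only. */
struct toy_datapath {
    struct toy_entry flows[TOY_MAX_FLOWS];
    size_t n_flows;
    unsigned int n_misses;      /* Packets passed up to the slow path. */
};

static bool
toy_key_equal(const struct toy_key *a, const struct toy_key *b)
{
    return a->in_port == b->in_port && a->dl_type == b->dl_type;
}

/* Stand-in for the full flow table, holding the single rule
 * "send all IPv4 (Ethernet type 0x0800) to port 2".
 * Returns the output port, or -1 if no rule matches. */
static int
toy_slow_path_lookup(const struct toy_key *key)
{
    return key->dl_type == 0x0800 ? 2 : -1;
}

/* Processes one packet.  Returns the output port, or -1 if the packet
 * would be handed to the controller. */
static int
toy_process_packet(struct toy_datapath *dp, const struct toy_key *key)
{
    size_t i;
    int out_port;

    for (i = 0; i < dp->n_flows; i++) {
        if (toy_key_equal(&dp->flows[i].key, key)) {
            return dp->flows[i].out_port;       /* Fast-path hit. */
        }
    }

    /* Miss: consult the slow path and cache the result if it matched
     * and there is room in the table. */
    dp->n_misses++;
    out_port = toy_slow_path_lookup(key);
    if (out_port >= 0 && dp->n_flows < TOY_MAX_FLOWS) {
        dp->flows[dp->n_flows].key = *key;
        dp->flows[dp->n_flows].out_port = out_port;
        dp->n_flows++;
    }
    return out_port;
}
```

In this sketch only the first packet of a flow increments n_misses;
later packets with the same key are served from the exact-match table.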
When calculating the dpif flow, ofproto-dpif generates an exact-match
flow that describes the missed packet. It makes an effort to figure out
what fields can be wildcarded based on the switch's configuration and
OpenFlow flow table. The dpif is free to ignore the suggested wildcards
and only support the exact-match entry. However, if the dpif supports
wildcarding, then it can use the masks to match multiple flows with
fewer entries and potentially significantly reduce the number of flow
misses handled by ofproto-dpif.
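
To illustrate why masks reduce flow misses, the following sketch shows
a masked comparison in C. The struct and function names are
hypothetical, and the two-field key is a simplification chosen for
this example, not the OVS flow structure. An entry matches a packet
when the two agree on every bit that is set in the mask, so a mask of
all-ones is an ordinary exact match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field flow key. */
struct toy_flow {
    uint32_t in_port;
    uint32_t nw_dst;            /* IPv4 destination, host byte order. */
};

/* A flow entry plus the mask saying which bits must match. */
struct toy_masked_entry {
    struct toy_flow value;
    struct toy_flow mask;
};

/* Returns true if 'pkt' agrees with 'e->value' on all masked bits. */
static bool
toy_masked_match(const struct toy_masked_entry *e,
                 const struct toy_flow *pkt)
{
    return !((pkt->in_port ^ e->value.in_port) & e->mask.in_port)
        && !((pkt->nw_dst ^ e->value.nw_dst) & e->mask.nw_dst);
}
```

With in_port masked exactly and nw_dst masked as 0xff000000, a single
entry for 10.0.0.0 covers every destination in 10.0.0.0/8 arriving on
that port, where an exact-match-only table would need one entry (and
one miss) per destination address.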
The "dpif" library in turn delegates much of its functionality to a
"dpif provider". The following diagram shows how dpif providers fit
into the Open vSwitch architecture:
                _
               |   +-------------------+
               |   |    ovs-vswitchd   |<-->ovsdb-server
               |   +-------------------+
               |   |      ofproto      |<-->OpenFlow controllers
               |   +--------+-+--------+  _
               |   | netdev | |ofproto-|   |
     userspace |   +--------+ |  dpif  |   |
               |   | netdev | +--------+   |
               |   |provider| |  dpif  |   |
               |   +---||---+ +--------+   |
               |       ||     |  dpif  |   | implementation of
               |       ||     |provider|   | ofproto provider
               |_      ||     +---||---+   |
                       ||         ||       |
                _  +---||-----+---||---+   |
               |   |          |datapath|   |
        kernel |   |          +--------+  _|
               |   |                   |
               |_  +--------||---------+
                            ||
                         physical
                           NIC
struct dpif_class, in lib/dpif-provider.h, defines the interfaces
required to implement a dpif provider for new hardware or software.
That structure contains many function pointers, each of which has a
comment that is meant to describe its behavior in detail. If the
requirements are unclear, please report this as a bug.
There are two existing dpif implementations that may serve as
useful examples during a port:
    * lib/dpif-linux.c is a Linux-specific dpif implementation that
      talks to an Open vSwitch-specific kernel module (whose sources
      are in the "datapath" directory). The kernel module performs
      all of the switching work, passing packets that do not match any
      flow table entry up to userspace. This dpif implementation is
      essentially a wrapper around calls into the kernel module.

    * lib/dpif-netdev.c is a generic dpif implementation that performs
      all switching internally. This is how the Open vSwitch
      userspace switch is implemented.
Miscellaneous Notes
-------------------
Open vSwitch source code uses uint16_t, uint32_t, and uint64_t as
fixed-width types in host byte order, and ovs_be16, ovs_be32, and
ovs_be64 as fixed-width types in network byte order. Each of the
latter is equivalent to the corresponding one of the former, but the
difference in name makes the intended use obvious.
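
A short sketch of this convention in use follows. The two typedefs
below are local stand-ins written for this example so that it compiles
on its own; OVS supplies its own definitions of these types. struct
toy_header and the toy_* functions are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for this sketch only. */
typedef uint16_t ovs_be16;
typedef uint32_t ovs_be32;

/* Hypothetical on-the-wire header with one field. */
struct toy_header {
    ovs_be16 tp_dst;            /* Network byte order, as on the wire. */
};

/* Returns the destination port in host byte order. */
static uint16_t
toy_get_dst_port(const struct toy_header *h)
{
    return ntohs(h->tp_dst);
}

/* Stores host-order 'port' into the header in network byte order. */
static void
toy_set_dst_port(struct toy_header *h, uint16_t port)
{
    h->tp_dst = htons(port);
}
```

Because the types are interchangeable to the compiler in this sketch,
the names serve as documentation: a value declared ovs_be16 should
pass through ntohs() before being used in arithmetic or comparisons
against host-order constants.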
The default "fail-mode" for Open vSwitch bridges is "standalone",
meaning that, when the OpenFlow controllers cannot be contacted, Open
vSwitch acts as a regular MAC-learning switch. This works well in
virtualization environments where there is normally just one uplink
(either a single physical interface or a bond). In a more general
environment, it can create loops. So, if you are porting to a
general-purpose switch platform, you should consider changing the
default "fail-mode" to "secure", which does not behave this way. See
documentation for the "fail-mode" column in the Bridge table in
ovs-vswitchd.conf.db(5) for more information.
lib/entropy.c assumes that it can obtain high-quality random number
seeds at startup by reading from /dev/urandom. You will need to
modify it if this is not true on your platform.
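
As a starting point for such a modification, the following sketch
reads seed bytes from a named source. It is not the code in
lib/entropy.c. The function toy_get_entropy is hypothetical, and it
takes the path as a parameter so that a platform-specific source can
be substituted for /dev/urandom.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Fills 'buf' with 'n' bytes read from 'path'.  Returns 0 on success,
 * otherwise a positive errno value (EIO on a short read). */
static int
toy_get_entropy(const char *path, void *buf, size_t n)
{
    FILE *f = fopen(path, "rb");
    size_t got;

    if (!f) {
        return errno ? errno : EIO;
    }
    got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : EIO;
}
```

A caller on a typical Unix-like system would pass "/dev/urandom" as
the path. Returning an error rather than silently falling back to a
weak source leaves that policy decision to the caller.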
vswitchd/system-stats.c only knows how to obtain some statistics on
Linux. Optionally you may implement them for your platform as well.
Why OVS Does Not Support Hybrid Providers
-----------------------------------------
The "Porting Strategies" section above describes the "ofproto
provider" and "dpif provider" porting strategies. Only an ofproto
provider can take advantage of hardware TCAM support, and only a dpif
provider can take advantage of the OVS built-in implementations of
various features. It is therefore tempting to suggest a hybrid
approach that shares the advantages of both strategies.
However, Open vSwitch does not support a hybrid approach. Doing so
may be possible, with a significant amount of extra development work,
but it does not yet seem worthwhile, for the reasons explained below.
First, user surprise is likely when a switch supports a feature only
with a high performance penalty. For example, one user questioned why
adding a particular OpenFlow action to a flow caused a 1,058x slowdown
on a hardware OpenFlow implementation [1]. The action required the
flow to be implemented in software.
Given that implementing a flow in software on the slow management CPU
of a hardware switch causes a major slowdown, software-implemented
flows would only make sense for very low-volume traffic. But many of
the features built into the OVS software switch implementation would
need to apply to every flow to be useful. There is no value, for
example, in applying bonding or 802.1Q VLAN support only to low-volume
traffic.
Besides supporting features of OpenFlow actions, a hybrid approach
could also support forms of matching not supported by particular
switching hardware, by sending all packets that might match a rule to
software. But again this can cause an unacceptable slowdown by
forcing bulk traffic through software in the hardware switch's slow
management CPU. Consider, for example, a hardware switch that can
match on the IPv6 Ethernet type but not on fields in IPv6 headers. An
OpenFlow table that matched on the IPv6 Ethernet type would perform
well, but adding a rule that matched only UDPv6 would force every IPv6
packet to software, slowing down not just UDPv6 but all IPv6
processing.
[1] Aaron Rosen, "Modify packet fields extremely slow",
openflow-discuss mailing list, June 26, 2011, archived at
https://mailman.stanford.edu/pipermail/openflow-discuss/2011-June/002386.html.
Questions
---------
Please direct porting questions to [email protected]. We will try
to use questions to improve this porting guide.