IMPORTANT INFO :
================
root :
./dht_peer -m 1 -p 16500 -h `hostname`
peer :
./dht_peer -m 0 -p 16501 -h `hostname` -R baseball.ccs.neu.edu -r 16500
client :
./client -p 1234 -h `hostname` -r 16500 -R baseball.ccs.neu.edu
test_client.py : the pseudo client used for testing.
dht_peer : dht_peer code
client : client code
nodes used for testing the code :
['akubra.ccs.neu.edu',
'budenovka.ccs.neu.edu',
'derby.ccs.neu.edu',
'kofia.ccs.neu.edu',
'oxford.ccs.neu.edu',
'baseball.ccs.neu.edu']
max object size = 255 Bytes
protocol :
==========
There are two protocols used by my implementation of CHORD:
1. control plane : for forming the ring and making sure that the ring
   stays stable.
2. data plane : for storing and retrieving data from the distributed hash table.
each packet sent through the network has the following format :
____________________
| packetlen | packet |
--------------------
where :
packetlen : length of the entire packet (excluding the length of packetlen)
packetlen = 2 bytes
packet : data transferred by the underlying layers (control packet and data
packets)
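As an illustration (not the literal dht_peer/client code), a length-prefixed
framing layer could look roughly like this in Python; the 2-byte packetlen is
taken from the format above, while the big-endian byte order is an assumption:

    import struct

    def send_packet(sock, packet):
        # prepend a 2-byte length (excluding the length field itself)
        sock.sendall(struct.pack('>H', len(packet)) + packet)

    def recv_exact(sock, n):
        # TCP is a byte stream, so keep reading until n bytes have arrived
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('peer closed the connection')
            buf += chunk
        return buf

    def recv_packet(sock):
        (packetlen,) = struct.unpack('>H', recv_exact(sock, 2))
        return recv_exact(sock, packetlen)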
protocol format for control plane :
=================================
___________________________
|type |ip |ip_hash | payload|
---------------------------
type = 1 byte
ip = 4 bytes
ip_hash = 160 bits
payload = (depends on the packet)
type of packets available in the control protocol :
join : this is the join request that is sent by a peer to the root peer node
to indicate that it would like to join the network.
packet representation :
__________________________________________
|join | peer_ip | peer_ip_hash | peer_port |
------------------------------------------
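For illustration only, a join packet could be assembled along these lines.
The numeric code for the join type, the use of SHA-1 (160 bits) over the ip
bytes, the 2-byte port, and the byte order are assumptions, since the README
does not pin them down:

    import hashlib
    import socket
    import struct

    JOIN = 1  # assumed numeric code for the join packet type

    def make_join(peer_ip, peer_port):
        ip_bytes = socket.inet_aton(peer_ip)        # 4-byte ip
        ip_hash = hashlib.sha1(ip_bytes).digest()   # 160-bit (20-byte) hash
        # | join | peer_ip | peer_ip_hash | peer_port |
        return struct.pack('>B4s20sH', JOIN, ip_bytes, ip_hash, peer_port)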
init : this packet is sent over a new connection that is established with
another peer. It is sent by the new joinee to its successor and
predecessor after it establishes new tcp connections with them. These
packets are also sent out after a new connection is established in the
event of a peer leaving the network or crashing.
packet representation :
______________________________________
| init | peer_ip | peer_ip_hash | type |
--------------------------------------
type = 1 if the connecting peer is the predecessor of the peer to which
the connection is made. If the connecting peer is the successor of the
peer being connected to then type = 2.
update : this packet is sent when a new node joins or when an existing node
crashes. These packets carry updates meant for the node to which the
packet is addressed. On receiving such an update packet the peer
updates its cache, recording the mentioned peers as its
successor/predecessor.
packet representation :
______________________________
| update | ip | ip_hash | data |
------------------------------
format for the data section of the update packet :
____________________________________________________________________
|for | succ_ip | succ_hash | succ_port | pre_ip | pre_hash | pre_port|
--------------------------------------------------------------------
for = 4 bytes (ip address of the intended recipient)
succ_ip = ip of the successor peer. (None => no update to be made)
succ_hash = ip hash of the successor peer.
(None => no update to be made)
succ_port = port number of the successor peer.
(None => no update to be made)
pre_ip = ip of the predecessor peer. (None => no update to be made)
pre_hash = ip hash of the predecessor peer.
(None => no update to be made)
pre_port = port number of the predecessor peer.
(None => no update to be made)
query : this packet is used to query the successor and the predecessor of the
peer to make sure that they are alive, and also make sure that the
current peer is listed as its successor's predecessor and predecessor's
successor.
packet representation :
_________________________________
| query | ip | ip_hash | question |
---------------------------------
question = 1 for successor and 2 for predecessor (1 byte)
answer : this packet is generated by a peer in response to a query posed to it
by another peer.
packet representation :
______________________________
| answer | ip | ip_hash | type |
------------------------------
ip = ip address of its predecessor/successor based on the query.
ip_hash = ip hash of its predecessor/successor based on the query.
type = 1 if the query was about successor else 2.
dead : this packet is generated by the successor of the node which left/crashed.
packet representation :
__________________________________________
| dead | ip | ip_hash | ip_hash_pre | port |
------------------------------------------
ip = ip address of the peer sending the packet.
ip_hash = ip hash of the peer sending the packet.
ip_hash_pre = ip hash of the dead predecessor.
port = port number of the peer sending the packet.
protocol format for data plane :
===============================
_________________________________
|type |client_id |operation |data |
---------------------------------
type = 1 byte
client_id = 160 bits
operation = 1 byte
data = (depends on the packet)
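A sketch of how a peer might split a data-plane packet into its header fields;
the field widths come from the format above, while the byte order and the
numeric codes for type/operation are assumptions made for illustration:

    import struct

    def parse_data_header(packet):
        # | type (1 byte) | client_id (20 bytes) | operation (1 byte) | data |
        ptype, client_id, operation = struct.unpack('>B20sB', packet[:22])
        data = packet[22:]
        return ptype, client_id, operation, data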
type of packets available in the data protocol :
put : this packet is used to store the data in a peer.
packet representation :
_______________________________
| data | client_id | put | data |
-------------------------------
data representation :
_________________________
| obj_key | obj_len | obj |
-------------------------
obj_key = hash value of the key of the object.(160 bits)
obj_len = length of the object in bytes
(max obj size = 512 bits => 64 bytes)
obj = actual object that needs to be stored.
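For illustration, the data section of a put could be built like this; SHA-1 for
the 160-bit key hash and a 1-byte obj_len are assumptions, since the README
does not state them:

    import hashlib
    import struct

    def make_put_data(key, obj):
        # | obj_key (20 bytes) | obj_len (1 byte, assumed) | obj |
        obj_key = hashlib.sha1(key.encode()).digest()  # 160-bit hash of the key
        return struct.pack('>20sB', obj_key, len(obj)) + obj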
get : this packet is used to directly get data from a peer.
packet representation :
__________________________________
| data | client_id | get | obj_key |
----------------------------------
obj_key = hash value of the key of the object. (160 bits)
lookup : this packet is used by the client to look up the ring and
figure out where the data has to be stored or retrieved from.
packet representation :
______________________________________________
| data | client_id | lookup | method | obj_key |
----------------------------------------------
method = 1 for a recursive lookup, 2 for an iterative lookup (1 byte)
obj_key = hash value of the key of the object.(160 bits)
response : packet generated by a peer after the client contacts it for data.
packet representation :
____________________________________
| data | client_id | response | data |
------------------------------------
data representation :
_________________________
| obj_key | obj_len | obj |
-------------------------
obj_key = hash value of the key of the object.(160 bits)
obj_len = length of the object in bytes
(max obj size = 512 bits => 64 bytes)
obj = actual object that is retrieved.
redirect : packet used by a peer to redirect the client to its successor peer.
packet representation :
__________________________________________
| data | client_id | redirect | ip | port |
------------------------------------------
ip = ip of the successor peer.
port = port of the successor peer.
close : packet sent to the client to close its connection to the peer.
packet representation :
__________________________
| data | client_id | close |
--------------------------
move : this option is used to transfer objects between nodes.
this packet type is used when a new node joins the network and there
are objects in the network stored on other nodes that have to be
transferred to the new node.
packet representation :
________________________________
| data | client_id | move | data |
--------------------------------
Client :
========
language used : python.
usage :
./client -p 1234 -h `hostname` -r 16500 -R baseball.ccs.neu.edu
When executed, the client presents the user with a menu; selecting an option
runs the corresponding function.
On selecting store, the client first does a recursive lookup to find the peer
on which the object needs to be stored. On finding that peer, the client sends
it a put request.
For retrieval the client is presented with 2 options: recursive and iterative.
In a recursive retrieval, the client makes a request to the root peer and the
root responds with the details of the peer from which the data has to be
retrieved. In the iterative case, each peer that is not the required peer
responds with a redirect message pointing the client to its successor. If the
current peer is the required peer, it responds with a redirect message with the
ip and port set to None. Once the right node is found, the client sends a get
message to that peer.
The last option presented is the exit option, which causes the client to
terminate.
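A rough sketch of the iterative retrieval loop described above (this is not the
literal client code: send_packet/recv_packet are the framing helpers sketched
earlier, and make_lookup, parse_redirect and make_get are hypothetical helpers
standing in for builders/parsers of the packet formats above):

    import socket

    def iterative_retrieve(root_host, root_port, obj_key):
        host, port = root_host, root_port
        while True:
            with socket.create_connection((host, port)) as sock:
                # method = 2 selects an iterative lookup
                send_packet(sock, make_lookup(obj_key, method=2))
                ip, redirect_port = parse_redirect(recv_packet(sock))
                if ip is None:                    # this peer owns the key
                    send_packet(sock, make_get(obj_key))
                    return recv_packet(sock)      # response packet with the object
            host, port = ip, redirect_port        # follow the redirect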
Peer :
======
language used : python.
usage :
root :
./dht_peer -m 1 -p 16500 -h `hostname`
peer :
./dht_peer -m 0 -p 16501 -h `hostname` -R baseball.ccs.neu.edu -r 16500
The peer runs a thread that periodically queries its successor and predecessor
to make sure that they are alive and responding to the queries posed to them.
If one of them is dead, the cache is updated with this information.
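A minimal sketch of what such a liveness thread could look like (illustrative
only: make_query is a hypothetical builder for the query packet, the interval
and the question values used here are assumptions, and send_packet is the
framing helper sketched earlier):

    import threading
    import time

    QUERY_INTERVAL = 5  # seconds between liveness checks (assumed value)

    def liveness_loop(succ_sock, pre_sock, stop_event):
        while not stop_event.is_set():
            try:
                # ask the successor who its predecessor is (should be this peer)
                send_packet(succ_sock, make_query(question=2))
                # ask the predecessor who its successor is (should be this peer)
                send_packet(pre_sock, make_query(question=1))
            except OSError:
                # a failed send is treated as a dead neighbour; the main loop
                # would then update the cache and trigger the dead handling
                pass
            time.sleep(QUERY_INTERVAL)

    # started once at peer start-up, e.g.:
    # threading.Thread(target=liveness_loop, args=(succ, pre, stop),
    #                  daemon=True).start()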
The main thread uses select to check whether any sockets need to be read from.
If there is data to be read on any of the maintained sockets (the server socket
and the successor and predecessor sockets), the information is read and parsed.
If the peer needs to send a response, the message is sent on the successor or
predecessor socket.
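The overall shape of that loop is sketched below (not the literal dht_peer
code: recv_packet is the framing helper sketched earlier and handle_packet is
a hypothetical dispatcher over the control/data packet types described above):

    import select
    import socket

    def main_loop(server_sock, succ_sock, pre_sock):
        sockets = [server_sock, succ_sock, pre_sock]
        while True:
            readable, _, _ = select.select(sockets, [], [])
            for sock in readable:
                if sock is server_sock:
                    conn, _addr = server_sock.accept()  # new peer or client
                    sockets.append(conn)
                else:
                    packet = recv_packet(sock)      # length-prefixed read
                    handle_packet(sock, packet)     # may reply on succ/pre sockets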
When a new peer is added to the chord ring, the predecessor of the new node
notifies its successor that its predecessor is now the new node by issuing an
update packet to it. On receiving the update packet, the successor node updates
its cache accordingly. The predecessor also sends an update message, intended
for the new node, to the root via its predecessor (recursively). On receiving
this update from the root node, the new node establishes connections with its
successor and predecessor as listed in the update packet. The new node also
sends out init packets to initialize these newly established sockets, notifying
its predecessor that the new node is its successor and its successor that the
new node is its predecessor.
When a node dies or leaves the network, a dead message is generated by the
successor of the dead node (see the dead packet above) and is forwarded from
peer to peer until it reaches the node whose successor was the dead node. On
receiving the dead notification, that node connects to the peer that sent it
(the dead node's successor, whose ip and port are carried in the packet) and
the ring is completed.
When a lookup data packet arrives at the root from the client, the object key
is compared with the ip hash of the current peer as well as the ip hash of its
predecessor/successor to decide whether the object must be stored on or
retrieved from the current peer. If so, a response message is sent with the ip
address and port of the current peer; otherwise, the message is forwarded to
the successor if a recursive lookup is being done, or a redirect message is
sent redirecting the client to the successor of the current peer. Once the
lookup is complete, the client gets/stores the object at the correct peer.
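A sketch of that ownership check, assuming the 160-bit hashes are compared as
integers; the exact comparison used by dht_peer (including its special markers
for the smallest/largest hash described later) may differ:

    def owns_key(obj_key, pre_hash, own_hash):
        if pre_hash < own_hash:
            # normal case: the peer owns keys in (pre_hash, own_hash]
            return pre_hash < obj_key <= own_hash
        # wrap-around case: this peer has the smallest hash in the ring,
        # so it also owns keys larger than its predecessor's hash
        return obj_key > pre_hash or obj_key <= own_hash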
In the event of a new node being added to the network, objects already present
in the network that hash to this node once it has joined are transferred to it
by the new node's successor and predecessor nodes.
KEY CHALLENGES:
===============
Formation of the ring was very challenging, taking a lot of time and effort to
debug the various connections and make sure that each node joins the network
at its expected location.
Closing unwanted connections (connections from clients, and connections to
previous successor and predecessor nodes) was quite confusing and involved a
lot of debugging.
Transferring objects that already existed in the network to a newly joined node
was also challenging.
TESTING:
========
Testing the peers :
1. testing if the ring formed correctly :
This test was conducted with 4 nodes, where one node was assigned as the
root peer and all permutations of the other 3 nodes joining the network were
tried. The ring formed correctly on every try.
2. testing formation of ring with 5 nodes :
about 5 of the existing 24 permutations were tried, with the root node fixed
and the other peers joining in various orders. Each time the ring formed
correctly.
3. root has the largest IP_HASH : the root was selected such that it had the
largest IP_HASH in the network, and various combinations of joins and leaves
of peers were tested. Each time the ring formed/re-formed correctly.
4. root has the smallest IP_HASH : same test as (3) but with root having
the smallest IP_HASH in the network.
5. root not the smallest or largest IP_HASH : same test as (3) but with root
hash not being the largest or smallest in the network.
6. node leaving the network : (send SIGINT to the process)
when the node leaves the network the predecessor of the dead node contacts
the successor of the dead node and reforms the ring. The test was executed
with multiple nodes leaving the network one after another, each leaving
after the ring reconnects. Each time the ring was able to reform correctly.
7. node leaving the network and rejoining the network : the nodes are able
to leave the network and rejoin it, each time after the ring has been given
time to heal. Each time the node joins the network at the same position and
the ring is formed correctly.
8. testing the ring with pseudo client :
a pseudo client was written to generate objects whose key hashes would lie
between the nodes in the ring and also coincide with the hashes of the nodes.
Storing and retrieving the objects was successful.
9. testing moving of objects between nodes :
the same objects were used as in (8), but only the root node was up when
the data was stored by the client. Each peer was added one by one, checking
to make sure that the object files and index file reflected the movement of
objects. Each time the objects were moved correctly to the right nodes.
10. retrieving data stored on a crashed node : once all the data was stored
and the node holding an object had crashed, the pseudo client was used to
retrieve that object. The client displays the received error message and the
node from which it got the message.
Testing client :
1. store : the client was able to store an object successfully at each node
in the ring. The objects were stored at the expected nodes. The client
successfully did the lookup and found the right node at which the data had to
be stored before storing it.
2. retrieve recursive : the client was able to retrieve the data from each node
successfully after doing a lookup to find the node where the data is stored.
3. retrieve iterative : the client was able to send iterative lookup messages
to each node starting from the root node, and to successfully find the next
node from the redirect responses. The client was able to find the right node
where the object was stored and retrieve it.
4. client exits only when asked to exit : made sure that the client does not
exit on invalid input; it just displays the menu again.
IMPORTANT DESIGN FEATURES :
===========================
In order to form the ring, the nodes with the smallest and the largest IP_HASH
have special markers in their global cache, which handle the edge cases where
a node or object is being placed into the network.
TCP delivers data as a continuous byte stream, hence each packet has a length
field attached to it so that packets can be separated and processed
individually.