thesis_plan_notes.txt
** software engineering method (incremental model) **
- the system is designed, implemented and tested in increments until the product is finished
- in this case, the final vision was to implement a Chord node with virtual node functionality
- the initial increment was a system that starts up as a single node
- the base protocol would be implemented first; then, time permitting, extensions to the protocol as outlined in the paper would be added
** requirements analysis **
- These were picked out from the Chord academic paper, first selecting the minimum set of functions required for a working implementation of the protocol:
  - hashing (SHA-1) to place nodes and keys on the identifier ring (see the sketch after this list)
  - scalable key location (finger tables) (Fig. 5)
  - node create, join, stabilize
- failure and replication functions would be implemented if time permitted:
  - transferring keys during voluntary node departures
  - successor list to improve lookup robustness
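
A minimal sketch of the SHA-1 identifier derivation, assuming an m-bit identifier ring (the module name and m value are illustrative, not the thesis code):

    defmodule Chord.Hash do
      # Bits in the identifier ring (2^m positions); 160 matches a full
      # SHA-1 digest, smaller values can be handy while testing.
      @m 160

      # Map an arbitrary key or node name onto the identifier ring.
      def id(term) when is_binary(term) do
        :crypto.hash(:sha, term)
        |> :binary.decode_unsigned()
        |> rem(Integer.pow(2, @m))
      end
    end

For example, Chord.Hash.id("node_1") and Chord.Hash.id("some_key") place a node and a key on the same ring, so the key is stored at the first node whose identifier follows it.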
** design **
- the system will be designed to function as a library that other apps can use
- it will have a top layer through which external applications can make lookup requests and start/stop the network
- took a layered approach to the software services: transport layer -> node (storage) -> top layer
- the advantage of this is that transport or storage layer implementations can be swapped while maintaining the same interfaces for the Chord logic to interact with (see the behaviour sketch below)
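
A minimal sketch of how the swappable transport layer could be expressed as an Elixir behaviour; the module and callback names are assumptions for illustration, not the thesis code:

    defmodule Chord.Transport do
      # Contract for the transport layer; any module implementing this
      # callback can be swapped in without touching the Chord logic.
      @callback send_request(node_ref :: term(), message :: term()) ::
                  {:ok, term()} | {:error, term()}
    end

    defmodule Chord.Transport.Local do
      @behaviour Chord.Transport

      # Local (same-VM) transport: resolve a node's registered name to a
      # pid and call its GenServer directly.
      @impl true
      def send_request(node_name, message) do
        case GenServer.whereis(node_name) do
          nil -> {:error, :node_not_found}
          pid -> {:ok, GenServer.call(pid, message)}
        end
      end
    end

Swapping in, say, a TCP-based module implementing the same callback would leave the node logic untouched.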
** implementation **
* OTP structure
- explain what OTP is: Erlang's set of libraries and design principles (supervision trees, behaviours such as GenServer) for building fault-tolerant applications
- top layer that starts supervised processes
- nodes are GenServers (see the sketch below)
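
A minimal sketch of that OTP shape, with hypothetical module names rather than the thesis code:

    defmodule Chord.Supervisor do
      use Supervisor

      def start_link(opts) do
        Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
      end

      @impl true
      def init(_opts) do
        # Each Chord node runs as a supervised GenServer; if a node
        # process crashes, the supervisor restarts just that node.
        children = [
          Supervisor.child_spec({Chord.Node, name: :node_1}, id: :node_1),
          Supervisor.child_spec({Chord.Node, name: :node_2}, id: :node_2)
        ]

        Supervisor.init(children, strategy: :one_for_one)
      end
    end

    defmodule Chord.Node do
      use GenServer

      def start_link(opts) do
        GenServer.start_link(__MODULE__, opts, name: Keyword.fetch!(opts, :name))
      end

      @impl true
      def init(opts) do
        {:ok, %{name: opts[:name], predecessor: nil, successor: nil, finger_table: []}}
      end
    end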
* data structures
- representing the finger table. Initially had thought of literally translating the (start, interval, successor) entries from the paper, but realised it's pointless storing the intervals since each one can be derived from consecutive start values
- representing a node entry
- dynamic transport layer imports
* maintaining state
- node state held in each GenServer process (see the struct sketch below)
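
A sketch of how these data structures might look as Elixir structs; the field names are illustrative assumptions, not the thesis code:

    defmodule Chord.FingerEntry do
      # One finger table row: the start of the identifier range and the
      # node responsible for it. The interval is omitted because it can
      # be derived from consecutive start values.
      defstruct [:start, :node]
    end

    defmodule Chord.NodeEntry do
      # How one node refers to another: its identifier on the ring plus
      # whatever the transport layer needs to reach it.
      defstruct [:id, :name]
    end

    defmodule Chord.NodeState do
      # State held in each node's GenServer process.
      defstruct id: nil,
                predecessor: nil,
                successor: nil,
                finger_table: [],
                keys: %{}
    end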
* transport layer
- mapping nodes to processes. The initial idea was using an agent/registry, but since pids can change if a GenServer is restarted, this wasn't feasible
- instead, assigned a name to each node process and used GenServer.whereis to map the name to a pid (see the sketch below)
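
A small sketch of that name-to-pid mapping, reusing the hypothetical Chord.Node module from the OTP sketch above:

    # Start a node under a stable registered name; a supervisor restart
    # gives the process a new pid, but the registered name stays valid.
    {:ok, _pid} = Chord.Node.start_link(name: :node_1)

    # Resolve the stable name to whatever pid currently backs it.
    case GenServer.whereis(:node_1) do
      nil -> {:error, :node_not_found}
      pid -> {:ok, pid}
    end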
** testing **
description
method
results
* load balance
- the key sets are falling into a small range of the identifier space
- some nodes in the network are isolated, holding few or no keys
- since virtual nodes are not being used, load balance is off
- the random key function could be better. The assumption was that keys generated from a sequence of numbers would be random enough; maybe the key space these produce is small (see the distribution check below)
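
A quick sketch of how that assumption could be checked: hash a sequence of integers with SHA-1, truncate to an m-bit identifier, and count how many land in each of a few equal buckets (the m value and bucket count here are arbitrary):

    m = 16
    bucket_count = 8
    ring_size = Integer.pow(2, m)

    # Derive identifiers from sequential integers, as the test keys were.
    ids =
      for n <- 1..1000 do
        <<id::integer-size(m), _rest::binary>> = :crypto.hash(:sha, Integer.to_string(n))
        id
      end

    # A roughly even count per bucket means the key function is fine and
    # the skew comes from node placement instead.
    ids
    |> Enum.frequencies_by(fn id -> div(id * bucket_count, ring_size) end)
    |> Enum.sort()
    |> IO.inspect(label: "keys per bucket")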
** future work **
- extension methods
- successor lists
- notify old predecessor when a new one is found (sect. Va).
- improving resilience against network partitions (sect. VI)
- implementing extensions from other versions of the paper?
- improving error handling, especially when the node object is nil
- avoiding defensive programming, as this is an anti-pattern in Elixir (see the sketch after this list)
- a matter of getting better at Elixir programming and refactoring the code
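
A small sketch of the idiom in question, using plain maps and hypothetical function names: instead of defensively checking for nil, match on the shape you expect and let the supervisor handle the crash:

    defmodule Chord.Style do
      # Defensive style (the anti-pattern): nil checks woven through the logic.
      def successor_id_defensive(node) do
        if node != nil and node.successor != nil do
          node.successor.id
        else
          nil
        end
      end

      # Idiomatic style: pattern match on the expected shape; an
      # unexpected nil crashes the process, which the supervisor restarts.
      def successor_id(%{successor: %{id: id}}), do: id
    end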
** Professional issues **
- Licensing: are the p2p lookup schemes promoting piracy? e.g. Napster, Gnutella. (Look into current ones being used in that file sharing space)
- Privacy: the protocols essentially match an ID to a node address. With recent laws classifying IP addresses as personally identifiable info, this could be a privacy concern: a rogue node might leak these addresses, or they could be used to map to users who are pirating files. (Look into the legality of that)
- we discussed early iterations of p2p protocols (Napster, Gnutella) and how they gained popularity in file sharing/piracy, which led to the shutdown of Napster and LimeWire (Gnutella)
- successor protocols have also found their way into the public file sharing space. BitTorrent is the most popular file sharing protocol; several clients exist for it, such as uTorrent, BitTorrent and Transmission (which comes pre-installed on Ubuntu distros)
- two ways peer discovery happens:
- querying a dedicated tracker node. The tracker supplies info about peers serving a file; peers "announce" themselves to a tracker when they have a file to share [springer]
- querying other peer nodes for who has a file, known as "trackerless" torrents. This utilizes Kademlia over UDP to maintain a DHT of peer contact info [bittorrent.org]
* What is being done to counter piracy
- as the network is open, authorities or copyright agencies can attempt to identify copyrighted file sharers. Research by [springer] on tracking BitTorrent file sharers suggests two methods of identification.
- The first is requesting peer information for the copyrighted file using an info hash, which consists of a shared file's meta info. The IP addresses can then be extracted from the peer info to identify users.
- The second is using the DHT protocol to get peers that have an info hash for a copyrighted file.
- sending warnings to all users identified as pirating content proved ineffective:
https://torrentfreak.com/uk-isps-stop-sending-copyright-infringement-notices-190719/
https://www.theregister.com/2019/07/20/creative_content_piracy/
- Conclusion(?): While these protocols have given us greater scalability for file storage and accessibility, developers have to be wary of the activity they enable.
- additional food for thought: the existence of all this software is a sign that people demand more open access to content, going against the current restrictive, region-locked, monopolised control of content.
- They believed that piracy was a "service problem" created by "an industry that portrays innovation as a threat to their antique recipe to collect value".
- maybe content providers could adopt these peer-to-peer models for content distribution while integrating a fair pricing model. It is evident that people don't mind paying for content (music and video streaming). Additionally, it might benefit content providers in terms of infrastructure: less need to maintain CDNs, with load balanced amongst users.
** self assessment **
- how did it go
- planning and executing in a project
- I have learnt not to be afraid to set realistic expectations when it comes to implementation. When reading the Chord paper, the pseudocode looked easy enough to write up, but only after starting did I realise it would be more of a challenge. Because I don't have much experience with Elixir, implementing was a mix of learning the nuances of the language and working out how best to implement. So the initial plan of doing the base protocol and all the extensions didn't work out.
- learning that it's ok to have possibly inefficient software and iterate on it. This was necessary given the short project timeline. Historically, in my previous roles, I would strive for perfection, which induced more workload stress than necessary when I could have distributed the development over time.
- where to go next
- ideally get a job where I work on such distributed systems challenges. This might be tricky, as most of the roles found thus far want someone already experienced in large scale distributed system design. Career-wise, that means I might start off in a role not directly related and work my way up to it.
- Elixir is a great language that I would love to work with commercially. But as with most upcoming languages, the preference seems to be for more experienced devs, with the assumption that they will train others later down the road.
- another alternative is open source contributions. I have recently started making some for an Elixir project, and it has been a good way to build experience. I am also going to look for open source distributed system projects.
All in all, it was a fun project to do, from digging deeper into peer-to-peer protocols to learning Elixir. It was a perfect project with enough depth for me to leverage a good variety of the language's technologies.