[RFC] Ideas from node-config #1

Open
yanosz opened this issue Aug 29, 2019 · 4 comments

Comments

@yanosz

yanosz commented Aug 29, 2019

Heiho,

Cool to have a write-up covering your (@christf) ideas. I had similar things in mind when writing node-config a few years ago. Merging these two sets of ideas will involve a lot of personal discussion, I guess. Thus I'd like to open an RFC ticket.

I'd like to challenge some of your assumptions. To state some theses:

  • An IP mesh ("L3") with a lot of host routes scales worse than a wifi / batman-adv ("L2") mesh: a host route roughly needs 2 x 128 bit + 3 x 32 bit = 352 bit = 44 bytes per entry, while an L2 route needs 2 x 48 bit = 96 bit = 12 bytes per entry; that's more than a factor of three. It gets interesting when looking at management traffic and at filtering some of the broadcast traffic. This is non-trivial, since combinations of different protocols (e.g. multicast-to-unicast optimizations) have a significant impact.

  • A network is not decentralized if it's run by a dedicated group of administrators. Decentralization is a social, not a technical goal. But addressing it technically, too, should result in the simplest design possible.
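The per-entry arithmetic in the first thesis can be checked with a short calculation. This is a minimal sketch that takes the field breakdown exactly as the thesis lists it (two 128-bit addresses plus three 32-bit fields for a host route, two 48-bit MAC addresses for an L2 entry); it assumes nothing about real kernel or batman-adv data structures, which carry extra overhead not counted here.

```python
# Rough per-entry sizes for the first thesis (L3 host route vs. L2 entry).
# The field breakdown follows the thesis; real kernel and batman-adv
# structures carry extra overhead (pointers, timers, flags) not counted here.

IPV6_ADDR_BITS = 128
FIELD_BITS = 32
MAC_ADDR_BITS = 48


def host_route_bits() -> int:
    """Two 128-bit IPv6 addresses plus three 32-bit fields."""
    return 2 * IPV6_ADDR_BITS + 3 * FIELD_BITS


def l2_entry_bits() -> int:
    """Two 48-bit MAC addresses."""
    return 2 * MAC_ADDR_BITS


def table_bytes(entries: int, bits_per_entry: int) -> int:
    """Raw table size in bytes for a given number of entries."""
    return entries * bits_per_entry // 8


if __name__ == "__main__":
    l3, l2 = host_route_bits(), l2_entry_bits()
    print(f"host route: {l3} bit = {l3 // 8} bytes")
    print(f"L2 entry:   {l2} bit = {l2 // 8} bytes")
    print(f"ratio:      {l3 / l2:.2f}")
    for n in (1_000, 10_000, 50_000):
        print(f"{n:>6} entries: L3 {table_bytes(n, l3):>9} B, L2 {table_bytes(n, l2):>8} B")
```

With these inputs a host route comes to 352 bit (44 bytes), about 3.7 times an L2 entry, and 50,000 host routes occupy roughly 2.2 MB of raw entry data. This only counts raw entries; as the thesis itself notes, management traffic is where it gets interesting.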

IMHO, some of node-config's design considerations (https://kbu.freifunk.net/files/node-config/doc/#_design_considerations) either appear to be more radical or are not reflected in your setup. Node-config's considerations are:

  1. You don't need any servers: Some Freifunk network designs promote data-center-based servers (supernodes) to provide basic network functionality (e.g. routing, DHCP). This needs extra funding, requires a reliable internet connection as well as a data center, and introduces additional complexity. As a requirement for node-config, basic functionality has to be provided by the nodes.

  2. Use vanilla OpenWRT releases: Many community networks are based on custom OpenWRT builds or forks. To start a network, OpenWRT first has to be compiled or forked. Usually, only a few people know how to maintain or compile the software needed to run the network. Node-config is required to use OpenWRT as is, using only released images and packages.

  3. Share your local internet connection: Freifunk and other networks are built with the idea of having one or many internet gateways providing connectivity. Unfortunately, many networks depend on external infrastructure (e.g. a fixed VPN provider). For node-config, the need for a designated VPN provider has to be eliminated. Node-config must allow direct internet sharing or the use of an arbitrary VPN provider.

  4. Wifi & OpenWRT configuration is all you need to know:
    Introducing servers or operating VPN providers requires additional knowledge in a community. For instance, supernodes require management using Ansible or Puppet, and connecting to autonomous systems (AS) (e.g. Freifunk Rheinland e.V.) via VPN puts the Border Gateway Protocol (BGP) on the table. At its core, that knowledge lies outside of running wifi mesh networks and raises the barrier to entry. Using node-config must not require any additional knowledge (except OpenWRT configuration and wifi).

  5. Overcoming scaling limitations seen in Gluon-based networks:
    Many proposed Gluon designs scale up to 1,000 to 2,000 nodes per mesh. Basically, this is due to broadcast / anycast and batman-adv management traffic being sent to all nodes. Roaming management traffic is distributed even across regions without any reasonable wifi coverage. Node-config is required to scale better by integrating hierarchical routing and by limiting roaming management traffic to definable regions.

  6. Be decentralized, with no administrative authority: Node-config is designed to implement decentralization
    (cf. Hacker Ethic on Wikipedia, https://en.wikipedia.org/wiki/Hacker_ethic). It is required that no
    administrative authority governs the network. Node-config is required to:

  • Eliminate the need for network administrators (e.g. for servers or BGP routers)
  • Empower people by providing a simple and accessible design as well as documentation.
  • Lower the entrance barrier (funding, infrastructure, know-how) for new communities.
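The scaling argument in point 5 (management traffic sent to all nodes) can be made concrete with a toy model. This is a sketch under loudly simplified assumptions: every node originates one management message per interval, and that message is flooded so that every other node receives and rebroadcasts it once. The message size and interval are hypothetical placeholders, not batman-adv defaults, and aggregation or rate limiting are ignored.

```python
from dataclasses import dataclass


@dataclass
class FloodModel:
    """Toy model of periodic management messages flooded to every node.

    msg_bytes and interval_s are hypothetical placeholders, not measured
    batman-adv values; aggregation and rate limiting are ignored.
    """
    nodes: int
    msg_bytes: int = 100
    interval_s: float = 5.0

    def msgs_received_per_node(self) -> int:
        # One message per interval from every other node reaches each node.
        return self.nodes - 1

    def bytes_per_node_per_s(self) -> float:
        return self.msgs_received_per_node() * self.msg_bytes / self.interval_s

    def transmissions_per_interval(self) -> int:
        # Each node originates its own message and rebroadcasts the
        # nodes - 1 messages of the others once: nodes * nodes in total.
        return self.nodes * self.nodes


if __name__ == "__main__":
    for n in (500, 1_000, 2_000, 10_000):
        m = FloodModel(nodes=n)
        print(f"{n:>6} nodes: {m.bytes_per_node_per_s() / 1000:8.1f} kB/s per node, "
              f"{m.transmissions_per_interval():>12} transmissions per interval")
```

In this model the load on each node grows linearly and the total number of transmissions quadratically with mesh size, which is the effect hierarchical routing and region-scoped roaming traffic are meant to bound.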
@christf
Owner

christf commented Aug 29, 2019

Thank you, yanosz, for your input.

I agree, some goals seem to align; for some you have found a stricter definition, i.e. the first one. I agree this is desirable, and I am approaching it from the current state instead of starting with a different topology altogether. For (1), I'd say it is just a matter of implementation; in the end, we won't have to rely on central servers. Interestingly, when going that far, a few things need to be in order. Finding other nodes could be a hard problem. How would you approach that?

Of the two challenges, I would address the first one, "scalability", like this:
rather than being a function of memory usage, scalability is much more a function of bandwidth, round-trip time and, of course, resource consumption (CPU/memory), too. It is correct that a MAC address is smaller than a host route. However:

  • batman needs to implement its own routing, whereas for babeld the routing is already implemented.
  • batman needs to implement its own path-finding algorithms. With babeld there is no global view of the network, just a node and its neighbours. This is much more efficient.
  • Well, you already mentioned traffic classes that can be problematic.

I do not expect your assessment to turn out to be correct, because there are many more factors to consider than just the length of the addresses.
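The point that babeld has "no global view, just a node and its neighbours" can be illustrated with a single distance-vector step: a node picks its route to each destination only from the metrics its neighbours advertise plus the cost of the link to that neighbour. This is a simplified Bellman-Ford-style sketch, not babeld's code; it keeps Babel's convention that metric 0xFFFF means unreachable but omits sequence numbers and the feasibility condition that Babel (RFC 8966) uses to avoid loops. The neighbour names and metrics in the example are hypothetical.

```python
from typing import Dict, Tuple

# Babel's "infinity": a metric of 0xFFFF means unreachable.
INFINITY = 0xFFFF

# What a node has heard: neighbour -> {destination: metric advertised by that neighbour}
Adverts = Dict[str, Dict[str, int]]


def select_routes(link_cost: Dict[str, int],
                  adverts: Adverts) -> Dict[str, Tuple[str, int]]:
    """Pick destination -> (next hop, metric) from neighbour adverts only.

    Simplified distance-vector step: no sequence numbers, no feasibility
    condition, so unlike real Babel it does not guard against loops.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for neighbour, table in adverts.items():
        cost = link_cost.get(neighbour)
        if cost is None:
            continue  # heard an advert but have no usable link cost
        for dest, metric in table.items():
            total = min(metric + cost, INFINITY)
            if total >= INFINITY:
                continue  # unreachable through this neighbour
            if dest not in best or total < best[dest][1]:
                best[dest] = (neighbour, total)
    return best


if __name__ == "__main__":
    # Hypothetical two-neighbour example.
    routes = select_routes(
        link_cost={"A": 96, "B": 256},
        adverts={"A": {"X": 300, "Y": 96}, "B": {"X": 50, "Y": 500}},
    )
    print(routes)  # {'X': ('B', 306), 'Y': ('A', 192)}
```

The node never sees the path beyond its neighbours; it only compares sums. As the thread notes later, batman-adv is also distance-vector, so this step describes both protocols at this level of abstraction.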

I have not seen tests of batman with 10K clients. For babeld we have seen 50K routes on plastic routers (let's broadly put that in the ballpark of 10K clients: 3-4 routes per client, 1-2 per node). The consensus seems to be that a batman domain should not have more than a few hundred clients. That is a measurable improvement of roughly a factor of 10. As such, it is not a theoretical case any more. We have seen babeld scale better, albeit in a somewhat synthetic scenario.
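The route-budget estimate can be reproduced as a rough calculation. The figures for total routes (50K), routes per client (3-4) and routes per node (1-2) come from the comment; the ratio of clients per node is a hypothetical parameter, since the comment does not state one, and it is needed to turn per-node routes into a per-client share.

```python
def clients_for_route_budget(total_routes: float, routes_per_client: float,
                             routes_per_node: float, clients_per_node: float) -> float:
    """Clients that fit into a route budget.

    Node routes are spread over the clients of that node, so each client
    costs routes_per_client + routes_per_node / clients_per_node routes.
    """
    per_client = routes_per_client + routes_per_node / clients_per_node
    return total_routes / per_client


if __name__ == "__main__":
    BUDGET = 50_000          # routes observed on plastic routers
    CLIENTS_PER_NODE = 2     # hypothetical ratio, not from the discussion
    worst = clients_for_route_budget(BUDGET, 4, 2, CLIENTS_PER_NODE)
    best = clients_for_route_budget(BUDGET, 3, 1, CLIENTS_PER_NODE)
    print(f"roughly {worst:,.0f} to {best:,.0f} clients")  # roughly 10,000 to 14,286 clients
```

With that assumed ratio the pessimistic end lands at the 10K-client ballpark; more clients per node would shift both ends upwards.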

The decentralization challenge is harder, I think. What does it mean to run a network? For communication to happen, a few rules of the road must be set; we do this for person-to-person interaction as well. One of those common rules is which language is to be used.
We will always need some authority to establish these. Beyond that, every participant should be able to explore options. I agree it is best if no central infrastructure is required. The hardest problem to solve, I think, is "how do I find mesh-VPN partners?". If we can tackle that one, it would be a big step indeed.

@yanosz
Author

yanosz commented Aug 29, 2019

Thanks for responding. I guess discussing these issues is beyond the capabilities of GitHub's issue tracker. I expect this discussion will become clearer in the next days, when different people bring their ideas to the table.

It was good to have a discussion at last year's WCW, and I missed that this year. A few things I have in mind right now:

  • I'm not aware of any performance study dealing with this many (50K) batman-adv nodes. The numbers (1,000 to 2,000) are observations of a full protocol stack. It's not clear whether the limit lies exactly there. It'd make a good bachelor thesis topic, though.
  • To my knowledge, batman-adv doesn't have a full view of the network either; it's still distance-vector routing.
  • I don't think a clear separation between L2 and L3 makes sense anyway; e.g. looking at Open vSwitch's capabilities, it's switching vs. routing on L2 and L3 combined.
  • I think that batman-adv can scale as well as babel when clever management is in place.

@yanosz
Author

yanosz commented Aug 30, 2019

Small update; I forgot to make this explicit:

batman-adv and babel + l3roamd share a lot of characteristics (proactive routing, distance-vector, beaconing), propagating topology throughout the network. In both cases, roaming-related information is propagated through the whole topology; this holds for host routes as well as for TT (translation table) entries. This is clumsy: clients cannot roam between arbitrary nodes anyway, yet propagating this information everywhere is one part of the scaling issue; the other part is that proactive routing generates noise by design.

There are two rather obvious ways out of this:

  • Using a reactive protocol: nodes then have no information about others they don't communicate with.
  • Strict splitting (e.g. a batman-adv / babel dual stack, which is what node-config does).
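The two ways out can be compared with a toy count of how many nodes have to learn about a single roaming event. All inputs are hypothetical and the rules are simplifying assumptions: a flat proactive mesh floods every roaming update to all nodes, strict splitting keeps the update inside the region (mesh cloud) where the client roamed, and a reactive protocol only involves nodes that are currently talking to the roaming client. Beacons, retransmissions and route-discovery traffic are not modelled.

```python
from enum import Enum


class Scheme(Enum):
    PROACTIVE_FLAT = "proactive, whole topology"
    STRICT_SPLIT = "proactive, confined to one region"
    REACTIVE = "reactive, on demand"


def nodes_informed(scheme: Scheme, total_nodes: int,
                   region_size: int, active_peers: int) -> int:
    """Nodes that must learn about one roaming event in this toy model."""
    if scheme is Scheme.PROACTIVE_FLAT:
        # host route / TT change flooded through the whole topology
        return total_nodes
    if scheme is Scheme.STRICT_SPLIT:
        # update stays inside the mesh cloud where the client roamed
        return min(region_size, total_nodes)
    if scheme is Scheme.REACTIVE:
        # only nodes currently talking to the client need a new route
        return min(active_peers, total_nodes)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # Hypothetical sizes: 2,000 nodes, 100-node regions, 3 active peers.
    for scheme in Scheme:
        print(f"{scheme.value:<35} {nodes_informed(scheme, 2_000, 100, 3):>5} nodes")
```

What the model leaves out is where the costs move: with strict splitting a client cannot roam across region boundaries, and a reactive protocol pays with route-discovery delay whenever a new conversation starts.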

To my knowledge, not much work has been put into this. AFAIK no Freifunk community has, for instance, tried modifying 802.11s to cover a full network. Strict splitting might still result in big topologies that are interconnected. To my knowledge, no community has tried this (e.g. using node-config) on a larger scale.

From this perspective, babeld + l3roamd and batman-adv are quite alike and will share their downsides.

@christf
Owner

christf commented Aug 31, 2019

Yeah, now that we have 802.11s, I also thought about using 802.11s locally and connecting these mesh clouds over babel (possibly skipping net-wide roaming altogether). On the other hand, it is also important to finish one experiment before starting another.

l3roamd + babeld is much more flexible than batman. For example, we can define the scope of roaming. We could make it more reactive. I just chose not to do so at first because batman sets the gold standard for now; the alternative must feel similar. And of course the ability to use VPN protocols other than fastd is a big benefit of routing, in addition to routing itself being used in significantly more installations than batman. I have not seen a kernel crash due to routing issues in the last 20 years. I have seen kernel panics with batman. (I don't blame the batman devs; they are doing tremendous work. It is just that batman is really multiple software projects meshed into one, and the resulting complexity drives bugs.)

I do not agree with your assertion that they are alike and will share their downsides, except maybe when looking only at route distribution. So if that is the problem, let's switch out the protocol. Use BMX! Or OLSR! That flexibility is important when building experimental networks. Without it, we cannot run experiments.
