Document and enforce mesh_metric #969

Open
3 tasks
spolack opened this issue Sep 17, 2024 · 4 comments

spolack commented Sep 17, 2024

We want to start routing IPv4 via Babel soonish. Let's make sure that the metrics are in the desired state so that we get a deterministic routing experience :)

  • Agree on metrics concept
  • Document metrics
  • Ensure metrics are set on the most critical sites

spolack commented Sep 17, 2024

RFC 0.1

| Metric     | Min Speed  | Remarks           |
|------------|------------|-------------------|
| 64-255     | 5 Gbit/s   | For future use    |
| 256-2047   | 1 Gbit/s   | 60 GHz            |
| 2048-4095  | 100 Mbit/s | 5 GHz             |
| 4096-8191  | 50 Mbit/s  | 5 GHz PtMP        |
| 8192-16384 | 10 Mbit/s  | 2.4 GHz / default |

Metric: the first value of the range should be tried first. In case of suboptimal routing, the rest of the range can be used as wiggle room for individual traffic engineering.

By applying this scheme we would prefer:

  • four 5 Gbit/s links over a single 1 Gbit/s link
  • eight 1 Gbit/s links over a single 100 Mbit/s link
  • ...

An open question is where to put VPN; I'd prefer somewhere between 4096-8191.
It should also be taken into account that the maximum metric across the whole path is a uint16 (65535).
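The hop trade-offs above can be sanity-checked with a few lines (class start metrics are taken from the RFC 0.1 table; the helper and constant names are made up for illustration):

```python
# Sanity check of the RFC 0.1 path-cost trade-offs.
# Class start metrics from the table above (hypothetical constant names).
CLASS_START = {
    "5G": 64,      # 64-255,    >= 5 Gbit/s
    "1G": 256,     # 256-2047,  >= 1 Gbit/s
    "100M": 2048,  # 2048-4095, >= 100 Mbit/s
}

def path_metric(hop_metrics):
    """Babel-style additive path metric, capped at the uint16 maximum."""
    return min(sum(hop_metrics), 0xFFFF)

# Four 5 Gbit/s hops cost no more than a single 1 Gbit/s hop ...
assert path_metric([CLASS_START["5G"]] * 4) <= CLASS_START["1G"]
# ... and eight 1 Gbit/s hops no more than a single 100 Mbit/s hop.
assert path_metric([CLASS_START["1G"]] * 8) <= CLASS_START["100M"]
# Very long or slow paths saturate at the uint16 ceiling.
assert path_metric([2048] * 64) == 0xFFFF
```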


Noki commented Sep 19, 2024

I think we should be a bit more granular, especially in the area between 1 Gbit/s and 100 Mbit/s, so we can set values that match the speed of the connections more closely, e.g. by measuring it with iperf3 or looking at the capacities reported by UISP. In my opinion this would help in areas where there are multiple 5 GHz paths with a lot of variance in bandwidth. I know that this was meant with "wiggle room", but I would prefer a more detailed table, as it leaves less room for interpretation. In addition, I think it is a good idea to have a 2.5 Gbit/s tier, as we already reach that speed with the airFiber 60 Xtreme-Range (ak36<->teufelsberg), even though the cable connection/core router is currently the limit there.

My proposal would be as follows:

| Metric        | Max Speed  | Min Speed  | Remarks                   |
|---------------|------------|------------|---------------------------|
| 64 - 127      | 10 Gbit/s  | 5 Gbit/s   | Fiber                     |
| 128 - 255     | 5 Gbit/s   | 2.5 Gbit/s | Fiber                     |
| 256 - 511     | 2.5 Gbit/s | 1 Gbit/s   | 60 GHz wireless/ethernet  |
| 512 - 1023    | 1 Gbit/s   | 500 Mbit/s | 60 GHz wireless/ethernet  |
| 1024 - 2047   | 500 Mbit/s | 250 Mbit/s | 5 GHz wireless            |
| 2048 - 3071   | 250 Mbit/s | 100 Mbit/s | 5 GHz wireless            |
| 3072 - 4095   | 100 Mbit/s | 50 Mbit/s  | 5 GHz wireless            |
| 4096 - 6143   | 50 Mbit/s  | 25 Mbit/s  | 5 GHz / 2.4 GHz wireless  |
| 6144 - 8191   | 25 Mbit/s  | 10 Mbit/s  | 5 GHz / 2.4 GHz wireless  |
| 8192 - 12287  | 10 Mbit/s  | 5 Mbit/s   | 5 GHz / 2.4 GHz wireless  |
| 12288 - 16383 | 5 Mbit/s   | 1 Mbit/s   | 5 GHz / 2.4 GHz wireless  |
| 16384 - 32767 | 1 Mbit/s   | 0 Mbit/s   | Fallback, management link |

A connection like emma<->rhnk, which has a capacity of 273 Mbit/s, would fall in the metric range 1024 - 2047, with a default metric of 1024 or an adjusted metric set to something closer to 2047.

My metrics might need further adjustments, but I think you get the idea of the more complete table.

Regarding VPN / backup uplinks, I suggest a variant where we take the connection bandwidth into account. My proposal here is to take the downlink connection bandwidth to determine the row from the table above and then go two or maybe even three rows down. With two rows, a 250 Mbit/s private uplink would get 4096, a 100 Mbit/s private uplink would get 6144, and so on.
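A rough sketch of that lookup (row bounds are copied from the table above; the two-rows-down shift and the function names are just my reading of the proposal, not an agreed implementation):

```python
# Noki's proposed table as (row_start_metric, max_mbit, min_mbit) tuples.
ROWS = [
    (64, 10_000, 5_000), (128, 5_000, 2_500), (256, 2_500, 1_000),
    (512, 1_000, 500), (1024, 500, 250), (2048, 250, 100),
    (3072, 100, 50), (4096, 50, 25), (6144, 25, 10),
    (8192, 10, 5), (12288, 5, 1), (16384, 1, 0),
]

def row_index(bw_mbit):
    """Index of the row whose (min, max] speed range contains bw_mbit."""
    for i, (_, hi, lo) in enumerate(ROWS):
        if lo < bw_mbit <= hi:
            return i
    return len(ROWS) - 1  # slower than everything: fallback row

def vpn_metric(downlink_mbit, rows_down=2):
    """Pick the base row from downlink bandwidth, then penalize by shifting down."""
    i = min(row_index(downlink_mbit) + rows_down, len(ROWS) - 1)
    return ROWS[i][0]

assert ROWS[row_index(273)][0] == 1024  # emma<->rhnk example
assert vpn_metric(250) == 4096          # 250 Mbit/s private uplink
assert vpn_metric(100) == 6144          # 100 Mbit/s private uplink
```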


spolack commented Sep 20, 2024

I'm thinking about whether it would be nice to find a way to have metrics in a bidirectional way, as 5 GHz usually depends strongly on the direction, and about how to model the data for PtMP APs per individual station. It would also be beneficial to find a way to artificially double the cost for traffic entering and leaving the location on the same interface (for instance wilgu10-sama -> sama-sued-60ghz -> w38b). In that case the cost of the routes from w38b should be added twice at sama-sued-60ghz, because the available bandwidth is only half.

A third thought is that we could have a helper function, either written as a Jinja2 macro or attached to our playbooks as a Python lookup/filter plugin, where we pass the estimated bandwidth instead of manually calculating the cost.
Example:

mesh_metric: "{{ METRIC_FROM_BW(2000) }}"
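As a strawman, such a helper could be an Ansible filter plugin (the file path, the plugin name `metric_from_bw`, and the use of Noki's table are all assumptions, not anything agreed upon):

```python
# filter_plugins/mesh_metric.py  (hypothetical location in our playbooks)
# Maps an estimated bandwidth in Mbit/s to the start metric of the
# matching row of the proposed table.

# (row_start_metric, max_mbit, min_mbit) per proposed table row.
ROWS = [
    (64, 10_000, 5_000), (128, 5_000, 2_500), (256, 2_500, 1_000),
    (512, 1_000, 500), (1024, 500, 250), (2048, 250, 100),
    (3072, 100, 50), (4096, 50, 25), (6144, 25, 10),
    (8192, 10, 5), (12288, 5, 1), (16384, 1, 0),
]

def metric_from_bw(bw_mbit):
    """Return the start metric of the row whose speed range contains bw_mbit."""
    for start, hi, lo in ROWS:
        if lo < bw_mbit <= hi:
            return start
    return ROWS[-1][0]  # unknown/zero bandwidth: fallback metric

class FilterModule(object):
    """Expose metric_from_bw as a Jinja2 filter to Ansible templates."""
    def filters(self):
        return {"metric_from_bw": metric_from_bw}
```

With that in place, the example above would become a filter expression, e.g. `mesh_metric: "{{ 2000 | metric_from_bw }}"`, which yields 256 for a 2 Gbit/s link.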


Noki commented Oct 31, 2024

I think having a function that translates bandwidth into a mesh_metric is a really good idea. Since most of the traffic is RX, we can base everything on measured or estimated RX bandwidth between the node and the neighbor, and also apply this to tunspace uplinks.

I can follow your argument about traffic entering and leaving the same location on the same interface and would love to see a solution, even though I think we are also fine without one. In our current network topology (#1010 contains a map which shows gateway selection), this kind of traffic is either totally valid (traffic within our network) or a consequence of rerouting during outages, which we can't really avoid.
