Simulator Design Summary

The simulator is single-threaded and event-driven, and it operates at the level of a large datacenter network topology. It takes the following inputs, which are specified through a JSON input file:

Topology: the topology type (e.g. FB Fabric) along with its parameters.
Link failure trace: a trace containing link failure events where each event is denoted as: <time>, <link id>, <loss rate>. For example: 349200,6136,6.5e-05 denotes that at time 349200, link ID 6136 in the network topology started corrupting packets with a loss rate of 6.5e-05.
Solution: either CorrOpt, an algorithm that disables a subset of the failed links, or the joint LinkGuardian + CorrOpt strategy proposed in our paper (section 3.6). Any parameters of the chosen solution must also be supplied as input; the most important is the "capacity constraint" under which the solution must operate.
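
As an illustration of the trace format described above, the following minimal Python sketch parses one link failure event. The `LinkFailureEvent` class and `parse_event` function are hypothetical names chosen for this example and are not taken from the simulator's code; the sketch also assumes each event sits on its own comma-separated line and that the time field is numeric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkFailureEvent:
    """One trace entry: <time>, <link id>, <loss rate> (hypothetical container)."""
    time: float       # event time, in whatever unit the trace uses
    link_id: int      # ID of the link in the network topology
    loss_rate: float  # packet loss rate caused by corruption on that link


def parse_event(line: str) -> LinkFailureEvent:
    """Parse a single comma-separated trace line into an event."""
    time_s, link_s, loss_s = (field.strip() for field in line.split(","))
    return LinkFailureEvent(float(time_s), int(link_s), float(loss_s))


# The example event from the text above.
event = parse_event("349200,6136,6.5e-05")
```

Here `event.link_id` is the link that started corrupting packets and `event.loss_rate` is its loss rate.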

The simulator then outputs a timeseries of several topology-level performance parameters, the most important of which are the following:

Total penalty: sum of the loss rates for all the active (remaining) corrupting links in the network.
Least paths per ToR: the least fraction of paths to the spine (top) layer of the network for the worst-case ToR. This metric captures the impact on per-ToR path diversity as corrupting links are disabled for repair.
Least capacity per pod: the total capacity in a network pod from the ToR-layer to the spine (top) layer for the worst-case pod in the network.
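
To make the total penalty metric concrete, here is a minimal Python sketch that replays failure events in time order and records the sum of loss rates over the corrupting links that are still active. It is a simplification under stated assumptions, not the simulator's implementation: the `penalty_timeseries` name is hypothetical, the set of disabled links is fixed up front rather than chosen by CorrOpt or LinkGuardian + CorrOpt, and the second event in the example is made up for illustration.

```python
def penalty_timeseries(events, disabled=frozenset()):
    """Return [(time, total_penalty)] after each link failure event.

    events:   iterable of (time, link_id, loss_rate) tuples
    disabled: link IDs assumed to be turned off for repair (static here)
    """
    loss_by_link = {}  # link_id -> current loss rate of each corrupting link
    series = []
    for time, link_id, loss_rate in sorted(events):
        loss_by_link[link_id] = loss_rate
        # Total penalty: sum of loss rates over active (not disabled) links.
        penalty = sum(rate for link, rate in loss_by_link.items()
                      if link not in disabled)
        series.append((time, penalty))
    return series


events = [
    (349200, 6136, 6.5e-05),  # example event from the text
    (349300, 42, 1.0e-04),    # made-up second event for illustration
]
series_all_active = penalty_timeseries(events)
series_one_disabled = penalty_timeseries(events, disabled={6136})
```

Disabling link 6136 removes its loss rate from the sum, which is the effect a solution trades off against the path and capacity metrics.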
