-
Notifications
You must be signed in to change notification settings - Fork 1
cwacek/inettopology.popmap
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Author: Chris Wacek Email: [email protected] Date: 16 Jan, 2013 Inet Topology Tools =================== This repository contains a set of tools used to construct network topology graphs from traceroute data. Traceroutes from a number of vantage points are combined to create a connected graph, where each vertex is a point of presence connected to other points of presence via links present in trace routes. Points of presence can be thought to represent access points on the internet. They consist of one or more IP addresses present in the trace routes. IP addresses are joined to a point of presence if the distance between them is very small (< 2.5ms) and they are in the same AS and /24. Installation/Requirements ------------------------- The tools are almost entirely written in Python. They are located in the bin directory and have a setup script for installing them. As an aside, if you don't work with Python often, I highly recommend installing `virtualenv`. There's a good explanation of it [here]( http://iamzed.com/2009/05/07/a-primer-on-virtualenv/ ). To install simply run `python setup.py install` inside the bin/ directory. The scripts, all of which begin with 'inet\_graph' will then be usable from your path. ### Dependencies 1. **Redis** This package consists of many different scripts which act in concert to produce a network graph from raw traceroute data. In order to make this work (and save a ton of time reading and writing from disk), the code all uses the [Redis]( http://redis.io/ ) key-value store as a backend. Redis must be installed and running on whatever machine this is run on. Additionally, the scripts currently just default to using database 0 on the Redis server. If there is an existing Redis server operating, this setting may need to be changed. 2. **PyPy (optional)** Some of the data processing scripts can take a long time to run. They are much faster when run using the PyPy interpreter instead of the standard Python interpreter. You can find more information about, and obtain PyPy, at the [PyPy site]( http://pypy.org/ ). Unfortunately, some of the scripts **do not support** PyPy because they use libraries that do not work with PyPy. `lxml` is a big culprit here. As a result, while I recommend you use PyPy for some steps, you do not have to if the idea of switching interpreters makes you lose sleep. 3. **Assorted Python Libraries** The scripts use a variety of Python libraries. Rather than specify them as a global set of dependencies however, each of the scripts is set up to warn if a dependency is not met. The Tools --------- There are a number of different contained in the package, which perform different steps of the process. This section provides a brief overview of each of them. ### `inet_graph_process_data` This script does the hard work of processing traceroutes into the Redis database and then identifying and joining IP addresses into points of presence. It has a number of subcommands: - `dump_ips` Dumps the unique set of IP addresses present in a traceroute file. - `load_IP_data` Before points of presence can be determined information about IP<->AS relationships is needed. This command is designed to load that data into the database so it can be used later. - `parse` Parses a traceroute file into the database. This process involves a certain amount of normalization, such as removing negative distances and handling missing steps in a traceroute. The specifics of normalization are found in TraceParser.py This process can be run multiple times in parallel, although there are diminishing returns due to IO requirements and the ability of the Redis backend to handle multiple connections. 16 at a time is pretty effective though. - `assign_pops` Assigns every IP address in the database to a point of presence. This can also be run in parallel, although there are situations where deadlocks occur, and those have to be deferred when run in parallel. As a result, after parallel processing is complete it must be run one more time with the --process_failed flag. - `process_joins` Joins any points of presence that were assigned separately but which realistically should be the same point of presence. ### `inet_graph_load_as_relationships` The sole purpose of this script is to parse the CAIDA [*Inferred AS Relationships Dataset*]( http://http://www.caida.org/data/active/as-relationships/index.xml ) into the Redis database for use with `inet_graph_create`. ### `inet_graph_create` Utility to produce a GraphML file from traceroute data that has been loaded into a local Redis database by `inet_graph_process_data`. This script performs two actions sequentially: 1. Loading the network graph from the Redis database and attaching Tor relays, clients and destinations to it based on the data provided. This step also performs some slight trimming of the graph by removing unconnected components, trimming degree-1 vertices (dangling edges of the tree), and collapsing degree-two nodes where it can be done without losing data. This intermediate graph representation is saved to disk as an GraphML file. This is important for repeatability, since this first step can take quite a while to perform. 2. Reducing the size of the network graph. This step can load a previously created intermediate representation using the option or it can take place immediately following step 1. At a high level, the network graph is reduced by removing all links that are not on a shortest path between two points of interest. In this case, points of interest are defined as Tor relays, clients, and destinations. If AS peering data is available, then each path is checked for valley-freeness, and if it is not valley-free, then a modified shortest-path algorithm is used to find the shortest valley-free path. The result of this process is written out as a GraphML file once complete. ### `inet_graph_aspath` A tool that has nothing to do with constructing the network map, but given the resulting output files can be used to determine the AS path between two IP addresses within the network.
About
Tools for constructing latency-focused network topologies from traceroute data.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published