
Data Standards for Inputs and Outputs

Kasia Kozlowska edited this page Oct 16, 2020 · 13 revisions

This page goes through:

  1. What is achievable through GeNet
  2. Supported data types and any assumed or required pre-processing
  3. Supported and expected quality assurance for output data

1. What is achievable through GeNet?

GeNet is used for working with multimodal (i.e. including public transport (PT)) networks. It is based on MATSim transport networks. A full list of what can be performed with GeNet, and details of how to perform it, can be found in the Functionality and Usage Guide. As an overview, GeNet is used for:

  1. In-memory representation of a multimodal network and its PT services, for easy inspection and visualisation
  2. Generation of auxiliary files for MATSim, e.g. road pricing
  3. Modifying a network, to generate network scenarios or to fix it; GeNet records changes in a changelog
  4. Validation

GeNet does not currently support relating a genet.Schedule that has been read from GTFS to a genet.Network. This means that:

  • you cannot use GeNet to create a multimodal network from a GTFS schedule
  • you cannot add any new services to an existing Network if they are to use the network graph (you can only do this by adding artificial links to the graph yourself)

2. Supported data types and any assumed or required pre-processing

There are currently three ways of reading data into GeNet:

1. Using MATSim's network data files

Small samples of this data can be found in tests/test_data/matsim.

network.xml

This file defines the graph of the transport network, i.e. all infrastructure available to agents or vehicles.

Assumptions

  • The input network.xml conforms to the network v2 DTD schema.
  • No pre-processing is required.
  • It has to be a valid network.xml file according to the DTD, but not necessarily a valid MATSim network.xml file. GeNet handles a number of known issues, such as:
    • graph connectivity issues (these can be analysed and validated using GeNet)
    • duplicate ids:
      • duplicate link ids will result in a new index being generated
      • duplicate node ids will be flagged, but only one will persist to the genet.Network object
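Before reading a large network, it can help to know whether it contains duplicate ids at all. The sketch below is a standalone pre-check using only Python's standard library; it is not GeNet's reader, the function name find_duplicate_ids is hypothetical, and the sample network is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(network_xml: str) -> dict:
    """Return node and link ids that occur more than once in a network.xml string."""
    root = ET.fromstring(network_xml)
    # Count every <node> and <link> id wherever it appears in the document.
    node_ids = Counter(node.get("id") for node in root.iter("node"))
    link_ids = Counter(link.get("id") for link in root.iter("link"))
    return {
        "nodes": sorted(i for i, n in node_ids.items() if n > 1),
        "links": sorted(i for i, n in link_ids.items() if n > 1),
    }


# Made-up sample with one duplicated node id and one duplicated link id.
sample = """<network>
  <nodes>
    <node id="1" x="0.0" y="0.0"/>
    <node id="2" x="10.0" y="0.0"/>
    <node id="2" x="10.0" y="5.0"/>
  </nodes>
  <links>
    <link id="a" from="1" to="2" length="10.0" freespeed="13.9" capacity="600.0" permlanes="1.0" modes="car"/>
    <link id="a" from="2" to="1" length="10.0" freespeed="13.9" capacity="600.0" permlanes="1.0" modes="car"/>
  </links>
</network>"""

print(find_duplicate_ids(sample))  # {'nodes': ['2'], 'links': ['a']}
```

For a file on disk, ET.parse(path).getroot() gives the same root element as ET.fromstring does for a string.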

There is no guarantee that a network with a specific, undiagnosed issue will be read into GeNet out of the box in the way you expect, or at all.

schedule.xml

This file defines all of the PT stops and the PT services that use them. Each service has a number of routes, and each route makes several trips during the day.

Assumptions

  • The input schedule.xml conforms to the schedule v2 DTD schema.
  • No pre-processing is required.
  • It has to be a valid schedule.xml file according to the DTD, but not necessarily a valid MATSim schedule.xml file. GeNet handles a number of known issues, such as:
    • transit routes with loops
    • arrival and departure offset issues
    • network routing for PT services, such as:
      • missing routes
      • traversability:
        • connectedness of the links in transit routes
        • links used by a route allowing the route's mode
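The two traversability conditions above can be illustrated with a small check on a single route. This is a minimal sketch, assuming a plain-dictionary representation of links (hypothetical, not GeNet's internal data structure) and a hypothetical function name check_route; GeNet's own validation may report these problems differently.

```python
def check_route(route_links, links, mode):
    """Return problems found in one transit route's network route.

    route_links: ordered list of link ids used by the route
    links: {link_id: {"from": node_id, "to": node_id, "modes": set_of_modes}}
    mode: the PT mode of the route, e.g. "bus"
    """
    if not route_links:
        return ["route has no network links"]
    problems = []
    # Every link must exist and allow the route's mode.
    for link_id in route_links:
        if link_id not in links:
            problems.append(f"link {link_id} is not in the network")
        elif mode not in links[link_id]["modes"]:
            problems.append(f"link {link_id} does not allow mode {mode}")
    # Consecutive links are connected if one ends at the node where the next begins.
    for prev_id, next_id in zip(route_links, route_links[1:]):
        if prev_id in links and next_id in links and links[prev_id]["to"] != links[next_id]["from"]:
            problems.append(f"links {prev_id} and {next_id} are not connected")
    return problems


links = {
    "l1": {"from": "A", "to": "B", "modes": {"car", "bus"}},
    "l2": {"from": "B", "to": "C", "modes": {"car"}},
    "l3": {"from": "D", "to": "E", "modes": {"bus"}},
}
print(check_route(["l1", "l2", "l3"], links, "bus"))
# ['link l2 does not allow mode bus', 'links l2 and l3 are not connected']
```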

There is no guarantee that a schedule with a specific, undiagnosed issue will be read into GeNet out of the box in the way you expect, or at all.

Other remarks

A network can exist within GeNet without a schedule, and some operations can be performed on the graph alone. However, if the network has transit associated with it, it is recommended to read the schedule together with the network file. This is because changes to the graph can affect the network routes of transit services, and the schedule is needed to validate them.
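As an illustration of why the schedule matters when editing the graph, the sketch below finds the transit routes whose network routes use links that are about to be removed. It assumes routes are given as a plain mapping of route id to an ordered list of link ids; routes_using_links and the sample ids are hypothetical and not part of GeNet's API.

```python
def routes_using_links(schedule_routes, removed_links):
    """Return ids of transit routes whose network route uses any of removed_links."""
    removed = set(removed_links)
    return sorted(
        route_id
        for route_id, route_links in schedule_routes.items()
        if removed.intersection(route_links)
    )


routes = {"bus_1": ["l1", "l2"], "bus_2": ["l3"], "tram_1": ["l2", "l4"]}
print(routes_using_links(routes, ["l2"]))  # ['bus_1', 'tram_1']
```

Without the schedule loaded, the same link removal would silently leave bus_1 and tram_1 with broken network routes.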

GeNet does not currently read the vehicles.xml file, which defines all of the PT vehicles that make the trips given in the schedule (LAB-583).

2. Using OSM data to create a graph for a genet.Network

Small samples of this data can be found in tests/test_data/osm.

GeNet ingests OSM data with .osm or .osm.pbf extensions, which can be obtained:

  • from http://download.geofabrik.de/ for well-defined regions/countries
  • through JOSM for small subsets, saved with one of the above extensions

In /configs you will find example configs which can be used as a starting point.

GeNet can ingest only one OSM file, and one config describing how to read that OSM file, at a time. It is still possible to generate a more complex network composed of areas that require different OSM way tags to be read. The current workflow in GeNet is to:

  • collect your OSM files
  • produce a config file for each OSM file
  • generate several genet.Network objects and add them together
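The last step of this workflow combines several networks into one. The sketch below illustrates the idea with plain dictionaries rather than genet.Network objects; merge_networks is hypothetical, and the rule that an id shared by two inputs must carry identical data is an assumption made for illustration, not a description of how GeNet adds networks together. Overlapping extracts of the same OSM data share OSM ids, which is why identical duplicates are simply collapsed here.

```python
def merge_networks(*networks):
    """Merge networks given as {'nodes': {id: data}, 'links': {id: data}} dicts."""
    merged = {"nodes": {}, "links": {}}
    for network in networks:
        for kind in ("nodes", "links"):
            for obj_id, data in network[kind].items():
                existing = merged[kind].get(obj_id)
                # Assumption: an id shared by two inputs must describe the same object.
                if existing is not None and existing != data:
                    raise ValueError(f"conflicting data for {kind} id {obj_id!r}")
                merged[kind][obj_id] = data
    return merged


city = {"nodes": {"1": {"x": 0.0, "y": 0.0}, "2": {"x": 5.0, "y": 0.0}},
        "links": {"a": {"from": "1", "to": "2", "modes": ["car", "walk"]}}}
region = {"nodes": {"2": {"x": 5.0, "y": 0.0}, "3": {"x": 9.0, "y": 0.0}},
          "links": {"b": {"from": "2", "to": "3", "modes": ["car"]}}}
merged = merge_networks(city, region)
print(sorted(merged["nodes"]), sorted(merged["links"]))  # ['1', '2', '3'] ['a', 'b']
```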

GeNet will assign values to links based on their OSM tags. These values have been extracted from MATSim's JOSM add-on and can be found in /outputs_handler/matsim_xml_values.py. There are currently two caveats:

  • The number of lanes on a link, i.e. permlanes, is taken from OSM data if it is present (and read by the config under OSM_TAGS: USEFUL_TAGS_PATH and OSM_TAGS: PUMA_GRAPH_TAGS) and convertible to an integer (e.g. not '2;3')
  • The capacity given in matsim_xml_values.py is for a single lane. The output network will feature a capacity value equal to permlanes * capacity, where capacity is the base lane capacity defined in matsim_xml_values.py
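The two caveats combine as follows: the lane count comes from the OSM lanes tag when it is integer convertible, and the link capacity is that lane count times the single-lane base capacity. This is a minimal sketch; lanes_and_capacity is hypothetical, base_lane_capacity and default_lanes stand in for the values in matsim_xml_values.py, falling back to default_lanes when the tag is unusable is an assumption, and the numbers in the example are made up.

```python
def lanes_and_capacity(osm_tags, base_lane_capacity, default_lanes):
    """Return (permlanes, capacity) for a link created from an OSM way.

    base_lane_capacity and default_lanes stand in for the per-highway-type
    values in matsim_xml_values.py (hypothetical parameters for this sketch).
    """
    lanes = default_lanes
    raw = osm_tags.get("lanes")
    if raw is not None:
        try:
            lanes = int(raw)
        except ValueError:
            # Values such as '2;3' are not integer convertible; keep the default.
            pass
    return lanes, lanes * base_lane_capacity


print(lanes_and_capacity({"highway": "primary", "lanes": "2"}, 1500, 1))    # (2, 3000)
print(lanes_and_capacity({"highway": "primary", "lanes": "2;3"}, 1500, 1))  # (1, 1500)
```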

3. Using GTFS data to create a genet.Schedule

Small samples of this data can be found in tests/test_data/gtfs.

GeNet ingests zipped or unzipped GTFS feeds. The following files are required in the unzipped folder, or inside the zip file:

  • calendar.txt
  • stop_times.txt
  • stops.txt
  • trips.txt
  • routes.txt
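A quick way to confirm that a feed contains the required files before handing it to GeNet is sketched below, for both zipped and unzipped feeds. The function name missing_gtfs_files is hypothetical, and counting files nested in a folder inside the zip is a choice of this sketch; it does not describe how GeNet itself locates the files.

```python
import os
import zipfile

REQUIRED_GTFS_FILES = {"calendar.txt", "stop_times.txt", "stops.txt", "trips.txt", "routes.txt"}


def missing_gtfs_files(path):
    """Return the required GTFS files missing from a zipped or unzipped feed at path."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # Compare base names, so files nested in a folder inside the zip also count.
            present = {os.path.basename(name) for name in zf.namelist()}
    else:
        present = set(os.listdir(path))
    return sorted(REQUIRED_GTFS_FILES - present)
```

An empty list means all five required files were found.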

When reading a GTFS feed, GeNet expects a date in YYYYMMDD format. It will raise an error if the selected date yields no services.
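The date requirement can be illustrated by selecting the services in calendar.txt that run on a given YYYYMMDD date, and raising an error when none do. This is a minimal sketch with a hypothetical function name services_on and made-up calendar data; it ignores calendar_dates.txt exceptions and is not GeNet's GTFS reader.

```python
import csv
import io
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def services_on(calendar_txt, date):
    """Return the service_ids in calendar.txt content that run on date (YYYYMMDD)."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f"expected a date in YYYYMMDD format, got {date!r}")
    weekday = WEEKDAYS[datetime.strptime(date, "%Y%m%d").weekday()]
    active = [
        row["service_id"]
        for row in csv.DictReader(io.StringIO(calendar_txt))
        # YYYYMMDD strings compare correctly as plain strings.
        if row["start_date"] <= date <= row["end_date"] and row[weekday] == "1"
    ]
    if not active:
        raise ValueError(f"no services run on {date}")
    return active


CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
weekday,1,1,1,1,1,0,0,20200101,20201231
weekend,0,0,0,0,0,1,1,20200101,20201231
"""

print(services_on(CALENDAR, "20201016"))  # ['weekday'] (a Friday)
```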

GeNet currently does not support filtering the output genet.Schedule objects by geographical area. This can be done using gtfs-lib prior to ingestion in GeNet. Alternatively, you can attempt to manipulate the genet.Schedule object within GeNet. Either way, make sure to validate the final genet.Schedule.

GeNet currently only supports merging separable GTFS feeds, i.e. feeds with no services in common.
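Whether two feeds are separable can be checked before merging by looking for ids that appear in both. This is a minimal sketch: overlapping_ids is hypothetical, feeds are represented as plain mappings of id type to ids, and treating shared route_id or trip_id values as "services in common" is an assumption of this example.

```python
def overlapping_ids(feed_a, feed_b, keys=("route_id", "trip_id")):
    """Return, per id type, the ids that appear in both feeds (empty dict if separable)."""
    overlaps = {}
    for key in keys:
        common = set(feed_a.get(key, ())) & set(feed_b.get(key, ()))
        if common:
            overlaps[key] = sorted(common)
    return overlaps


bus = {"route_id": ["R1", "R2"], "trip_id": ["T1", "T2"]}
rail = {"route_id": ["R9"], "trip_id": ["T9"]}
rail_with_bus = {"route_id": ["R2", "R9"], "trip_id": ["T9"]}

print(overlapping_ids(bus, rail))           # {}
print(overlapping_ids(bus, rail_with_bus))  # {'route_id': ['R2']}
```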

The user assumes responsibility for the quality of their input GTFS feed. There are various validation tools that can be used with GTFS feeds before the PUMA process is run; see this page for a summary of validation tools.

3. Supported and expected quality assurance for output data

GeNet provides methods to validate the graph of the genet.Network, the validity of the genet.Schedule in relation to the behaviour of all of the transit routes within it, and the relationship between the genet.Schedule and the genet.Network through the network routes given in the schedule. These checks should be run and investigated thoroughly, both while working with a network in GeNet and at the end, before saving it to file.
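One of the graph checks worth understanding is strong connectivity of a modal subgraph: every node should be reachable from every other node using only links that allow that mode. The sketch below is a plain breadth-first-search illustration with hypothetical names (reachable, is_strongly_connected) and a made-up edge list; it is not GeNet's validation report.

```python
from collections import deque


def reachable(graph, start):
    """Return all nodes reachable from start in an adjacency dict."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_strongly_connected(edges):
    """True if every node can reach every other node in a directed graph of (u, v) pairs."""
    forward, backward, nodes = {}, {}, set()
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
        nodes.update((u, v))
    if not nodes:
        return True
    start = next(iter(nodes))
    # Strongly connected iff start reaches every node and every node reaches start.
    return reachable(forward, start) == nodes and reachable(backward, start) == nodes


links = [("A", "B", {"car", "bus"}), ("B", "A", {"car"}), ("B", "C", {"bus"})]
car_edges = [(u, v) for u, v, modes in links if "car" in modes]
bus_edges = [(u, v) for u, v, modes in links if "bus" in modes]
print(is_strongly_connected(car_edges))  # True
print(is_strongly_connected(bus_edges))  # False
```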

Supported output files

GeNet currently supports writing MATSim network files:

  • network.xml, following the network v2 DTD schema

    Data present on the nodes and edges of the graph will only persist to the network.xml file if it matches the required or optional attributes defined in /variables.py, or is saved in a nested dictionary under attributes for network links in the following format: 'attributes': {'attribute_name' : {'name': 'attribute_name', 'class': 'java.lang.String', 'text': 'attribute_value'}}.

  • schedule.xml, following the schedule v2 DTD schema

    Similarly to the network, if any data is added to the genet.Schedule object's graph, only the allowed attributes for stops (graph nodes) defined in /variables.py will persist to the schedule.xml file.

  • vehicles.xml, following the vehicles v1 DTD schema

    GeNet will generate a new vehicles.xml file using the vehicle values given in matsim_xml_values.py (see LAB-583 for progress on supporting vehicles.xml files)
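The attribute rules for network.xml can be illustrated by building a single link element: known attributes are written directly, unknown keys are dropped, and the nested 'attributes' dictionary becomes MATSim's attributes/attribute child elements. This is a minimal sketch; ALLOWED_LINK_ATTRIBUTES is a hypothetical subset standing in for the lists in /variables.py, link_element is a hypothetical name, and GeNet's own writer may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of allowed link attributes; the real lists live in /variables.py.
ALLOWED_LINK_ATTRIBUTES = {"id", "from", "to", "length", "freespeed", "capacity", "permlanes", "oneway", "modes"}


def link_element(link_data):
    """Build a <link> element, keeping only allowed attributes plus nested 'attributes'."""
    link = ET.Element("link")
    for key, value in link_data.items():
        if key in ALLOWED_LINK_ATTRIBUTES:
            if key == "modes":
                value = ",".join(sorted(value))
            link.set(key, str(value))
    nested = link_data.get("attributes", {})
    if nested:
        attributes = ET.SubElement(link, "attributes")
        for attr in nested.values():
            child = ET.SubElement(attributes, "attribute", name=attr["name"])
            child.set("class", attr["class"])  # 'class' is a Python keyword, so set it explicitly
            child.text = attr["text"]
    return link


data = {
    "id": "1", "from": "A", "to": "B", "length": 52.0, "freespeed": 13.9,
    "capacity": 600.0, "permlanes": 1.0, "oneway": "1", "modes": {"car"},
    "some_custom_key": 42,  # not allowed and not nested, so it is dropped
    "attributes": {"osm:way:highway": {"name": "osm:way:highway", "class": "java.lang.String", "text": "primary"}},
}
print(ET.tostring(link_element(data), encoding="unicode"))
```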

During the MATSim export, GeNet will also generate the following files:

  • nodes.geojson all of the data stored under nodes of the genet.Network graph in geojson format, supported by kepler.gl and geopandas
  • nodes_geometry_only.geojson the x and y coordinates in Point format stored under nodes of the genet.Network graph in geojson format, supported by kepler.gl and geopandas. This is useful for visualising large networks.
  • links.geojson all of the data stored under edges of the genet.Network graph in geojson format, supported by kepler.gl and geopandas
  • links_geometry_only.geojson the x and y coordinates in LineString format stored under edges of the genet.Network graph in geojson format, supported by kepler.gl and geopandas. This is useful for visualising large networks.
  • change_log.csv tabular account of changes performed during the GeNet session. Changes are recorded only when using GeNet's own operations.
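The geometry-only outputs are ordinary GeoJSON FeatureCollections. The sketch below builds one from node x and y values with the standard library, to show what a Point-only file looks like; nodes_geometry_geojson is hypothetical and not GeNet's exporter. Note that RFC 7946 GeoJSON expects WGS84 longitude/latitude, and this sketch does not reproject coordinates from a projected CRS.

```python
import json


def nodes_geometry_geojson(nodes):
    """Return a GeoJSON FeatureCollection string of Point features from {node_id: {'x': .., 'y': ..}}."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [data["x"], data["y"]]},
            # Only the id is kept, mirroring the "geometry only" idea.
            "properties": {"id": node_id},
        }
        for node_id, data in nodes.items()
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


print(nodes_geometry_geojson({"1": {"x": -0.1276, "y": 51.5072, "osmid": 101}}))
```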

Expected validation

If planning to run the network with MATSim, the user is expected to perform the following validation steps on the output network:

  1. Run GeNet's validation report and inspect the results for graph, schedule and routing, before saving the network.
  2. Run PT2MATSim's validation process (see Check Mapped Schedule Plausibility)
  3. Visualise all modal subgraphs (an automated process to generate these outputs is coming soon, LAB-584) and inspect them for
    • any parts of the network missing
    • any connectivity issues
    • capacity values for the road network
    • permlanes values for the road network
    • any PT mode graphs that look odd, e.g. going where they should not be present
  4. Visualise the schedule (an automated process to generate these outputs in GeNet is coming soon, LAB-584) and inspect
    • values for trains per hour on the schedule graph's edges in the AM (morning) peak, IP (inter-peak) and PM (afternoon) peak, in relation to the location of the PT terminals
    • values for trains per hour over the entire day for selected transit stops
  5. Finally, run a small test MATSim simulation with the output network for at least 2 iterations.
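The trains-per-hour figures in step 4 are departures counted within a time window and divided by the window length in hours. The sketch below computes this for one edge from a list of departure times; vehicles_per_hour is hypothetical, and the 07:00 to 10:00 AM peak window in the example is an assumption, since peak definitions vary between projects.

```python
def vehicles_per_hour(departures, start, end):
    """Average departures per hour in [start, end); times are HH:MM:SS and may exceed 24:00:00."""
    def seconds(t):
        h, m, s = (int(part) for part in t.split(":"))
        return h * 3600 + m * 60 + s

    lo, hi = seconds(start), seconds(end)
    if hi <= lo:
        raise ValueError("end must be after start")
    count = sum(1 for t in departures if lo <= seconds(t) < hi)
    return count / ((hi - lo) / 3600)


departures = ["07:00:00", "07:20:00", "07:40:00", "08:00:00", "08:30:00", "09:10:00"]
print(vehicles_per_hour(departures, "07:00:00", "10:00:00"))  # 2.0
```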