Data Standards for Inputs and Outputs

Kasia Kozlowska edited this page Oct 15, 2020 · 13 revisions

This page goes through:

  1. What is achievable through GeNet
  2. Supported data types and any assumed or required pre-processing
  3. Supported and expected quality assurance for output data

1. What is achievable through GeNet?

GeNet is used for working with multimodal networks, i.e. networks that include public transport (PT). It is based on MATSim transport networks. A full list of what can be performed with GeNet, and details of how to perform it, can be found in the Functionality and Usage Guide. As an overview, GeNet is used for:

  1. In-memory representation of a multimodal network with a PT service for easy inspection and visualisation
  2. Generation of auxiliary files e.g. Road Pricing for MATSim
  3. Modification to generate network scenarios or fixing a network; GeNet provides a changelog for change tracking
  4. Validation

GeNet does not currently support relating a genet.Schedule that has been read from GTFS to a genet.Network. This means:

  • you cannot create a multimodal network using GeNet
  • you cannot add any new services to an existing Network if they are to use the network graph (you can only do this by adding artificial links to the graph yourself)

2. Supported data types and any assumed or required pre-processing

There are currently three ways of reading data into GeNet:

1. Using MATSim's network data files

Small samples of this data can be found in tests/test_data/matsim.

network.xml

This file defines the graph of the transport network, i.e. all infrastructure available to agents or vehicles.

Assumptions

  • The input network.xml satisfies the network v2 DTD schema.
  • No pre-processing required
  • It has to be a valid network.xml file according to the DTD, but not necessarily a valid MATSim network.xml file. GeNet supports reading files with a number of known issues, such as:
    • graph connectivity issues (these can be analysed and validated using GeNet)
    • duplicate ids
      • duplicate link ids, which will result in a new index
      • duplicate node ids, which will be flagged, but only one node will persist to the genet.Network object

It is not guaranteed that a specific and undiagnosed issue with your network will be read into GeNet out of the box in the way you expect or at all.
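
Since duplicate ids are one of the known issues listed above, it can be useful to look for them before reading a file into GeNet. The following is a minimal sketch using only the Python standard library, not GeNet's own reader. The function name `find_duplicate_ids` and the sample XML are hypothetical and written for illustration; the sketch only assumes that nodes and links are stored as `<node>` and `<link>` elements with an `id` attribute, as in MATSim network files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(network_xml: str) -> dict:
    """Return the ids that occur more than once among <node> and <link> elements."""
    root = ET.fromstring(network_xml)
    duplicates = {}
    for tag in ("node", "link"):
        counts = Counter(element.get("id") for element in root.iter(tag))
        duplicates[tag] = sorted(i for i, n in counts.items() if n > 1)
    return duplicates


# Hand-written illustration, not taken from tests/test_data/matsim
SAMPLE = """<network>
  <nodes>
    <node id="1" x="0.0" y="0.0"/>
    <node id="2" x="1.0" y="0.0"/>
    <node id="2" x="1.0" y="1.0"/>
  </nodes>
  <links>
    <link id="a" from="1" to="2" length="1.0" freespeed="10" capacity="600" permlanes="1" modes="car"/>
    <link id="a" from="2" to="1" length="1.0" freespeed="10" capacity="600" permlanes="1" modes="car"/>
  </links>
</network>"""

print(find_duplicate_ids(SAMPLE))
```

For a file on disk, `ET.parse(path).getroot()` could be used in place of `ET.fromstring`.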

schedule.xml

This file defines all of the PT stops and the PT services that use them. Each service has a number of different routes, each of which makes several trips during the day.

Assumptions

  • The input schedule.xml satisfies the schedule v2 DTD schema.
  • No pre-processing required
  • It has to be a valid schedule.xml file according to the DTD, but not necessarily a valid MATSim schedule.xml file. GeNet supports reading files with a number of known issues, such as:
    • transit routes with loops
    • arrival and departure offset issues
    • network routing for PT services such as
      • missing routes
      • traversability:
        • connectedness of links in routes for transit
        • matching mode being stored in links used for the route

It is not guaranteed that a specific and undiagnosed issue with your schedule will be read into GeNet out of the box in the way you expect or at all.
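
To make the traversability issues above concrete, here is a rough sketch of what such a check involves. It is not GeNet's validation code: the function `route_problems` and the dictionary layout of `links` are assumptions made for illustration. A route is treated as an ordered list of link ids; consecutive links should share a node, and every link should allow the mode of the route.

```python
def route_problems(route, links, mode):
    """List traversability problems for a route given as an ordered list of link ids.

    `links` maps link id -> {"from": node id, "to": node id, "modes": set of modes}.
    This layout is hypothetical and chosen only for this example.
    """
    if not route:
        return ["route has no links"]
    problems = []
    # Every link must exist in the graph and allow the mode of the route
    for link_id in route:
        if link_id not in links:
            problems.append(f"link {link_id} is missing from the graph")
        elif mode not in links[link_id]["modes"]:
            problems.append(f"link {link_id} does not allow mode {mode}")
    # Consecutive links must be connected: the end of one is the start of the next
    for prev, nxt in zip(route, route[1:]):
        if prev in links and nxt in links and links[prev]["to"] != links[nxt]["from"]:
            problems.append(f"links {prev} and {nxt} are not connected")
    return problems


links = {
    "a": {"from": "1", "to": "2", "modes": {"car", "bus"}},
    "b": {"from": "2", "to": "3", "modes": {"car"}},
    "c": {"from": "4", "to": "5", "modes": {"bus"}},
}
print(route_problems(["a", "b", "c"], links, "bus"))
```

In this example link `b` does not carry the `bus` mode and links `b` and `c` do not share a node, so both problems are reported.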

Other remarks

A network can exist within GeNet without a schedule, and there are operations which can be performed on the graph alone. It is recommended, however, that a schedule is read together with the network file when working with a network which has transit associated with it. This is because changes to the graph can affect the routing of transit services, and because the schedule is needed for validation.

GeNet does not currently read the vehicles.xml file, which defines all of the PT vehicles that make the trips given in the schedule. (LAB-583)

2. Using OSM data to create a graph for a genet.Network

Small samples of this data can be found in tests/test_data/osm.

GeNet ingests OSM data with .osm or .osm.pbf extensions, which can be obtained from

  • http://download.geofabrik.de/ for well-defined regions/countries
  • JOSM, for small subsets saved with one of the aforementioned extensions

In /configs you will find example configs which can be used as a starting point.

GeNet can ingest only one OSM file and one config describing how to read that OSM file at a time. It is possible to generate a more complicated network which is composed of areas which require different OSM way tags to be read. The current workflow in GeNet is to

  • collect your OSM files
  • produce a config file for each OSM file
  • generate several genet.Network objects and add them together

3. Using GTFS data to create a genet.Schedule

Small samples of this data can be found in tests/test_data/gtfs.

GeNet ingests zipped or unzipped GTFS feeds. The following files are required in the unzipped folder, or inside the zip file:

  • calendar.txt
  • stop_times.txt
  • stops.txt
  • trips.txt
  • routes.txt

When reading a GTFS feed, GeNet expects a date in YYYYMMDD format. It will raise an error if the selected date yields no services.
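
A feed can be checked against these requirements before it is passed to GeNet. The sketch below uses only the Python standard library and is not part of GeNet; the helper names `missing_gtfs_files` and `is_valid_gtfs_date` are hypothetical. It assumes that, for a zipped feed, the required files sit at the top level of the archive.

```python
import zipfile
from datetime import datetime
from pathlib import Path

REQUIRED_FILES = {"calendar.txt", "stop_times.txt", "stops.txt", "trips.txt", "routes.txt"}


def missing_gtfs_files(feed_path):
    """Return the required GTFS files absent from a feed folder or zip archive."""
    path = Path(feed_path)
    if path.is_dir():
        present = {p.name for p in path.iterdir() if p.is_file()}
    else:
        # Assumption: files are stored at the top level of the archive
        with zipfile.ZipFile(path) as archive:
            present = set(archive.namelist())
    return sorted(REQUIRED_FILES - present)


def is_valid_gtfs_date(day: str) -> bool:
    """Check that `day` is a real calendar date written as YYYYMMDD."""
    # strptime alone accepts some shorter strings, so check the shape first
    if len(day) != 8 or not day.isdigit():
        return False
    try:
        datetime.strptime(day, "%Y%m%d")
    except ValueError:
        return False
    return True
```

This only confirms that the files exist and that the date is well formed. Whether the selected date yields any services still depends on the content of the feed.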

GeNet does not currently support filtering the output genet.Schedule objects based on a geographical area. This can be done using gtfs-lib prior to ingestion in GeNet, or you can attempt to manipulate the genet.Schedule object within GeNet. Make sure to validate the final genet.Schedule.

GeNet currently only supports merging of separable GTFS feeds, meaning feeds that have no services in common.
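
One way to check this before merging is to compare identifiers across the two feeds. The sketch below is an illustration only and is not how GeNet itself decides separability: it assumes that "services in common" can be approximated by `route_id` values shared between the two routes.txt files, and the function name `shared_route_ids` is hypothetical.

```python
import csv
import io


def shared_route_ids(routes_txt_a: str, routes_txt_b: str) -> set:
    """Return route_id values present in both routes.txt contents."""

    def route_ids(text):
        return {row["route_id"] for row in csv.DictReader(io.StringIO(text))}

    return route_ids(routes_txt_a) & route_ids(routes_txt_b)


# Hand-written routes.txt contents for illustration
feed_a = "route_id,route_short_name,route_type\nR1,1,3\nR2,2,3\n"
feed_b = "route_id,route_short_name,route_type\nR2,2,3\nR3,3,3\n"
print(shared_route_ids(feed_a, feed_b))
```

An empty result suggests, under the assumption above, that the two feeds do not overlap; any shared id is worth investigating before merging.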

The user assumes responsibility for the quality of their input GTFS feed. There are various validation tools that can be used with GTFS feeds before the PUMA process is run; see this page for a summary of validation tools.

3. Supported and expected quality assurance for output data

GeNet provides methods to validate the graph of the genet.Network, the validity of the genet.Schedule in relation to the behaviour of all of the transit routes within it, and the relationship between genet.Schedule and genet.Network through the network routes given in the schedule. These checks should be run, and their results investigated thoroughly, both throughout the work on a network in GeNet and at the end, before saving it out to file.

GeNet currently supports writing MATSim network files: