Data Standards for Inputs and Outputs
This page goes through:
- What is achievable through GeNet
- Supported data types and any assumed or required pre-processing
- Supported and expected quality assurance for output data
GeNet is used for working with multimodal networks, i.e. networks that include public transport (PT). It is based on MATSim transport networks. A full list of what can be performed with GeNet, and details of how to perform it, can be found in the Functionality and Usage Guide. As an overview, GeNet is used for:
- In-memory representation of a multimodal network with a PT service for easy inspection and visualisation
- Generation of auxiliary files, e.g. Road Pricing for MATSim
- Modification to generate network scenarios or to fix a network; GeNet provides a changelog for change tracking
- Validation
GeNet does not currently support relating a genet.Schedule that has been read from GTFS to a genet.Network. This means:
- you cannot create a multimodal network using GeNet
- you cannot add any new services to an existing Network if they are to use the network graph (you can only do this by adding artificial links to the graph yourself)
Right now there are three ways of reading data into GeNet: from MATSim files, from OSM data and from GTFS feeds.
Small samples of MATSim data can be found in tests/test_data/matsim.
The network.xml file defines the graph of the transport network, i.e. all infrastructure available to agents or vehicles.
- The input network.xml satisfies the network v2 DTD schema. GeNet does not explicitly validate network.xml against the DTD, but there is an outstanding issue to add this functionality
- No pre-processing required, i.e. the network should be ready to use and amend as is
- It has to be a valid network.xml file according to the DTD, but not necessarily a valid MATSim network.xml file. Files that conform to the DTD schema can still cause errors or unexpected behaviour when used as the input to a MATSim simulation. GeNet detects a subset of known issues such as:
  - graph connectivity issues (these can be analysed and validated using GeNet)
  - duplicate ids
    - duplicate link ids (which result in the creation of a new index for each duplication of a link id)
    - duplicate node ids (which are flagged and de-duplicated, so only one of a set of duplicated nodes persists to the genet.Network object and the others are dropped; the first encountered node is kept). If duplicates are flagged, the user is required to investigate the links connected to the affected node(s), as nodes carry spatial information which is inherited by network links.
It is not guaranteed that a network with a specific, undiagnosed issue will be read into GeNet out of the box in the way you expect, or at all.
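The "first encountered node is kept" rule for duplicate node ids can be illustrated with a small sketch. This is not GeNet's implementation: the function name `dedupe_nodes` and the `(node_id, attributes)` input shape are assumptions made purely for illustration.

```python
def dedupe_nodes(nodes):
    """Keep the first node seen for each id and report ids that were duplicated.

    `nodes` is an iterable of (node_id, attributes) pairs in file order.
    This input shape is hypothetical, not GeNet's internal format.
    """
    kept = {}
    duplicated_ids = []
    for node_id, attributes in nodes:
        if node_id in kept:
            # Later duplicates are dropped; remember the id so that links
            # referencing it can be investigated by hand.
            if node_id not in duplicated_ids:
                duplicated_ids.append(node_id)
            continue
        kept[node_id] = attributes
    return kept, duplicated_ids


nodes = [
    ("1", {"x": 0.0, "y": 0.0}),
    ("2", {"x": 1.0, "y": 0.0}),
    ("1", {"x": 5.0, "y": 5.0}),  # same id, different location
]
kept, duplicated_ids = dedupe_nodes(nodes)
```

Here the second node with id `1` is dropped, so any link that was meant to start or end at its location now inherits the coordinates of the first one. That is why flagged duplicates need a manual check.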
The schedule.xml file defines all of the PT stops and the PT services that use them; each service has a number of different routes that make several different trips during the day.
- The input schedule.xml satisfies the schedule v2 DTD schema. GeNet does not explicitly validate schedule.xml against the DTD, but there is an outstanding issue to add this functionality
- No pre-processing required
- It has to be a valid schedule.xml file according to the DTD, but not necessarily a valid MATSim schedule.xml file. Files that conform to the DTD schema can still cause errors or unexpected behaviour when used as the input to a MATSim simulation. GeNet detects a number of known issues such as:
  - transit routes with loops
  - arrival and departure offset issues
  - network routing issues for PT services, such as
    - missing routes
    - traversability:
      - connectedness of links in routes for transit
      - matching mode being stored in links used for the route
It is not guaranteed that a schedule with a specific, undiagnosed issue will be read into GeNet out of the box in the way you expect, or at all.
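The routing checks listed above (missing routes, connectedness of links, matching modes) can be sketched in plain Python. This is a minimal illustration of the idea, not GeNet's validation code: the function `check_route` and the dictionary layout of `links` are hypothetical.

```python
def check_route(route, links, mode):
    """Return a list of problems found for one network route.

    `route` is an ordered list of link ids. `links` maps a link id to a dict
    with 'from', 'to' and 'modes' keys (a hypothetical layout for this sketch).
    """
    if not route:
        return ["missing route"]
    problems = []
    # Every link must exist and must allow the mode of the transit route.
    for link_id in route:
        if link_id not in links:
            problems.append(f"link {link_id} is not in the network")
        elif mode not in links[link_id]["modes"]:
            problems.append(f"link {link_id} does not allow mode {mode}")
    # Consecutive links must join up: the end node of one is the start of the next.
    for current_id, next_id in zip(route, route[1:]):
        if current_id in links and next_id in links:
            if links[current_id]["to"] != links[next_id]["from"]:
                problems.append(f"links {current_id} and {next_id} are not connected")
    return problems


links = {
    "a": {"from": "1", "to": "2", "modes": {"car", "bus"}},
    "b": {"from": "2", "to": "3", "modes": {"car"}},
    "c": {"from": "4", "to": "5", "modes": {"bus"}},
}
mode_problems = check_route(["a", "b"], links, "bus")
connection_problems = check_route(["a", "c"], links, "bus")
missing_problems = check_route([], links, "bus")
```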
A network can exist within GeNet without a schedule, and there are operations which can be performed just on the graph. It is recommended, however, that a schedule is read with the network file when working with a network which has transit associated with it. This is both because changes to the graph can affect routing for transit services and because the schedule is needed for validation.
GeNet does not currently read the vehicles.xml file, which defines all of the PT vehicles that make the trips given in the schedule (LAB-583, Issue 26 and Issue 18).
Small samples of OSM data can be found in tests/test_data/osm.
GeNet ingests OSM data with .osm or .osm.pbf extensions, which can be obtained from
- http://download.geofabrik.de/ for well-defined regions/countries
- JOSM, for small subsets saved with the aforementioned extensions
In /configs you will find example configs which can be used as a starting point.
GeNet can ingest only one OSM file and one config describing how to read that OSM file at a time. It is possible to generate a more complicated network which is composed of areas which require different OSM way tags to be read. The current workflow in GeNet is to:
- collect your OSM files
- produce a config file for each OSM file
- generate several genet.Network objects and add them together
GeNet will assign values to links based on their OSM tags. These have been extracted from MATSim's JOSM add-on and can be found in /outputs_handler/matsim_xml_values.py. There are currently two caveats:
- The number of lanes in a link, i.e. permlanes, is taken from OSM data if present (and read by config under OSM_TAGS: USEFUL_TAGS_PATH) and integer convertible (e.g. not '2;3')
- capacity given in matsim_xml_values.py is for a single lane; the output network will feature a capacity value which is the result of permlanes * capacity, where the latter capacity is the base lane capacity defined in matsim_xml_values.py
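The two caveats above can be sketched as follows. This is an illustration of the rule, not GeNet's code: the function names, the example lane capacity of 600.0 and the fallback to a default lane count when the OSM value is missing or unusable are all assumptions.

```python
def parse_permlanes(lanes_tag, default_lanes):
    """Use the OSM lanes value only if it converts cleanly to an integer."""
    try:
        return int(lanes_tag)
    except (TypeError, ValueError):
        # Missing (None) or not integer convertible, e.g. '2;3'.
        # Falling back to a default is an assumption made for this sketch.
        return default_lanes


def link_capacity(lanes_tag, default_lanes, lane_capacity):
    """Return (permlanes, capacity), where capacity = permlanes * lane capacity."""
    permlanes = parse_permlanes(lanes_tag, default_lanes)
    return permlanes, permlanes * lane_capacity


# Hypothetical per-lane capacity and default lane count for one road type
two_lanes = link_capacity("2", default_lanes=1, lane_capacity=600.0)
ambiguous = link_capacity("2;3", default_lanes=1, lane_capacity=600.0)
untagged = link_capacity(None, default_lanes=1, lane_capacity=600.0)
```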
GeNet creates a Multi Directed Graph from OSM data, meaning that there can be more than one edge between the same pair of nodes directed from origin_node to destination_node, if that is the case within the OSM file.
Small samples of GTFS data can be found in tests/test_data/gtfs.
GeNet ingests zipped or unzipped GTFS feeds. The following files are required in the unzipped folder, or inside the zip file:
- calendar.txt
- stop_times.txt
- stops.txt
- trips.txt
- routes.txt
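Before handing a feed to GeNet it can be useful to confirm that the required files are there. The sketch below uses only the standard library; the function name `missing_gtfs_files` is hypothetical, and it assumes the files sit at the top level of the folder or zip archive.

```python
import os
import tempfile
import zipfile

REQUIRED_GTFS_FILES = {
    "calendar.txt",
    "stop_times.txt",
    "stops.txt",
    "trips.txt",
    "routes.txt",
}


def missing_gtfs_files(path):
    """Return the sorted names of required files absent from a GTFS folder or zip."""
    if os.path.isdir(path):
        present = set(os.listdir(path))
    else:
        with zipfile.ZipFile(path) as feed:
            # Assumes files are stored at the top level of the archive
            present = set(feed.namelist())
    return sorted(REQUIRED_GTFS_FILES - present)


# Example with a deliberately incomplete, throwaway feed folder
with tempfile.TemporaryDirectory() as folder:
    for name in ("stops.txt", "trips.txt", "routes.txt"):
        open(os.path.join(folder, name), "w").close()
    missing = missing_gtfs_files(folder)
```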
When reading a GTFS feed, GeNet expects a date in YYYYMMDD format. It will raise an error if the selected date yields no services.
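To see what "a date that yields no services" means in practice, the sketch below selects services from calendar.txt rows for a given YYYYMMDD day, following the standard GTFS calendar.txt fields. It is not GeNet's implementation: the function names are hypothetical, and it ignores any exceptions listed in calendar_dates.txt.

```python
from datetime import datetime

WEEKDAY_COLUMNS = [
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
]


def parse_gtfs_day(day):
    """Parse a YYYYMMDD string, raising ValueError for anything else."""
    if len(day) != 8 or not day.isdigit():
        raise ValueError(f"expected a date in YYYYMMDD format, got {day!r}")
    return datetime.strptime(day, "%Y%m%d").date()


def services_running_on(day, calendar_rows):
    """Return service ids active on `day`, raising ValueError if there are none."""
    weekday_column = WEEKDAY_COLUMNS[parse_gtfs_day(day).weekday()]
    services = [
        row["service_id"]
        for row in calendar_rows
        # YYYYMMDD strings sort chronologically, so string comparison is enough
        if row["start_date"] <= day <= row["end_date"] and row[weekday_column] == "1"
    ]
    if not services:
        raise ValueError(f"no services run on {day}")
    return services


def make_row(service_id, flags):
    """Build a calendar.txt-like row from a 7-character Monday-to-Sunday flag string."""
    row = {"service_id": service_id, "start_date": "20190101", "end_date": "20191231"}
    row.update(dict(zip(WEEKDAY_COLUMNS, flags)))
    return row


calendar_rows = [make_row("weekday", "1111100"), make_row("weekend", "0000011")]
monday_services = services_running_on("20190603", calendar_rows)  # a Monday
```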
GeNet currently does not support filtering the output genet.Schedule objects based on a geographical area. This can be done using gtfs-lib prior to ingestion in GeNet. Alternatively, you can attempt to manipulate the genet.Schedule object within GeNet. Make sure to validate the final genet.Schedule.
GeNet currently only supports merging of separable GTFS feeds, meaning feeds that have no services in common.
The user assumes responsibility for the quality of their input GTFS feed. There are various validation tools that can be used on GTFS feeds before they are used in GeNet; see this page for a summary of validation tools.
GeNet provides methods to validate the graph of the genet.Network, the validity of the genet.Schedule in relation to the behaviour of all of the transit routes within it, and the relationship between genet.Schedule and genet.Network through the network routes given in the schedule. These checks should be run, and their results investigated thoroughly, both throughout and at the end of working with a network in GeNet, before saving it out to file.
GeNet currently supports writing MATSim network files:
- network.xml with the network v2 DTD schema. Data present on the nodes and edges of the graph will only persist to the network.xml file if it matches the required or optional attributes defined in /variables.py, or is saved in a nested dictionary under attributes for network links in the following format: 'attributes': {'attribute_name' : {'name': 'attribute_name', 'class': 'java.lang.String', 'text': 'attribute_value'}}.
- schedule.xml with the schedule v2 DTD schema. Similarly to the network, if any data is added to the genet.Schedule object's graph, only the allowed attributes for stops (graph nodes) defined in /variables.py will persist to the schedule.xml file.
- vehicles.xml with the vehicles v1 DTD schema. GeNet will generate a new vehicles.xml file using values for vehicles given in matsim_xml_values.py (see LAB-583 for progress on supporting vehicles.xml files)
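The nested attributes format for network links described above can be built with a small helper. This is a sketch: `matsim_attribute`, the example link dictionary and the attribute name `osm:way:highway` are hypothetical, and only the shape of the nested dictionary comes from the format given above.

```python
def matsim_attribute(name, value, java_class="java.lang.String"):
    """Build one entry for a link's 'attributes' dictionary."""
    return {name: {"name": name, "class": java_class, "text": str(value)}}


# Hypothetical link data; only the layout under 'attributes' follows the format above
link_data = {"id": "1", "from": "A", "to": "B", "attributes": {}}
link_data["attributes"].update(matsim_attribute("osm:way:highway", "primary"))
```

Data stored on a link in any other shape, and not listed in /variables.py, would not be expected to reach network.xml.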
During the MATSim export, GeNet will also generate:
- change_log.csv: a tabular account of changes performed during the GeNet session. Changes are recorded only when using GeNet's own operations.
The following files can be generated using the save_network_to_geojson method:
- nodes.geojson: all of the data stored under nodes of the genet.Network graph in geojson format, supported by kepler.gl and geopandas
- nodes_geometry_only.geojson: the x and y coordinates in Point format stored under nodes of the genet.Network graph in geojson format, supported by kepler.gl and geopandas. This is useful for visualising large networks.
- links.geojson: all of the data stored under edges of the genet.Network graph in geojson format, supported by kepler.gl and geopandas
- links_geometry_only.geojson: the x and y coordinates in LineString format stored under edges of the genet.Network graph in geojson format, supported by kepler.gl and geopandas. This is useful for visualising large networks.
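To show why the geometry-only files are lighter, the sketch below builds a GeoJSON FeatureCollection that keeps only Point geometries and drops every other node attribute. It illustrates the idea with the standard library only and is not the exact layout GeNet writes: the function name and the node dictionary shape are assumptions.

```python
import json


def nodes_geometry_only(nodes):
    """Build a GeoJSON FeatureCollection of Points from node x and y values.

    `nodes` maps a node id to a dict holding at least 'x' and 'y'
    (a hypothetical layout for this sketch).
    """
    features = [
        {
            "type": "Feature",
            "properties": {},  # all non-geometry data is left out
            "geometry": {"type": "Point", "coordinates": [data["x"], data["y"]]},
        }
        for data in nodes.values()
    ]
    return {"type": "FeatureCollection", "features": features}


nodes = {
    "1": {"x": 528704.1, "y": 182068.8, "some_attribute": "dropped"},
    "2": {"x": 528835.2, "y": 182006.3, "some_attribute": "dropped"},
}
collection = nodes_geometry_only(nodes)
geojson_text = json.dumps(collection)
```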
If planning to run the network with MATSim, the user is expected to perform the following validation steps on the output network:
- Run GeNet's validation report and inspect the results for graph, schedule and routing, before saving the network.
- Run PT2MATSim's validation process (see Check Mapped Schedule Plausibility)
- Visualise all modal subgraphs (automated generation of these outputs in GeNet can be done using the generate_standard_outputs method) and inspect for
  - any parts of the network missing
  - any connectivity issues
  - capacity values for road network
  - permlanes values for road network
  - any of the PT mode graphs looking odd or going where they shouldn't be present
- Visualise the schedule (automated generation of these outputs in GeNet can be done using the generate_standard_outputs method) and inspect
  - values for trains per hour on the schedule graph's edges in AM (morning), IP (inter-peak) and PM (afternoon) peak with relation to the location of the PT terminals
  - values for trains per hour for the entire day for selected transit stops or services
  - number of trips per day per service or stop pair (stop pairs are more useful in case of rail for example)
- Finally, run a small test MATSim simulation with the output network for at least 2 iterations.