Skip to content

Road Network features in GeoDa

Xun Li edited this page Jan 24, 2019 · 44 revisions

The Roads/Network features in GeoDa is under development of branch dev1.9_osm

octocat

1. Basic Road/Network support in GeoDa

In this release (dev1.9_osm), the main purpose is to allow GeoDa to read/write, visualize and analyze road/network dataset. The Road/Network dataset is a collection of LineString or MultiLineString type spatial objects.

Please see ESRI doc for a definition of linestring and multilinestring

1.1 Load and visualize Road/Network dataset.

To load a road/network dataset, one just drag-n-drop related file to GeoDa. For example: one can drag and drop the Chicago_road.shp (Chicago’s road/network data downloaded from OpenStreetMap and in the format of ESRI shapefile, shown in the screenshot below) to GeoDa’s Connect to Datasource dialog.

Once you load the file in GeoDa, the roads/network will be rendered in a map window. With proper projection information (e.g. a .prj file), one can add a basemap layer (e.g. Carto light) and the road/network will be projected onto selected basemap layer.

All map operations (e.g. zoom, pan, select etc.), basic file operations (e.g. save, save-as, save-selected-as etc.), and table operations (e.g. spatial join, selection tool etc.) in GeoDa work on road/network data.

Exploratory analysis in GeoDa can be applied on road/network data with tabular data. GeoDa's brushing and linking provides an exploratory approach to analyze the road/network data. For example, one can create an unique value map using “highway” field in Chicago_road table data. Or, in the screenshot below, a percentile map is created using the "length" field in Manhattan’s roads dataset, and the roads with length <1% and 1%-10% are highlighted by shift-clicking on the legends <1% and 1%-10% on the left side of the map window.

1.2 Add a road/network dataset as a new layer & Spatial join

Road/Network dataset can be added as a new layer in GeoDa for visualization and analysis purpose. For example, by click the “+” button on the toolbar in a map window, one can add a road/network dataset. GeoDa will automatically project the road/network to the working layer (the first map/dataset loaded in GeoDa) if the projections are different bewteen the layers.

A working layer is the first map or dataset you loaded in GeoDa. It is called working layer, because only the tabular data of working layer can be used to do exploratory data analysis and spatial regression in GeoDa. All other layers loaded later in GeoDa are used for visualization and spatial join only.

octocat

For spatial analysis, one can add a road layer and an area(polygon) data in GeoDa. The polygon data added as a second layer can be used as a reference layer for visualization purpose. Moreover, by applying spatial join on the two layers, one can compute (1) how many roads are in each zip code area, and (2) which road belongs to which area(polygon).

1.2.1 Spatial Assigning

For example, with the Chicago_road.shp file loaded in GeoDa as a working layer, one can load the Chicago community boundary data (polygons) as a second layer. By using Table->Sptial Join, one can select the loaded boundary data and run spatial join to compute which road belongs to which community area in Chicago. The results will be saved as a new column in Chicago_road tabular data. The values in this column represent the id of community area in Chicago.

octocat

1.2.2 Spatial Counting

If the Chicago community boundary data is the working layer, and Chicago_road.shp file is loaded as a second layer. By using the same Table->Spatial Join feature, one can compute how many roads are within each community area. The results will be saved as a new column in Chicago community boundary data. The integer values in this column indicate the counting of roads in each community area.

octocat

1.3 Save road/network dataset.

At most time, the raw road/network datasets are fetched from OpenStreetMap or Tiger/lines. Those dataset may not fit perfectly into your research problem or are simply not aligned with your existing dataset. Using GeoDa, it becomes easy to visualize the raw road/network data. One can do simple query operations on road/network dataset in GeoDa to filter the raw dataset and save the query results to a new dataset for further analysis.

For example, one can select the "residential" type roads, or select the roads that only in certain communities. Then, use menu "File->Save Selected As", one can save selected roads into a new file.

octocat

2. Create Weights from Road/Network data

Unlike creating weights using point/polygon dataset, when user clicks “Create” button in “Weights Manager”, there will be no "Weights Creating dialog". Instead, a message box will be popup to inform users that GeoDa will create weights using the basic topology of the roads/network. Click “Yes” button, a gal weights will be created.

octocat

Each road, which is either a linestring or multi-linestring, might consists of several line segments (see above figure of LineString/MultiLineString definition). When create a weights file, GeoDa will detect if two roads are connected or intersected via at least one shared vertex, which guarantees the accessibility of the two roads. If they are connected, GeoDa will treat these two roads as neighbors.

The accessibility is important when create spatial weights for roads/network data. In most road network data from OpenStreetMap, two roads might be very close but do NOT share any vertex(node). For example, as shown in the figure below, there are one east-bound highway segment (blue) and a west-bound (red) highway segment in a same area. In this case, the distance between the two roads can not be used to determine the neighborhood relationship. Another example is a bridge cross-over a perpendicular road underneath. The bridge intersects the underneath road visually, however they don't share a vertex at the cross location and therefore can not be treated as "neighbors".

octocat octocat

3. Snap Points to Road Network.

This feature is a special version of “Spatial Join” with a case of computing how many points (events) occurred on each road. In detail, each point will be computed close to which road, and it will be snapped to a road it is close to, then counting how many points are “snapped” on each road. This is useful when analyze events, like car accidents, or crimes, that happen on roads.

For example, the "man_points.shp" dataset is a collection of car accidents happened in Manhattan city during a certain time period. In GeoDa, we can load the "man_roads.shp" dataset as working layer, and "man_points.shp" as a second layer. One can use "Menu->Tools->OpenStreetMap->Snap Points to Roads", to compute how many car accident events were on each road. The results are saved in a new column "SC". One can use this new column to create an "Equal Interval" map (see screenshot) to discover spatial patterns of the car accidents.

octocat

4. Download OSM roads from OpenStreetMap

This will allow GeoDa users to download OSM road networks for a given area, and GeoDa can save the data into a ESRI shapefile that can be consumed by GeoDa and other GIS software. GeoDa uses OSM overpass api to fetch data. The implementation (fetching and parsing) is using C/C++, so it is faster than some python libraries. This feature will be helpful for users who don’t have road/network dataset in hands. We can extend more features like download administrative area data from OpenStreetMap.

For example, one can load their existing data (e.g. Chicago community boundary data), and then access “Menu->Tools->OpenStreetMap->Download OSM data”. In the popup dialog, the “Bounding box” in the popup dialog will be filled with values that extracted from current working layer. After click OK, GeoDa will prompt user to specify a file path to store the results. It takes less than one minute to download and save the roads in Chicago area in a shapefile.

octocat

5. Exploratory Travel Heat Map

With a road/network dataset loaded as a working layer, GeoDa can build a graph (or multi-graphs): the vertex and edges of the graph are extracted from the roads, and the distance of each edge is computed between two vertices (in the form of latitude and longitude) using arc distance (in meters) on earth surface. If road type (e.g. highway, motor_way, primary,secondary ..) and speed limitation information are stored in tabular data, GeoDa can further compute a travel time (in seconds) for each edge.

By using the Graph structure and the dijkstra shortest path algorithm, GeoDa can create a “travel heat map” on the fly. To create such a map, the geographical area of current working layer is first divided into small hexagons. One can specify the radius of hexagon in the configuration dialog of "Travel Heat Map". One can also specify other information like specify road type field, and speed limitation if they prefer a travel time instead of travel length.

octocat

GeoDa will compute a travel cost from every hexagon to a “start location” (where the green map marker is). The travel cost could be either the travel length (in meters) or travel time (in seconds). Different color will be assigned to hexagons with different travel cost. There are 10 different colors associated with 10 different categories of travel cost. The smaller travel cost with lighter color; the larger travel cost with darker color.

octocat

Default "start location" is at the center of the map, but one can change the "start location" by right click and choose “Set Start Location” to create different travel heat map on-the-fly. Left click on the map, one can see the shortest path computed by the Dijkstra shortest path algorithm from "start location" to the mouse position.

octocat

This is first implemented to verify the Dijkstra shortest path algorithm used in “contracts” project to compute large distance matrix. It is also a useful tool to get a sense of the road/network and travel cost from different locations in an exploratory approach.

octocat

6 Compute Travel Distance Matrix

Normally, creating a travel distance matrix of a large dataset (e.g. 46k query points, and road/networks of 58,357 roads with 274,242 vertices) is time consuming. For example, on a latest iMacPro with 18 CPU cores, it took about 16 minutes to finish the task and generate a 8+GB binary file.

In GeoDa, by constructing an optimized graph structure, one can compute the travel distance matrix using the same dataset in about 12 minutes just using a regular laptop (MacBook with 4 CPU cores). Another experiment is using a MacPro with 12 CPU cores to complete the same task in about 4 minutes. The performance is about 6x faster.

In GeoDa, it is also easy to compute a travel distance matrix. For example, one can load a road/network dataset (e.g. chicago_road.shp) as a working layer and a query points dataset (e.g. LEHD_blocks.csv) as a second layer. Then use "Table->OpenStreetMap->Compute Travel Distance Matrix" to access this feature. One can configure the "Speed Limit" information and specify field of "highway type", "max speed" and "one-way" in the tabular data of loaded road/network dataset. Click OK, user will be prompted to enter a file path to save the output travel distance matrix.

octocat

There is also a CPU+GPU hybrid solution for computing distance matrix implemented in GeoDa. The CPU part uses a ON(logN) Dijikstra algorithm, and the GPU part uses a O(N^2) algorithm due to a limitation of computational structure on GPU. Since the GPU algorithm is much slower than the CPU algorithm, GeoDa suggest only use GPU+CPU solutions when you have more than 2 GPU cards, and GPU ratio of the task is also need to be tuned manually according to the number of your GPU devices and the performance of your GPU cards.

For example, with 2 GPU cards, the default GPU ratio is set to 0.2, which means the performance will be 20% faster ONLY IF GPUs can complete the 20% tasks as faster as CPU complete the 80% tasks; and you can increase the ratio if you have more GPU cards. This will benefit with a GPU server solution.

octocat

The output of the distance matrix is a simple binary file for testing purpose. HDF-5 will be used to read/write the distance matrix in next release. For reference, there is the structure of the binary file if you want to read it manually:

int (4bytes) number of query points (n_query)

char (1byte) '0' means no query ids, '1' means has query ids // currently only '0' is used, so just skip it

int (4bytes x n_query x n_query) the actual NxN distance matrix