pyBNA includes a module with methods for importing data from publicly-available sources. These sources are:
| Source | Usage |
|---|---|
| US Census | Population; Employment |
| OpenStreetMap | Streets and trails; Destinations |
Data imports are done using the Importer class. This can be imported with:

```
from pybna import Importer
```
From there, an Importer object must be instantiated. There are two ways to tell Importer how to connect to your database. The first is to provide a configuration file (see config instructions here). The second is to provide connection details explicitly.
Config option:

```
i = Importer(config="/path/to/config/file")
```

Explicit details option:

```
i = Importer(host="myhost", db_name="mydb", user="myuser", password="mypassword")
```

Any of the connection parameters in the configuration file can be overwritten with explicit arguments, e.g.:

```
i = Importer(config="/path/to/config/file", user="myuser")
```
The BNA uses a study area boundary to limit the geography under consideration. If you provided a config file at startup, this boundary should have been defined in the config file. You can import the boundary into your database with:
```
i.import_boundary(fpath="/path/to/my/boundary/file")
```

If you want to override the config, or if you started Importer without a config file, you can specify a table name with the `table` option.
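For example (the table name here is just an illustration):

```
i.import_boundary(fpath="/path/to/my/boundary/file", table="myschema.myboundary")
```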
The boundary file is used in all other import methods for filtering data before it is uploaded to your database. If you started Importer without a config file, you will need to provide a boundary with the `boundary_file` option for each of the other import methods.
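For example, to supply a boundary explicitly when importing census blocks (using the `fips` option described below):

```
i.import_census_blocks(fips=16, boundary_file="/path/to/my/boundary/file")
```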
Census blocks are the standard unit of analysis for the BNA. By default, the BNA uses Census Blocks as defined in the 2010 US Census. When blocks are imported they are automatically clipped to the study area. Blocks can be obtained in one of three ways: path to a file, URL, or US state FIPS code.
It is entirely possible to use blocks that are totally unrelated to the US Census. This is necessary when, for example, running the BNA outside of the United States. The only items the BNA requires are (a loading sketch follows this list):
- Geographic boundaries
- A unique identifier
- Population data
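As a rough sketch, here's one way you might load such a table using GeoPandas and SQLAlchemy (with GeoAlchemy2 installed). The column names (`blockid10`, `pop10`), table location, and connection details are assumptions; match them to whatever your config file expects:

```
import geopandas as gpd
from sqlalchemy import create_engine

# Read custom block geometries; the source column names are assumptions
blocks = gpd.read_file("/path/to/my/blocks/file")
blocks = blocks.rename(columns={"my_id": "blockid10", "my_pop": "pop10"})

# Write to the default blocks location used by pyBNA
engine = create_engine("postgresql://myuser:mypassword@myhost/mydb")
blocks.to_postgis("neighborhood_census_blocks", engine, schema="generated", if_exists="replace")
```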
Blocks will be uploaded to your database at the location specified in your config file, unless you provide an alternate table name. If you started the Importer without a config file and don't explicitly specify a table, they will be uploaded to the default location (`generated.neighborhood_census_blocks`).

The import also requires a study area boundary. If this is provided in your config file it will be used. If not, you'll need to specify one with the `boundary_file` option.
File path option:
If you have already downloaded a file to use for blocks, you can point to this with:
```
i.import_census_blocks(fpath="/path/to/my/blocks/file")
```
We have found that the default import from the Census can take some time, since it is necessary to download the entire state's dataset, load it into memory, and then filter to a smaller area. If you prefer not to wait, it is usually faster to download the state dataset yourself, delete census blocks outside the vicinity of your study area, and then point the Importer to the saved shapefile with the `fpath` option.
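As a sketch of that workflow (assuming GeoPandas; the 5 km buffer and the choice of projected CRS are arbitrary):

```
import geopandas as gpd

blocks = gpd.read_file("/path/to/state/blocks/file")
boundary = gpd.read_file("/path/to/my/boundary/file")

# Buffer in a projected CRS so the distance is in meters
vicinity = boundary.to_crs(epsg=3857).buffer(5000).unary_union
nearby = blocks[blocks.to_crs(epsg=3857).intersects(vicinity)]
nearby.to_file("/path/to/my/blocks/file")
```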
URL option:
You can point the BNA to a URL to download a file with:
```
i.import_census_blocks(url="https://valid/url/to/blocks/file")
```
FIPS option:
If you supply the FIPS code for the state you're working in, pyBNA will download the blocks for you automatically:

```
i.import_census_blocks(fips=16)  # e.g. for working in Idaho (FIPS 16)
```
By default, jobs data come from the Longitudinal Employer-Household Dynamics (LEHD) dataset produced by the US Census. As with Census Blocks, the jobs dataset does not have to be related to US Census data. Supplying your own jobs data is easy. pyBNA has the following requirements for user-supplied jobs data:
- A unique identifier that corresponds to the ID associated with a block
- A number of jobs specific to that block
Importing jobs is done by specifying the table to create and the US state to pull jobs data for:

```
i.import_census_jobs("myschema.myjobstable", state="ID")
```
The `state` argument should be a two-letter abbreviation of the state as used by the US Census. If you have already downloaded the LEHD datasets, you can point to the "main" and "aux" files with the `fpath_main` and `fpath_aux` arguments.
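For example (the file paths are placeholders):

```
i.import_census_jobs("myschema.myjobstable", state="ID",
                     fpath_main="/path/to/main/file", fpath_aux="/path/to/aux/file")
```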
The table created by `import_census_jobs` should correspond to the jobs table identified in your config file as a destination type. By default, this is the "employment" category in the config file and the table is named `neighborhood_census_block_jobs`.
If you're supplying your own file for jobs data, you should load it into the database manually (making sure to update your config file accordingly), as in the sketch below.
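As a rough sketch of a manual load with pandas and SQLAlchemy (the CSV column layout, table name, and connection details are assumptions; they must match the IDs in your blocks table and the employment entry in your config file):

```
import pandas as pd
from sqlalchemy import create_engine

# Assumed columns: a block identifier and a jobs count for that block
jobs = pd.read_csv("/path/to/my/jobs/file")

engine = create_engine("postgresql://myuser:mypassword@myhost/mydb")
jobs.to_sql("neighborhood_census_block_jobs", engine, schema="generated",
            if_exists="replace", index=False)
```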
OpenStreetMap is the default source for road network and off-street trail data for the BNA. The OSM network is already topologically correct and ready for computer routing, including for off-street trails. This is a huge advantage over many municipal road network datasets that often don't include trails and sometimes don't enforce basic network topology.
The import is as easy as:
```
i.import_osm_network()
```
As with other import methods, the import will use your config file (if you provided one) or BNA defaults (if you didn't), unless you explicitly provide a table name. A boundary file is also required for this import unless one is given in your config file.
If you already downloaded an OSM extract locally, you can refer to it instead of pulling data over the network with the `osm_file` option.
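For example:

```
i.import_osm_network(osm_file="/path/to/my/osm/extract")
```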
Note that very large OSM files can result in memory errors (i.e., your computer running out of memory). One strategy to avoid this is to isolate only features tagged `highway=*`. There are several options for filtering raw OSM data, such as Osmium:

```
osmium tags-filter -o ~/path/to/output/file ~/path/to/input/file nw/highway
```
Destination data also comes from OpenStreetMap by default. The Importer uses the destination tags defined in the configuration file, but you can provide separate instructions for extracting OSM destinations using the `destination_tags` option. To do this, you'll need to create a dictionary of table names and OSM tags that mimics the format encoded in the configuration file (see the sketch below).
Importing destinations can be done with:
```
i.import_osm_destinations()
```
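For custom destinations, a `destination_tags` dictionary might look something like the sketch below. The table names and tag structure here are illustrative assumptions; mirror the destination entries in your own config file:

```
# Illustrative only; mimic the format from your configuration file
my_tags = {
    "myschema.schools": {"amenity": "school"},
    "myschema.supermarkets": {"shop": "supermarket"},
}
i.import_osm_destinations(destination_tags=my_tags)
```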
If you downloaded an OSM extract, you can point to this with the `osm_file` option. Please note that pyBNA will look for destination matches in the entire OSM file without filtering to your study area boundary. This won't affect your analysis, but it could take an extremely long time if your extract is significantly larger than your study area. We suggest clipping your extract before using it; there are many excellent tools available to do this (for example, Osmium's `osmium extract` command can clip to a polygon boundary).
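For example, pointing at a pre-clipped extract:

```
i.import_osm_destinations(osm_file="/path/to/my/clipped/extract")
```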