Got some data in CSV files you want to load into your EALGIS install? Well you've come to the right place!
To follow this guide you'll need two things:
- Some spatial data (covered under "Load the relevant geographic areas" below)
- Some CSV formatted data (covered under "Creating the config file for your data" below)
Tip: We're going to talk a lot about "schemas" in this guide. In a technical sense, these are literally PostgreSQL database schemas. In a practical sense, they're containers for grouping together one or more datasets. e.g. The ABS Census 2016 - General Community Profile data probably lives in a schema called aus_census_2016_gcp.
At the end of this process you'll have a PostgreSQL .dump file containing a schema with your dataset(s) that you can load into your EALGIS install. This file is intended to completely replace the existing schema.
Since you'll likely have multiple datasets in one schema we recommend keeping the CSV and JSON files for each schema in a structure like so:
config/
  schema_name/
    dataset_a/
      dataset_a.csv
      dataset_a.json
    dataset_b/
      dataset_b.csv
      dataset_b.json
    ...
Should you ever need to recreate the schema from scratch, it's then just a matter of running load.sh against each of the JSON files in turn, as shown below.
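For example, from inside the dataloader container (introduced below), a loop like this would rebuild the whole schema. This is a sketch only - it assumes the config/ layout above is mounted at /app/config:
for config in /app/config/schema_name/*/*.json; do
  ./load.sh "$config"
done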
But before you start any of that you need to initialise the standalone database that this CSV loader will use to temporarily store your data. Simply run:
docker-compose up datastore
If all goes well, the log will end with:
LOG: database system is ready to accept connections
You can verify this by connecting to the database on localhost (username/password: postgres/postgres) and looking at the database called ealgis. At this stage it should be populated with the default public schema, and the three special schemas that come with PostGIS by default:
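For example, using psql from your host machine (assuming you have the psql client installed and the datastore container is still running):
psql --host=localhost --username=postgres --dbname=ealgis -c '\dn'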
   List of schemas
    Name    |  Owner
------------+----------
 public     | postgres
 tiger      | postgres
 tiger_data | postgres
 topology   | postgres
(4 rows)
You can now exit and shut down the datastore Docker container (docker-compose stop datastore).
Next up you'll need to load in the spatial data that will provide the link between your CSV and seeing your data on a map.
This will be the same data that you've already got loaded in your EALGIS install, but for the purposes of this loader we also need a copy in the loader's database. (For Census data, it's found in schemas like aus_census_2016_shapes.)
If you're working with data using ABS geographic areas or electoral boundaries, then you can simply download the relevant PostgreSQL dump files from our postgresql_dumps repo.
Tip: If you'd like to run the data loaders yourself the postgresql_dumps repo has links to our collection of data loaders.
If you're working with your own custom data, then you'll need to use pg_dump to dump out your spatial data schema.
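For example (a sketch only - it assumes your spatial data lives in a schema called my_shapes in a local database called mydb; the pg_restore commands below expect a custom-format dump):
pg_dump --format=custom --schema=my_shapes --file=my_shapes.dump mydb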
Wherever you got your data from, drop the .dump file somewhere in this repo and run:
docker-compose run dataloader /bin/bash
Once inside the dataloader container you can pg_restore your dump file.
It's usually best to restore the whole schema (even if that can take quite a while). For example:
pg_restore --host=datastore --username=postgres --dbname=ealgis /app/aus_census_2016_shapes.dump
Caveat emptor: This process may change with future versions of EALGIS and may break in fun and interesting ways.
But if you really want to, you can restore only the geometry tables that are relevant to the CSV files that will be in the schema created by this loader.
In this case, let's load data from the sa2 table:
pg_restore --host=datastore --username=postgres --dbname=ealgis --schema-only /app/aus_census_2016_shapes.dump
pg_restore --host=datastore --username=postgres --dbname=ealgis --data-only --table=table_info /app/aus_census_2016_shapes.dump
pg_restore --host=datastore --username=postgres --dbname=ealgis --data-only --table=ealgis_metadata /app/aus_census_2016_shapes.dump
pg_restore --host=datastore --username=postgres --dbname=ealgis --data-only --table=column_info /app/aus_census_2016_shapes.dump
pg_restore --host=datastore --username=postgres --dbname=ealgis --data-only --table=geometry_source /app/aus_census_2016_shapes.dump
pg_restore --host=datastore --username=postgres --dbname=ealgis --data-only --table=geometry_source_projection /app/aus_census_2016_shapes.dump
pg_restore --host=datastore --username=postgres --dbname=ealgis --data-only --table=sa2 /app/aus_census_2016_shapes.dump
Now we can move on to the main event: Writing the JSON file that describes our data!
Check out the config/sample/ directory in this repo for an example CSV and JSON file based on the 2017-18 Regional Population Growth data for NSW from the ABS. All of the documentation and examples from here on are based on this dataset and should be read alongside the sample files provided.
Each CSV dataset that you want to load into a schema will need its own JSON file.
Attribute | Description | Required?
---|---|---
type | Possible values: csv | Required
file | The name of the file containing the data. Relative to the location of your JSON config file. | Required
db_table_name | The name of the database table that will be created for this dataset. May not contain invalid characters like . and -. | Required
csv | A JSON object with directives that control how the CSV file will be processed. See CSV Options below. | Required
Important: CSV column names must be unique within each schema. i.e. Two datasets in the same schema can't use the same column name.
Note: All CSV column names will be converted to lower case during the ingest process.
Attribute | Description | Required?
---|---|---
dialect | The CSV dialect, as understood by Python's csv.reader. | Optional. Defaults to excel.
encoding | The character encoding of the CSV file. | Optional. Defaults to utf-8-sig.
skip | The number of lines to skip at the start of the file. | Required. Set to 0 if no lines need to be skipped.
Note: If you've already run this CSV loader and created a schema then you only need to provide the name attribute.
Attribute | Description | Required?
---|---|---
name | The name of the schema that the dataset will be loaded into. e.g. sample | Required. Must be a valid PostgreSQL schema name.
title | The title of the schema as users will see it. e.g. 2019 Project Data | Optional - only needed if the schema doesn't exist.
description | A description for the schema. | Optional - only needed if the schema doesn't exist.
date_published | When the schema was published. e.g. 2019-03-27 | Optional - only needed if the schema doesn't exist. Must be formatted as YYYY-MM-DD.
Attribute | Description | Required?
---|---|---
shape_schema | The name of the database schema that contains the geometry for this dataset. e.g. aus_census_2016_shapes | Required
shape_table | The name of the database table that contains the geometry for this dataset. e.g. sa2 | Required
shape_column | The name of the column in shape_table that contains the ids for the geometry in this dataset. e.g. sa2_main | Required
csv_column | The name of the column in the CSV file that contains the ids for the geometry. e.g. sa2_main | Required
match | The method used for comparing csv_column and shape_column. | Required. Possible values: str
Attribute | Description | Required?
---|---|---
collection_name | The name of the collection of datasets this data will be grouped with. e.g. New South Wales | Required
title | The name of the dataset that users will see in the data browser. e.g. Regional Population Growth, NSW, 2017-18 | Required
description | A description for this dataset that users will see in the data browser. | Required
kind | The sub-title for this dataset. Usually the population/thing the data is describing. e.g. Persons | Required
family | A short identifier/acronym/product code for this dataset. e.g. 3218.0 | Required
An optional JSON object that provides human-readable titles for the column names in the CSV.
For example:
"column_metadata": {
"erp_30_june_2017": "ERP at 30 June 2017",
"erp_30_june_2018": "ERP at 30 June 2018"
}
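Putting it all together, a complete config file might look something like the sketch below. Note that the grouping key names used here (schema, geometry_linkage, metadata) are illustrative guesses - check config/sample/config.json for the authoritative structure and key names:
{
    "type": "csv",
    "file": "sample.csv",
    "db_table_name": "regional_population_growth",
    "csv": {
        "dialect": "excel",
        "encoding": "utf-8-sig",
        "skip": 1
    },
    "schema": {
        "name": "sample",
        "title": "2019 Project Data",
        "description": "Sample datasets for testing",
        "date_published": "2019-03-27"
    },
    "geometry_linkage": {
        "shape_schema": "aus_census_2016_shapes",
        "shape_table": "sa2",
        "shape_column": "sa2_main",
        "csv_column": "sa2_main",
        "match": "str"
    },
    "metadata": {
        "collection_name": "New South Wales",
        "title": "Regional Population Growth, NSW, 2017-18",
        "description": "Estimated resident population by SA2",
        "kind": "Persons",
        "family": "3218.0"
    },
    "column_metadata": {
        "erp_30_june_2017": "ERP at 30 June 2017",
        "erp_30_june_2018": "ERP at 30 June 2018"
    }
}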
Now that you've described how your data should be processed we can run the loader!
docker-compose run dataloader /bin/bash
./load.sh /app/config/sample/config.json
The output should look something like this:
2019-08-18 10:07:36,403 [INFO ] [MainThread] create schema: sample
2019-08-18 10:07:38,443 [INFO ] [MainThread] dumping database: /app/tmp/sample.dump
2019-08-18 10:07:38,757 [INFO ] [MainThread] successfully dumped database to /app/tmp/sample.dump
2019-08-18 10:07:38,760 [INFO ] [MainThread] load with: pg_restore --username=user --dbname=db /path/to/sample
2019-08-18 10:07:38,763 [INFO ] [MainThread] then run VACUUM ANALYZE;
Drop your .dump file in the loaders directory of your EALGIS install. If necessary, add the loaders directory to the datastore container in your docker-compose.yml file:
volumes:
- ./loaders:/app
Note: The .dump file created by this loader is intended to completely replace the existing schema in your EALGIS install. So remember to DROP SCHEMA schema_name CASCADE; before trying to pg_restore this schema.
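For example, from the host (assuming your schema is called sample):
docker exec -it ealgis_datastore_1 psql --host=localhost --username=postgres --dbname=datastore -c 'DROP SCHEMA IF EXISTS sample CASCADE;'
Then restore the new dump: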
docker exec -it ealgis_datastore_1 sh
pg_restore --host=localhost --username=postgres --dbname=datastore /app/sample.dump
As your last steps:
- Run VACUUM ANALYZE;
- Restart your EALGIS web container so it picks up the newly refreshed schema
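For example (assuming the same ealgis_datastore_1 container as above, and a web service defined in your docker-compose.yml):
docker exec -it ealgis_datastore_1 psql --host=localhost --username=postgres --dbname=datastore -c 'VACUUM ANALYZE;'
docker-compose restart web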
And you're done!
If we release a new version of the ealgis/ingest-base Docker image used for dataloader, run these commands to pull in the new version:
docker-compose rm dataloader
docker image rm ealgis/ingest-base:latest
docker-compose pull dataloader