This is a system for extracting call reports from the Federal Financial Institutions Examination Council's (FFIEC) Central Data Repository. The data is loaded into HBase.
The ETL process runs in a Docker container; a running container can be thought of as a 'job'. Each job collects the data for a single institution over all collection periods, the idea being that many jobs can run simultaneously on a container scheduling/orchestration system. When running a job, specify the institution you'd like data for by passing a valid RSSD identifier via the RSSD_TARGET environment variable (or the --rssd-target flag).
If you'd just like to update the dataset with a single reporting period (say, in the event of a new call report), specify the period with PERIOD_TARGET. RSSD_TARGET and PERIOD_TARGET are not mutually exclusive; there's an example combining them below.
If you're feeling patient, you can omit RSSD_TARGET and PERIOD_TARGET altogether and collect everything.
You'll need an instance of HBase, and the program assumes you're running a Thrift server in front of it. If for some reason you don't have an HBase cluster lying around, this will get you something to test against.
docker run --name hbase -h hbase -d -v $PWD/hbase:/data -p 2181:2181 -p 8080:8080 -p 8085:8085 -p 9160:9160 -p 16010:16010 -p 9090:9090 dajobe/hbase
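Once that container is up, a quick sanity check that the Thrift gateway is reachable (assuming you have netcat installed):

nc -z 127.0.0.1 9090 && echo "Thrift gateway is listening"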
From the project root, build the project container.
docker build . --tag=etlffiec:latest
Ideally you'd have something like Kubernetes scheduling these jobs onto hardware and managing the FFIEC_TOKEN secret. For local testing, just set some variables in your shell (there's an example after this list). Here are all the parameters, with their defaults.
- FFIEC_USER=
- FFIEC_TOKEN=
- RSSD_TARGET=None
- PERIOD_TARGET=None
- THRIFT_GATEWAY=127.0.0.1
- THRIFT_PORT=9090
- INIT=False
- TRUNCATE_TABLES=False
- MDRM_PATH=./MDRM.csv
- UPDATE_METADATA=False
- LOGGING_LEVEL=WARNING
- LOGGING_FORMAT=JSON
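For local testing, you might export the credentials like this; the values below are placeholders for your own CDR account:

export FFIEC_USER=jane.doe        # placeholder: your CDR username
export FFIEC_TOKEN=abc123def456   # placeholder: your CDR security token
export LOGGING_LEVEL=INFO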
First you'll want to create the tables and load the MDRM metadata into the dictionary table by passing the --init flag.
docker run --name=initffiec --link=hbase etlffiec --init --thrift-gateway=hbase
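If you want to confirm the initialization succeeded, the container's logs are the quickest check:

docker logs initffiec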
Here's how to truncate the data, if that's your thing.
docker run --name=truncateffiec --link=hbase etlffiec --truncate-tables --thrift-gateway=hbase
A similar command refreshes the MDRM metadata, perhaps after the data has been updated and you've rebuilt the container (or if you just ran the truncation example above without thinking).
docker run --name=updateffiecmetadata --link=hbase etlffiec --update-metadata --thrift-gateway=hbase
After that, you can loop over an array of RSSD identifiers and execute jobs in parallel.
for ID in 131034 720858 229342 819172 65513 753641; do
    docker run -d --name=${ID} --link=hbase -eFFIEC_USER=${FFIEC_USER} -eFFIEC_TOKEN=${FFIEC_TOKEN} -eTHRIFT_GATEWAY=hbase -eLOGGING_LEVEL=INFO -eRSSD_TARGET=${ID} etlffiec;
    sleep 3
done;
After you've created these containers, it's possible to re-run the operations with docker start
docker start 131034 720858 229342 819172 65513 753641
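To collect a single reporting period for a single institution, combine the two targets. A sketch, assuming PERIOD_TARGET takes a quarter-end date (check the source for the exact format the CDR expects):

docker run -d --name=singleperiod --link=hbase -eFFIEC_USER=${FFIEC_USER} -eFFIEC_TOKEN=${FFIEC_TOKEN} -eTHRIFT_GATEWAY=hbase -eLOGGING_LEVEL=INFO -eRSSD_TARGET=131034 -ePERIOD_TARGET=2019-12-31 etlffiec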
If you're feeling ambitious, omit RSSD_TARGET and let a single job collect every institution.
docker run -d --name=collectffiec --link=hbase -eFFIEC_USER=${FFIEC_USER} -eFFIEC_TOKEN=${FFIEC_TOKEN} -eLOGGING_LEVEL=INFO -eTHRIFT_GATEWAY=hbase etlffiec
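That job will run for a long time; you can follow its progress through the logs:

docker logs -f collectffiec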
To clean up all the above examples
for NAME in 131034 720858 229342 819172 65513 753641 initffiec truncateffiec updateffiecmetadata singleperiod collectffiec hbase; do
    docker stop ${NAME} ; docker rm ${NAME};
done;