Setup

If you are using macOS, run brew install poppler tesseract first.

Create and activate a Python virtual environment.
Run pip install -r requirements.txt to install dependencies.
Run download.sh to get the minimal set of files required to run inference.
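
Before running inference it can help to confirm that the system tools are actually on PATH. The check below is a minimal sketch, not part of this repository: it assumes Poppler is detectable through its pdftoppm binary and Tesseract through its tesseract binary, and the function name missing_tools is hypothetical.

```python
import shutil

# Assumption: Poppler is detected via its pdftoppm binary,
# Tesseract via its tesseract binary.
REQUIRED_TOOLS = ("pdftoppm", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All system tools found.")
```

If anything is reported missing, install it with your package manager (brew on macOS) before continuing.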

Docker setup

Build the image: docker build . -t badgerdoc-tb-extr
Use the built image, for example: docker run -it badgerdoc-tb-extr bash

You can also run the image in any other way that suits you.

Use models from AWS S3

If you would like to download models from S3, set the following environment variables before running setup, for example with a Docker env file:

AWS_S3_ENDPOINT # S3 endpoint URL
AWS_ACCESS_KEY_ID # AWS access key ID
AWS_SECRET_ACCESS_KEY # AWS secret access key
AWS_REGION # AWS region name
AWS_S3_SSE_TYPE # AWS SSE type (optional)
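
A quick way to catch a missing variable early is to check the environment before starting. The following is a minimal sketch, not code from this repository: the helper name missing_s3_vars is hypothetical, and it assumes every variable except AWS_S3_SSE_TYPE is required, since only that one is marked optional above.

```python
import os

# Assumption: every variable not marked "(optional)" is required.
REQUIRED_VARS = (
    "AWS_S3_ENDPOINT",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
)
OPTIONAL_VARS = ("AWS_S3_SSE_TYPE",)


def missing_s3_vars(env=None):
    """Return the required S3 variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_vars()
    if missing:
        print("Set these variables before downloading models:", ", ".join(missing))
```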

Run excel or pdf pipeline

python -m table_extractor.run run <path-to-pdf-or-excel> <results-output-dir> --model_path <model-file-path> --verbose <true/false> --paddle_on <true/false>

The pipeline automatically decides how to parse your document based on its file extension. Supported file formats: *.pdf, *.xlsx, *.xlsm, *.xltx, *.xltm
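
To illustrate what choosing a pipeline by file extension can look like, here is a minimal sketch. It is not the implementation in table_extractor: the function name pipeline_for and the returned labels are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions taken from the list of supported file formats.
PDF_EXTENSIONS = {".pdf"}
EXCEL_EXTENSIONS = {".xlsx", ".xlsm", ".xltx", ".xltm"}


def pipeline_for(path):
    """Return "pdf" or "excel" depending on the file extension of `path`."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in EXCEL_EXTENSIONS:
        return "excel"
    raise ValueError(f"Unsupported file format: {suffix or '(no extension)'}")
```

Rejecting unknown extensions up front gives a clearer error than letting a parser fail partway through a file it cannot read.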

A model stored on S3 can be used if <model-file-path> is given in the format s3://<bucket>/<models_path>
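
The s3://<bucket>/<models_path> format splits cleanly into a bucket name and an object path. The sketch below shows one way to do that with the standard library. It is an illustration only, the function name split_s3_uri is hypothetical, and the project may parse the value differently.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split s3://<bucket>/<models_path> into (bucket, models_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path component starts with "/", which is not part of the object path.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # "my-bucket" and "models/model.pth" are made-up example values.
    print(split_s3_uri("s3://my-bucket/models/model.pth"))
```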

Run pdf pipeline

Run the pipeline on a single PDF document:

python -m table_extractor.run run-sequentially <path-to-pdf> <results-output-dir> --model_path <model-file-path> --verbose <true/false> --paddle_on <true/false>

A model stored on S3 can be used if <model-file-path> is given in the format s3://<bucket>/<models_path>
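
Because run-sequentially handles one document at a time, processing a whole folder means invoking it once per file. The following is a minimal sketch of building those invocations. It is not part of this repository: the function name build_commands is hypothetical, writing each document's results to its own subdirectory is an assumption, and so is passing the flag values as the literal strings true and false.

```python
import sys
from pathlib import Path


def build_commands(pdf_dir, output_dir, model_path, verbose="true", paddle_on="false"):
    """Build one run-sequentially command per *.pdf file found in `pdf_dir`."""
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        commands.append([
            sys.executable,  # the interpreter of the active virtual environment
            "-m", "table_extractor.run", "run-sequentially",
            str(pdf),
            # Assumption: one output subdirectory per document.
            str(Path(output_dir) / pdf.stem),
            "--model_path", model_path,
            "--verbose", verbose,
            "--paddle_on", paddle_on,
        ])
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True) to execute it. Building the commands separately from running them makes it easy to print and inspect them first.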

Run excel extractor

python -m excel_extractor.excel_run <path-to-excel> <output-path>

