Semi-automatic data anonymization for German documents.
This repository is home to the OpenRedact app, a webapp for semi-automatic anonymization of German-language documents. OpenRedact is a Prototype Fund project, funded by the German Federal Ministry of Education and Research. A detailed description of the project and prototype can be found here.
You can use the CLI script backend/cli/redact.py to anonymize a directory of documents in an unsupervised manner.
./redact.py --input_dir "path/to/documents/" --output_dir "out/directory/"
Call ./redact.py --help for usage instructions and important notes.
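If you need to anonymize several directories in one go, the CLI call can be wrapped in a small script. The following is only a sketch: the helper names `build_redact_command` and `redact_subdirectories` are hypothetical, it uses only the `--input_dir` and `--output_dir` flags shown above, and it assumes that it is run from the directory containing `redact.py` and that creating the output directories up front is harmless.

```python
import subprocess
from pathlib import Path


def build_redact_command(input_dir, output_dir, script="./redact.py"):
    """Build the argument list for one redact.py call (flags as in the README)."""
    return [script, "--input_dir", str(input_dir), "--output_dir", str(output_dir)]


def redact_subdirectories(root, out_root, script="./redact.py"):
    """Run the CLI once per immediate subdirectory of `root`.

    Assumption: each subdirectory holds documents that redact.py can process.
    """
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        target = Path(out_root) / sub.name
        # Assumption: pre-creating the output directory does not disturb the CLI.
        target.mkdir(parents=True, exist_ok=True)
        # check=True stops the batch at the first failing call.
        subprocess.run(build_redact_command(sub, target, script), check=True)
```

For example, `redact_subdirectories("path/to/documents/", "out/directory/")` would call the script once for every subdirectory and stop at the first failure. Check `./redact.py --help` first, since the important notes there apply to every call.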
This screencast walks you through the anonymization of a document, from upload to download of the anonymized file.
This screencast demonstrates the different anonymization methods that OpenRedact supports. The modifications on the left are immediately previewed on the right.
The automatically detected and proposed personal data can be corrected and extended by the user using our annotation tool.
Based on the manual corrections and extensions, we can assess the mechanism for automatic detection of personal data.
The app is best deployed using Docker.
We have pre-built Docker images available at https://hub.docker.com/u/openredact.
Pull and start the containers by running:
# Clone the repo
git clone https://github.com/openredact/openredact-app.git
cd openredact-app
# Pull images & start containers
docker-compose pull
docker-compose up
This will host the backend at port 8000 (and http://localhost/api) and the frontend at port 80. Once started, you can access the webapp at http://localhost/.
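The containers may take a moment to come up after `docker-compose up`. As a rough sketch, a script can wait until ports 80 and 8000 accept TCP connections before opening the webapp. The function names below are hypothetical, and an open port only shows that something is listening, not that the app has finished initializing.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, attempts=30, delay=1.0):
    """Poll until every port accepts connections or the attempts run out."""
    pending = set(ports)
    for _ in range(attempts):
        # Keep only the ports that still refuse connections.
        pending = {p for p in pending if not port_is_open(host, p)}
        if not pending:
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_ports("localhost", [80, 8000])` returns `True` once both the frontend and the backend ports are reachable, and `False` if they are still closed after the given number of attempts.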
To build and run only the frontend image yourself:
cd frontend
docker build -t openredact/frontend .
docker run -p 80:80 openredact/frontend
This will build the frontend inside a node Docker container and deploy the result in an nginx container. For more details about this procedure see React in Docker with Nginx, built with multi-stage Docker builds, including testing.
To build and run only the backend image yourself:
cd backend
docker build -t openredact/backend .
docker run -p 8000:8000 openredact/backend
Documentation of the API is available at the endpoints /docs (Swagger UI) and /redoc (ReDoc), e.g. http://127.0.0.1:8000/redoc.
The OpenAPI specification can be found here.
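For a quick overview without opening the interactive docs, the OpenAPI specification can be summarized with a few lines of Python. This is a minimal sketch: it relies only on the standard `paths` object of an OpenAPI document, and the helper names as well as the file name `openapi.json` mentioned below are assumptions, not part of the project.

```python
import json

# HTTP methods that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec):
    """Return (METHOD, path) pairs from a parsed OpenAPI document, sorted by path."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-method keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                endpoints.append((key.upper(), path))
    return sorted(endpoints, key=lambda pair: (pair[1], pair[0]))


def load_spec(filename):
    """Load an OpenAPI document that was saved as JSON."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)
```

With the specification saved locally, for instance as `openapi.json`, `list_endpoints(load_spec("openapi.json"))` returns every documented method and path.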
First, follow the instructions in the backend or frontend readme. Then, continue with the instructions below.
If you want to use our Docker setup for development, run:
docker-compose -f docker-compose.dev.yml up
Don't forget to add the project's directory to the list of allowed file sharing resources in the Docker Desktop preferences.
pre-commit is a Python tool to manage git pre-commit hooks.
Running the following code requires the backend dev requirements to be set up as explained here.
We have pre-commit hooks for formatting and linting Python and JavaScript code (black, flake8, prettier and eslint).
Note that the tests are slower than the formatters and linters, so they are not part of the hooks and are run by CI instead.
Remember to run them manually before committing.
pre-commit install
git config --bool flake8.strict true # Makes the commit fail if flake8 reports an error
To run the hooks:
pre-commit run --all-files
For usage questions, bugs, or suggestions, please file a GitHub issue. If you would like to contribute or have other questions, please email [email protected].