EZ_Txt

A simple web application to extract text from various document formats using MarkItDown.

Features

Extract text from multiple document formats including PDF, Office documents, images, and more
Simple web interface using Gradio
Optional authentication
Optional API visibility

Installation

Clone this repository
Install dependencies:

pip install -r requirements.txt

Docker

You can run EZ_Txt using the pre-built Docker container:

docker pull ghcr.io/usnavy13/ez-txt:latest
docker run -p 7860:7860 ghcr.io/usnavy13/ez-txt:latest

Then access the application at http://localhost:7860

Configuration

Create a .env file in the root directory with the following optional settings:

# Optional authentication (remove or leave empty to disable)
user=your_username
password=your_password

# Optional API visibility (default: false)
show_api=false
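
As a rough illustration of how these settings might be interpreted, here is a minimal sketch using only the standard library. The function name `read_settings` is hypothetical, and the rule that authentication needs both `user` and `password` to be non-empty is an assumption based on the comments above, not something taken from main.py. How the application actually loads the .env file is not shown here.

```python
import os


def read_settings(env=os.environ):
    """Read the optional EZ_Txt settings from a mapping of environment variables.

    Hypothetical helper: the real application may parse these differently.
    """
    user = (env.get("user") or "").strip()
    password = (env.get("password") or "").strip()
    # Assumption: authentication is enabled only when both values are non-empty.
    auth = (user, password) if user and password else None
    # show_api defaults to false; here only the string "true" (any case) enables it.
    show_api = (env.get("show_api") or "false").strip().lower() == "true"
    return {"auth": auth, "show_api": show_api}


if __name__ == "__main__":
    print(read_settings({"user": "your_username", "password": "your_password"}))
    print(read_settings({}))
```

Passing a plain dictionary instead of `os.environ` makes the behaviour easy to check without touching the real environment.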

Azure Document Intelligence Integration

To use Azure Document Intelligence for text extraction, ensure you have an active Azure subscription and a Document Intelligence resource. Then, add the following entries to your .env file:

AZURE_ENDPOINT=https://<your-azure-endpoint>
AZURE_API_KEY=<your-azure-api-key>

The application automatically enables the "Azure Document Intelligence" extraction method when both variables are set. The variable names must match exactly what the code expects, so copy them as written above.
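
The "both variables are present" check could look something like the sketch below. The helper names `azure_enabled` and `extraction_methods` are hypothetical, as is the "MarkItDown" label for the default method. The actual logic in main.py and azure_document_intelligence.py may differ.

```python
import os


def azure_enabled(env=os.environ):
    """Return True when both Azure settings are present and non-empty."""
    required = ("AZURE_ENDPOINT", "AZURE_API_KEY")
    return all((env.get(name) or "").strip() for name in required)


def extraction_methods(env=os.environ):
    """List the extraction methods to offer (labels are illustrative)."""
    methods = ["MarkItDown"]  # assumed label for the default method
    if azure_enabled(env):
        methods.append("Azure Document Intelligence")
    return methods


if __name__ == "__main__":
    print(extraction_methods())
```

Treating an empty value the same as a missing one avoids enabling the Azure option when a placeholder line was left blank in the .env file.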

Usage

Run the application:

python main.py

Open your browser and navigate to http://localhost:7860
Upload a document and click "Extract text" to get the text content
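
For readers who want the same extraction outside the web interface, the following sketch calls MarkItDown directly. It assumes the `markitdown` package from requirements.txt is installed. The wrapper name `extract_text` and the file name `example.pdf` are hypothetical, and this is not necessarily how main.py invokes the library.

```python
def extract_text(path, converter=None):
    """Convert a document to text with MarkItDown and return the text content.

    A converter can be passed in for reuse or testing; by default a new
    MarkItDown instance is created.
    """
    if converter is None:
        # Imported lazily so the module loads even if markitdown is missing.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content


if __name__ == "__main__":
    # "example.pdf" is a placeholder; point this at a real file.
    print(extract_text("example.pdf"))
```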

Supported File Types

Documents: PDF, PPTX, DOCX, XLSX
Images: PNG, JPG, JPEG
Text: TXT, CSV, JSON, XML, HTML
Archives: ZIP (the files inside the archive are processed)
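
The list above can be expressed as a small lookup table, for example to reject unsupported uploads early. This is only a sketch: the names `SUPPORTED` and `file_category` are hypothetical, and the application itself may rely on MarkItDown to detect file types rather than checking extensions.

```python
from pathlib import Path

# Extensions taken from the list above, grouped by category.
SUPPORTED = {
    "Documents": {".pdf", ".pptx", ".docx", ".xlsx"},
    "Images": {".png", ".jpg", ".jpeg"},
    "Text": {".txt", ".csv", ".json", ".xml", ".html"},
    "Archives": {".zip"},
}


def file_category(filename):
    """Return the category for a filename, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()
    for category, suffixes in SUPPORTED.items():
        if suffix in suffixes:
            return category
    return None


if __name__ == "__main__":
    for name in ("report.PDF", "scan.jpeg", "notes.md"):
        print(name, file_category(name))
```

The comparison is case-insensitive, so `report.PDF` and `report.pdf` are treated alike.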
