Papersmith

Papersmith is an AI-powered PDF renamer that uses OpenAI's gpt-4o-mini vision model (or another model you choose) to rename PDF documents based on their content. It analyzes each PDF and generates a descriptive filename that includes the document date, title, and type.

How It Works

1. Papersmith converts the first few pages of each PDF to images (using pdf2image, which requires Poppler).
2. These images are sent to OpenAI's vision model for analysis.
3. The AI extracts key information such as the date and document type.
4. A standardized filename is generated in the format: YYYYMMDD-title-category.pdf
5. The PDF is renamed according to this format.
6. The process is idempotent: Papersmith will not rename files that already match the expected format.
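
The filename format and the idempotency check described above can be illustrated with a small, standard-library-only sketch. This is not Papersmith's actual implementation: the function names `slugify`, `build_filename`, and `matches_expected_format` are hypothetical, and the exact rules Papersmith applies when normalizing titles or recognizing already-renamed files are assumptions made for illustration.

```rust
/// Turn free text into a lowercase, hyphen-separated slug.
/// (Hypothetical helper; the real normalization rules may differ.)
fn slugify(text: &str) -> String {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| part.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join("-")
}

/// Build a filename in the YYYYMMDD-title-category.pdf format.
fn build_filename(date: &str, title: &str, category: &str) -> String {
    format!("{}-{}-{}.pdf", date, slugify(title), slugify(category))
}

/// Loose check that a filename already looks like YYYYMMDD-<slug>.pdf.
/// Files that pass would be skipped, which is what makes reruns safe.
fn matches_expected_format(name: &str) -> bool {
    let Some(stem) = name.strip_suffix(".pdf") else {
        return false;
    };
    let Some((date, rest)) = stem.split_once('-') else {
        return false;
    };
    date.len() == 8
        && date.chars().all(|c| c.is_ascii_digit())
        && rest.split('-').all(|part| {
            !part.is_empty()
                && part
                    .chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}

fn main() {
    let name = build_filename("20240916", "Bunnings", "Invoice");
    println!("{name}");
    // A freshly generated name passes the check, so a second run skips it.
    println!("{}", matches_expected_format(&name));
    // An unprocessed scan does not match and would be renamed.
    println!("{}", matches_expected_format("Scanned Document 1.pdf"));
}
```

Checking the name before doing any work is one simple way to get idempotent behavior, since already-renamed files never need to be sent for analysis again.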

Installation

System Dependencies

Papersmith uses pdf2image, which requires Poppler to be installed:

macOS

brew install poppler

Ubuntu/Debian

sudo apt-get update
sudo apt-get install poppler-utils

Windows

1. Download the latest Poppler release from poppler-windows.
2. Extract it to a location on your system (e.g., C:\Program Files\poppler).
3. Add the bin directory to your system's PATH environment variable. You may need to restart your computer after this step.

Application Setup

1. Ensure you have Rust installed (rustup.rs).
2. Clone this repository.
3. Create a .env file with your OpenAI API key:

OPENAI_API_KEY=your_api_key_here

Usage

# Basic usage (processes all PDFs in test-data directory)
papersmith

# Process PDFs in a specific directory
papersmith --glob-pattern "./invoices/*.pdf"

# Preview changes without renaming files
papersmith --dry-run

# Specify GPT model and number of pages to analyze
papersmith --model gpt-4o-mini --n-pages 2

Command Line Options

-g, --glob-pattern <PATTERN>: Specify which PDFs to process (default: "./test-data/*.pdf")
-m, --model <MODEL>: Choose the GPT model to use (default: "gpt-4o-mini")
-n, --n-pages <NUMBER>: Number of pages to analyze per document (default: 3)
-d, --dry-run: Preview changes without renaming files
-h, --help: Display help information
-V, --version: Display version information

For example, a run might rename files as follows:

Scanned Document 1.pdf → 20240916-bunnings-invoice.pdf
Scanned Document 2.pdf → 20241016-wagga-wagga-airport-invoice.pdf
Document.pdf → 20231225-unknown-document.pdf
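
The options and defaults listed under Command Line Options can be modelled with a short, standard-library-only sketch. It is illustrative only: Papersmith's real argument handling is not shown in this README and most likely relies on an argument-parsing crate, and the `Options` struct, `parse_options`, and `value_for` are hypothetical names. The `-h` and `-V` flags are deliberately left out, so this sketch reports them as unknown options.

```rust
/// Settings mirroring the documented flags (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
struct Options {
    glob_pattern: String,
    model: String,
    n_pages: u32,
    dry_run: bool,
}

impl Default for Options {
    /// Defaults taken from the option list in this README.
    fn default() -> Self {
        Options {
            glob_pattern: "./test-data/*.pdf".to_string(),
            model: "gpt-4o-mini".to_string(),
            n_pages: 3,
            dry_run: false,
        }
    }
}

/// Fetch the value that must follow a flag, or report that it is missing.
fn value_for(flag: &str, value: Option<&&str>) -> Result<String, String> {
    value
        .map(|v| (*v).to_owned())
        .ok_or_else(|| format!("missing value for {flag}"))
}

/// Parse arguments (without the program name) into Options.
fn parse_options(args: &[&str]) -> Result<Options, String> {
    let mut opts = Options::default();
    let mut iter = args.iter();
    while let Some(&flag) = iter.next() {
        match flag {
            "-g" | "--glob-pattern" => opts.glob_pattern = value_for(flag, iter.next())?,
            "-m" | "--model" => opts.model = value_for(flag, iter.next())?,
            "-n" | "--n-pages" => {
                let raw = value_for(flag, iter.next())?;
                opts.n_pages = raw
                    .parse::<u32>()
                    .map_err(|_| format!("invalid page count: {raw}"))?;
            }
            "-d" | "--dry-run" => opts.dry_run = true,
            other => return Err(format!("unknown option: {other}")),
        }
    }
    Ok(opts)
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let arg_refs: Vec<&str> = args.iter().map(String::as_str).collect();
    match parse_options(&arg_refs) {
        Ok(opts) => println!("{opts:?}"),
        Err(msg) => eprintln!("error: {msg}"),
    }
}
```

Starting from `Options::default()` and overriding only the flags that appear keeps the documented defaults in one place.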

Building

Run these commands in the project root directory:

# Debug build
cargo build

# Release build
cargo build --release

License

This project is open source and available under the MIT License.


benletchford/papersmith
