Safely send PDF documents to LLM

This tool uses in-browser Tesseract OCR to extract text from PDF files and images.

Then it anonymizes the extracted text by removing PII (Personally Identifiable Information), so you can safely send it to ChatGPT. You can also use it, for example, to scan PDF documents before passing them to non-multimodal LLMs (Ollama ...).
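
As a rough illustration of the anonymization step, here is a minimal sketch that replaces PII with placeholders using regular expressions. The `PiiRule` type, the `PII_RULES` table and the `anonymize` function are hypothetical names chosen for this example, and the two patterns are deliberately simple; the project's actual rules may differ.

```typescript
// Hypothetical rule table: each rule maps a pattern to a placeholder label.
// These patterns are illustrative only and will miss many real-world formats.
type PiiRule = { label: string; pattern: RegExp };

const PII_RULES: PiiRule[] = [
  // Email addresses such as jane.doe@example.com
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // Phone-like digit runs with optional "+", spaces, dots, dashes and parentheses
  { label: "PHONE", pattern: /\+?\d[\d\s().-]{7,}\d/g },
];

// Applies every rule in order and replaces each match with "[LABEL]".
function anonymize(text: string, rules: PiiRule[] = PII_RULES): string {
  return rules.reduce(
    (acc, { label, pattern }) => acc.replace(pattern, `[${label}]`),
    text,
  );
}

const sample = "Contact jane.doe@example.com or +1 (555) 123-4567 today.";
console.log(anonymize(sample));
// prints "Contact [EMAIL] or [PHONE] today."
```

Replacing matches with labeled placeholders rather than deleting them keeps the sentence structure intact, which tends to help an LLM understand the document.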

In this example, ChatGPT is also used to clean up and correct Tesseract OCR errors. This is a PoC project intended for privacy-critical LLM use cases, such as health data.
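
One possible way to phrase the OCR clean-up request is sketched below. It only builds a request body in the shape accepted by the OpenAI Chat Completions endpoint and does not send it. The function name `buildOcrCleanupRequest`, the default model name and the prompt wording are assumptions made for this example, not taken from the project.

```typescript
// Shape of a Chat Completions request body (only the fields used here).
type ChatMessage = { role: "system" | "user"; content: string };
type ChatRequest = { model: string; temperature: number; messages: ChatMessage[] };

// Builds a request that asks the model to repair OCR noise in text that
// has ALREADY been anonymized. The model name is an assumed default.
function buildOcrCleanupRequest(
  anonymizedText: string,
  model: string = "gpt-4o-mini",
): ChatRequest {
  return {
    model,
    temperature: 0, // keep the corrections as deterministic as possible
    messages: [
      {
        role: "system",
        content:
          "You correct OCR errors in the user's text. Fix broken words and spacing, " +
          "do not add new information, and keep placeholders such as [EMAIL] unchanged.",
      },
      { role: "user", content: anonymizedText },
    ],
  };
}

const request = buildOcrCleanupRequest("Pat1ent [EMAIL] was adm1tted on Monday.");
console.log(JSON.stringify(request, null, 2));
```

The resulting object would be serialized as JSON and sent with an API key to the Chat Completions endpoint. Only anonymized text should ever be passed to this function.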

Getting Started

First, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev

Open http://localhost:3000 with your browser to see the result.

CatchTheTornado/llm-pdf-ocr-anonimizer

Folders and files

Latest commit

History

Repository files navigation

Safely send PDF documents to LLM

Getting Started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages