CV, OCR, YOLO, ID
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Implementation of Nougat: Neural Optical Understanding for Academic Documents
Turn any computer or edge device into a command center for your computer vision projects.
We write your reusable computer vision tools. 💜
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Unofficial implementation of YOLO-World + EfficientSAM for ComfyUI
State-of-the-art 2D and 3D Face Analysis Project
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Torchreid: Deep learning person re-identification in PyTorch.
Document-to-Markdown OCR library using Llama 3.2 Vision
Tesseract Open Source OCR Engine (main repository)
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL