All-in-One Development Tool based on PaddlePaddle (PaddlePaddle low-code development tool)

Python 5,152 989 Updated Mar 3, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 6,136 384 Updated Feb 28, 2025

Official implementation of Character Region Awareness for Text Detection (CRAFT)

Python 3,197 912 Updated Jul 16, 2024

A new markup-based typesetting system that is powerful and easy to learn.

Rust 38,051 1,040 Updated Feb 27, 2025

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Python 2,229 167 Updated Feb 20, 2025

A tool used to obfuscate Python scripts, bind obfuscated scripts to a fixed machine, or set obfuscated scripts to expire.

Python 3,959 303 Updated Mar 1, 2025

PaddlePaddle custom device implementation. (PaddlePaddle custom hardware integration)

C++ 81 158 Updated Feb 27, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 16,509 1,077 Updated Feb 28, 2025

Reverse Engineering: Decompiling Binary Code with Large Language Models

Python 5,179 349 Updated Oct 28, 2024

A conda-forge distribution.

Shell 7,155 371 Updated Mar 1, 2025

Generate a comprehensive review from an arXiv paper, then turn it into a blog post. This project powers the website below for HuggingFace's Daily Papers (https://huggingface.co/papers).

Python 737 82 Updated Feb 20, 2025

Task-Aware Agent-driven Prompt Optimization Framework

Python 2,904 240 Updated Jan 10, 2025
Python 5 Updated Dec 23, 2024

A machine learning software for extracting information from scholarly documents

Java 3,837 472 Updated Feb 28, 2025

Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured …

Python 2,432 195 Updated Feb 4, 2025

Document Rectification and Illumination Correction using a Patch-based CNN

Python 358 86 Updated Sep 28, 2022

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 27,044 2,081 Updated Feb 27, 2025

⚡ TabPFN: Foundation Model for Tabular Data ⚡

Python 2,798 230 Updated Mar 2, 2025

UniTable: Towards a Unified Table Foundation Model

Jupyter Notebook 438 34 Updated Jun 4, 2024

A model(ing framework) for sample efficient OCR

Python 56 5 Updated Apr 7, 2023

Get your documents ready for gen AI

Python 23,151 1,336 Updated Mar 3, 2025

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Python 2,518 173 Updated Feb 27, 2025

OCR & Document Extraction using vision models

TypeScript 10,106 661 Updated Feb 28, 2025

A Comprehensive Benchmark for Document Parsing and Evaluation

Python 263 22 Updated Feb 25, 2025

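A desktop application based on MinerU. MinerU is an open-source, high-quality PDF parsing tool built on deep learning that automatically extracts text, tables, images, formulas, and other content from PDF documents and provides rich analysis, statistics, and search features. This project provides a simplified WebUI for it, letting users upload PDF files and view the extraction results in real time.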

JavaScript 66 5 Updated Oct 17, 2024

Efficient vision foundation models for high-resolution generation and perception.

Python 2,677 214 Updated Jan 24, 2025

Tesseract Open Source OCR Engine (main repository)

C++ 64,986 9,709 Updated Feb 12, 2025

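OCR software, free and offline. Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multi-language libraries.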

Python 30,061 3,011 Updated Feb 9, 2025

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 25,752 3,257 Updated Sep 24, 2024