Commit

update
wenerme committed Dec 8, 2024
1 parent ed3adbf commit 8321b4a
Showing 42 changed files with 1,081 additions and 228 deletions.
18 changes: 18 additions & 0 deletions notes/ai/README.md
@@ -10,6 +10,24 @@ title: AI

- Rule systems, expert systems, machine learning

---

- fundamentals
- [Machine Learning](./ml/README.md)
- Deep Learning
- models
- [LLM](./llm/README.md)
- [GPT](./gpt/README.md)
- [GAN](./gan/README.md)
- Diffusion
- domains
- [OCR](./ocr/README.md)
- [NLP](./nlp/README.md)
- [TTS](./tts/README.md)
- [ASR](./asr/README.md)
- services
- OpenAI

## Explanation

- Common subfields
7 changes: 7 additions & 0 deletions notes/ai/ai-faq.md
@@ -29,3 +29,10 @@ tags:
- RAG - Retrieval-Augmented Generation
- References
- https://research.ibm.com/blog/retrieval-augmented-generation-RAG
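As a rough illustration of the two steps in RAG (retrieve relevant text, then generate with that text placed in the prompt), here is a minimal Python sketch. Word-overlap scoring stands in for the embedding similarity a real system would use, the documents are made up, and the function names are hypothetical; the sketch only builds the augmented prompt and does not call any LLM.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase, whitespace-split bag of words (a stand-in for embeddings)."""
    return set(text.lower().split())


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    # sorted() stays stable with reverse=True, so ties keep their original order
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Augment the question with retrieved context before sending it to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "Ollama runs LLMs locally",
    "FunASR does speech recognition",
    "PaddleOCR does text recognition",
]
print(build_prompt("which tool does speech recognition", docs, k=1))
```

In a real pipeline the retriever would be a vector index and the returned prompt would be sent to a model; only those two pieces change, the retrieve-then-augment shape stays the same.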

## STT vs ASR

- STT: Speech to Text
- A product-feature term
- ASR: Automatic Speech Recognition
- A technical term
2 changes: 2 additions & 0 deletions notes/ai/ai-glossary.md
@@ -40,6 +40,8 @@ tags:
| en | cn |
| ---------------- | -------- |
| Stable Diffusion | 稳定扩散 |
| Speech Synthesis | 语音合成 |
| Voice Synthesis | 语音合成 |

## LLM 参数

Expand Down
1 change: 1 addition & 0 deletions notes/ai/asr.md → notes/ai/asr/README.md
@@ -5,6 +5,7 @@ tags:

# ASR

- ASR - Automatic Speech Recognition
- [FunASR](./funasr.md)
- [Whisper](./whisper.md)
- Kaldi
46 changes: 23 additions & 23 deletions notes/ai/funasr.md → notes/ai/asr/funasr.md
@@ -36,30 +36,30 @@ bash run_server.sh \
## Protocol

```ts
interface OfflineRequestMessage {
  mode: 'offline';
  wav_name: string;
  wav_format: string | 'pcm' | 'mp3' | 'mp4';
  is_speaking: boolean; // false -> end of a segment, e.g. a VAD cut point or the end of a wav file
  audio_fs?: number; // PCM sample rate
  hotwords?: Record<string, number>; // hotwords
  itn?: boolean; // defaults to true
}

interface ResponseMessage {
  mode: 'offline';
  wav_name: string;
  text: string;
  is_final: boolean;
  timestamp?: number[][]; // timestamps "[[100,200], [200,500]]" (ms)
  stamp_sents?: {
    text_seg: string; // 正 是 因 为
    punc: string; // ,
    start: number;
    end: number;
    ts_list: number[][]; // [[430,670],[670,810],[810,1030],[1030,1130]]
  }[];
}
```
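A minimal Python sketch of working with these messages, assuming the flow implied by `is_speaking`: a JSON config frame with `is_speaking: true`, then the binary audio, then a frame with `is_speaking: false`. It only builds and parses JSON; the WebSocket transport is left out, the helper names are hypothetical, and field types follow the interfaces above (check the FunASR docs for the exact wire format, e.g. how `hotwords` is encoded). The SRT helper joins the space-separated `text_seg` tokens without spaces, which suits Chinese but not English.

```python
import json
from typing import Optional


def build_offline_request(wav_name: str, wav_format: str = "pcm",
                          audio_fs: Optional[int] = 16000,
                          hotwords: Optional[dict] = None,
                          itn: bool = True) -> str:
    """JSON frame sent before the audio (fields per OfflineRequestMessage)."""
    msg = {"mode": "offline", "wav_name": wav_name, "wav_format": wav_format,
           "is_speaking": True, "itn": itn}
    if audio_fs is not None:
        msg["audio_fs"] = audio_fs  # only meaningful for raw PCM
    if hotwords:
        msg["hotwords"] = hotwords
    return json.dumps(msg, ensure_ascii=False)


def build_end_message() -> str:
    """Frame marking the end of the audio (is_speaking=false)."""
    return json.dumps({"is_speaking": False})


def ms_to_srt(ms: int) -> str:
    """Milliseconds -> SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, millis = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{millis:03d}"


def response_to_srt(response_text: str) -> str:
    """Turn the stamp_sents of a ResponseMessage into SRT subtitle blocks."""
    resp = json.loads(response_text)
    blocks = []
    for i, sent in enumerate(resp.get("stamp_sents") or [], start=1):
        text = sent["text_seg"].replace(" ", "") + sent.get("punc", "")
        span = f"{ms_to_srt(sent['start'])} --> {ms_to_srt(sent['end'])}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)
```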

- mode
File renamed without changes.
File renamed without changes.
7 changes: 5 additions & 2 deletions notes/ai/llm/ollama.md
@@ -29,8 +29,11 @@ title: ollama
```bash
brew install ollama # macOS brew

# Start the server
# OLLAMA_KV_CACHE_TYPE requires Ollama 0.5+
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve

ollama run mistral # run a model
ollama list

# https://hub.docker.com/r/ollama/ollama
3 changes: 3 additions & 0 deletions notes/ai/ml/ml-awesome.md
@@ -138,6 +138,9 @@ tags:
- XCiT
- DINO - Self-Supervised Vision Transformers
- PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
- GOT - Generic Object Tracking
- [GOT-10k](http://got-10k.aitestunion.com/)
- [GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild](https://arxiv.org/abs/1810.11981)
- audio/music/speech/voice/tts
- [microsoft/muzic](https://github.com/microsoft/muzic)
- [yl4579/StyleTTS2](https://github.com/yl4579/StyleTTS2)
2 changes: 2 additions & 0 deletions notes/ai/ml/paddle.md
@@ -9,6 +9,7 @@ tags:
- Apache-2.0
- by Baidu
- 飞桨 - PADDLE -> PArallel Distributed Deep LEarning
- General-purpose framework, but Paddle is mainly strong at Chinese OCR and NLP
- 参考
- https://www.paddlepaddle.org.cn/
- [PaddlePaddle/PaddleHub](https://github.com/PaddlePaddle/PaddleHub)
@@ -22,6 +23,7 @@ pip install paddlepaddle
pip install paddlepaddle-gpu

# Docker
# Baidu registry mirror: registry.baidubce.com/paddlepaddle/paddle:3.0.0b1
docker run --rm -it -v "$PWD":/host -w /host --name paddle paddlepaddle/paddle /bin/bash
```

56 changes: 0 additions & 56 deletions notes/ai/nlp/ocr/ocr-awesome.md

This file was deleted.

47 changes: 47 additions & 0 deletions notes/ai/ocr/README.md
@@ -0,0 +1,47 @@
---
tags:
- Awesome
---

# OCR

| abbr. | stand for | meaning |
| ----- | -------------------------------------- | ----------------------- |
| OCR | Optical Character Recognition | 光学字符识别 |
| MFD | Mathematical Formula Detection | 数学公式检测 |
| MFR | Mathematical Formula Recognition | 数学公式识别 |
| CRNN | Convolutional Recurrent Neural Network | 卷积循环神经网络 |
| ALPR | Automatic License Plate Recognition | 自动车牌识别 |
| ICR | Intelligent Character Recognition | 智能字符识别 |
| OMR | Optical Mark Recognition | 光学标记识别 |
| MICR | Magnetic Ink Character Recognition | 磁性墨水字符识别 |
| HCR | Handwritten Character Recognition | 手写字符识别 |
| LSTM | Long Short-Term Memory | 长短期记忆网络 |
| CNN | Convolutional Neural Network | 卷积神经网络 |
| TrOCR | Transformer-based OCR | 基于 Transformer 的 OCR |
| TATR | Table Transformer | 表格转换器 |
| DETR | Detection Transformer | 检测转换器 |

- Features
  - Detection: text with bounding boxes
  - OCR: text recognition
  - Layout: blocks, headings
  - Reading Order
  - Table Recognition
  - Multilingual
- Domains
  - document OCR
  - printed text
  - handwriting
  - license plates
  - image
  - photo
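The Reading Order feature can be illustrated with a naive Python heuristic, assuming blocks arrive from a detector as axis-aligned boxes in pixel coordinates: blocks whose top edges lie within a tolerance are grouped into one line, lines are read top to bottom, and blocks within a line left to right. The `Block` type and the `line_tol` value are hypothetical. This breaks on multi-column pages, which is why tools such as surya treat reading order as a separate capability.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def naive_reading_order(blocks: list[Block], line_tol: float = 10.0) -> list[Block]:
    """Top-to-bottom, then left-to-right, treating near-equal tops as one line."""
    lines: list[list[Block]] = []
    for b in sorted(blocks, key=lambda b: (b.y0, b.x0)):
        # same line if the top edge is close to the first block of the current line
        if lines and abs(b.y0 - lines[-1][0].y0) <= line_tol:
            lines[-1].append(b)
        else:
            lines.append([b])
    return [b for line in lines for b in sorted(line, key=lambda b: b.x0)]


blocks = [
    Block("world", 60, 10, 100, 30),
    Block("Hello", 0, 12, 50, 30),  # top edge slightly lower, but same line
    Block("Next line", 0, 50, 90, 70),
]
print([b.text for b in naive_reading_order(blocks)])
```

A plain `(y0, x0)` sort would put "world" first because its top edge is 2 px higher; the line grouping is what restores left-to-right order.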

---

- [Awesome](./ocr-awesome.md)
21 changes: 21 additions & 0 deletions notes/ai/ocr/doclayout-yolo.md
@@ -0,0 +1,21 @@
---
tags:
- YOLO
---

# DocLayout-YOLO

- [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- AGPLv3, Python, YOLOv10
- https://huggingface.co/spaces/opendatalab/DocLayout-YOLO
- classes
- title
- plain text
- abandon
- figure
- figure_caption
- table
- table_caption
- table_footnote
- isolate_formula
- formula_caption
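A hypothetical Python post-processing sketch over these class names. It assumes detections have already been converted into `(label, bbox, score)` records (the `Detection` type, the score threshold, and the nearest-center pairing rule are illustrative, not part of DocLayout-YOLO), and that `abandon` marks regions such as page headers, footers and page numbers that are usually discarded. Captions and footnotes are attached to the vertically nearest figure, table or formula.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str                               # one of the class names above
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed pixel coords
    score: float


# caption-like class -> class of the element it describes (assumed mapping)
PARENT_OF = {
    "figure_caption": "figure",
    "table_caption": "table",
    "table_footnote": "table",
    "formula_caption": "isolate_formula",
}


def center_y(d: Detection) -> float:
    return (d.bbox[1] + d.bbox[3]) / 2


def postprocess(dets: list[Detection], min_score: float = 0.25):
    """Drop low-confidence and 'abandon' boxes, then pair captions with parents."""
    kept = [d for d in dets if d.score >= min_score and d.label != "abandon"]
    pairs = []
    for d in kept:
        parent_label = PARENT_OF.get(d.label)
        if parent_label is None:
            continue
        candidates = [p for p in kept if p.label == parent_label]
        if candidates:
            nearest = min(candidates, key=lambda p: abs(center_y(p) - center_y(d)))
            pairs.append((d, nearest))
    return kept, pairs
```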
81 changes: 81 additions & 0 deletions notes/ai/ocr/ocr-awesome.md
@@ -0,0 +1,81 @@
---
tags:
- Awesome
---

# OCR Awesome

- All in One OCR/OCR Toolkit
- [PaddleOCR](./paddleocr.md)
- Paddle
- by Baidu
- https://paddlejs.baidu.com/ocr
- [PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [PaddlePaddle/Paddle.js](https://github.com/PaddlePaddle/Paddle.js)
- Not maintained for a long time
- [hiroi-sora/PaddleOCR-json](https://github.com/hiroi-sora/PaddleOCR-json)
- Offline, Windows, CLI that outputs results as JSON
- [Evezerest/PPOCRLabel](https://github.com/Evezerest/PPOCRLabel)
- Semi-automatic graphical annotation tool
- [PP-OCRv4](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv4_introduction.md)
- [breezedeus/Pix2Text](https://github.com/breezedeus/Pix2Text)
- MIT
- Maintained by a developer in China
- Uses CnOCR for Simplified Chinese and English, EasyOCR for other languages
- p2t CLI: https://pix2text.readthedocs.io/zh-cn/stable/command/
- macOS desktop app: [breezedeus/Pix2Text-Mac](https://github.com/breezedeus/Pix2Text-Mac)
- [RapidAI/RapidOCR](https://github.com/RapidAI/RapidOCR)
- Apache-2.0, Python
- based on ONNXRuntime, OpenVINO, PaddlePaddle
- PaddleOCR -> ONNXRuntime
- OCR, Layout, Table, Form, Receipt, Invoice
- [VikParuchuri/surya](https://github.com/VikParuchuri/surya)
- GPLv3, Python
- Supports Detection, OCR, Layout, Reading Order, Table Recognition
- Basic OCR/general-purpose OCR
- EasyOCR
- tesseract
- [naptha/tesseract.js](https://github.com/naptha/tesseract.js)
- Apache-2.0, JS
- [breezedeus/cnocr](https://github.com/breezedeus/cnocr)
- Apache-2.0
- Built on RapidOCR, integrating the latest PP-OCRv4 OCR models
- [jingsongliujing/OnnxOCR](https://github.com/jingsongliujing/OnnxOCR)
- Lightweight OCR refactored from PaddleOCR, decoupled from the PaddlePaddle deep learning training framework
- Table/Layout/Document
- [RapidAI/TableStructureRec](https://github.com/RapidAI/TableStructureRec)
- Collection of table recognition algorithms
- wired_table_rec: recognition of bordered (ruled) tables
- lineless_table_rec: recognition of borderless tables
- [RapidAI/RapidTable](https://github.com/RapidAI/RapidTable)
- Apache-2.0, Python, ONNX
- Table recognition derived from PP-Structure; models converted to ONNX, with ONNXRuntime as the inference engine
- [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- AGPLv3, Python, YOLOv10
- https://huggingface.co/spaces/opendatalab/DocLayout-YOLO
- [AlibabaResearch/AdvancedLiterateMachinery](https://github.com/AlibabaResearch/AdvancedLiterateMachinery)
- Apache-2.0, Python, C++
- by Alibaba
- [getomni-ai/zerox](https://github.com/getomni-ai/zerox)
- MIT
- PDF to Markdown
- Uses OpenAI, Anthropic, AWS Bedrock
- [katanaml/sparrow](https://github.com/katanaml/sparrow)
- GPLv3, Python
- Data processing with ML, LLM and Vision LLM
- [mindee/doctr](https://github.com/mindee/doctr)
- Apache-2.0, Python, TensorFlow 2, PyTorch
- [Walleclipse/ChineseAddress_OCR](https://github.com/Walleclipse/ChineseAddress_OCR)
- [ooooverflow/chinese-ocr](https://github.com/ooooverflow/chinese-ocr)
- CRNN
- macOS OCR Live Text
- Recognize text on images directly in Preview
- [dynobo/normcap](https://github.com/dynobo/normcap)
- OCR powered screen-capture tool
- [faustomorales/keras-ocr](https://github.com/faustomorales/keras-ocr)
- [TDiblik/main-gate-alpr](https://github.com/TDiblik/main-gate-alpr)
- license plates
- https://news.ycombinator.com/item?id=37384327
- https://github.com/kba/awesome-ocr
- Commercial
- https://doc2x.noedgeai.com/
14 changes: 14 additions & 0 deletions notes/ai/nlp/ocr/paddleocr.md → notes/ai/ocr/paddleocr.md
@@ -13,6 +13,14 @@ title: PaddleOCR
- https://gitee.com/duolabmeng666/paddlehub_ppocr/blob/master/Dockerfile
- https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/deploy/docker/hubserving/cpu/Dockerfile

```bash
# registry.baidubce.com/paddlepaddle/paddle:3.0.0b1-jupyter
# registry.baidubce.com/paddlepaddle/paddle:3.0.0b1
docker run --rm -it \
-v $PWD:/paddle \
--name paddle registry.baidubce.com/paddlepaddle/paddle:3.0.0b1 /bin/bash
```

```py
from paddleocr import PaddleOCR, draw_ocr

@@ -37,3 +45,9 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

## PP-Structure

- PP-Structure: document analysis
- https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure
- https://paddlepaddle.github.io/PaddleOCR/latest/ppstructure/overview.html