update

wenerme · Dec 8, 2024 · 8321b4a
1 parent ed3adbf
Showing 42 changed files with 1,081 additions and 228 deletions.
diff --git a/notes/ai/README.md b/notes/ai/README.md
@@ -10,6 +10,24 @@ title: AI
 
 - Rule-based systems, expert systems, machine learning
 
+---
+
+- fundamentals
+  - [Machine Learning](./ml/README.md)
+  - Deep Learning
+- models
+  - [LLM](./llm/README.md)
+    - [GPT](./gpt/README.md)
+  - [GAN](./gan/README.md)
+  - Diffusion
+- domains
+  - [OCR](./ocr/README.md)
+  - [NLP](./nlp/README.md)
+  - [TTS](./tts/README.md)
+  - [ASR](./asr/README.md)
+- services
+  - OpenAI
+
 ## Explanation
 
 - Common subfields

diff --git a/notes/ai/ai-faq.md b/notes/ai/ai-faq.md
@@ -29,3 +29,10 @@ tags:
 - RAG - retrieval-augmented generation
 - References
   - https://research.ibm.com/blog/retrieval-augmented-generation-RAG
+
+## STT vs ASR
+
+- STT: Speech to Text
+  - describes a product feature
+- ASR: Automatic Speech Recognition
+  - describes the technology
diff --git a/notes/ai/ai-glossary.md b/notes/ai/ai-glossary.md
@@ -40,6 +40,8 @@ tags:
 | en               | cn       |
 | ---------------- | -------- |
 | Stable Diffusion | 稳定扩散 |
+| Speech Synthesis | 语音合成 |
+| Voice Synthesis  | 语音合成 |
 
 ## LLM parameters
 

diff --git a/notes/ai/asr.md → notes/ai/asr/README.md b/notes/ai/asr.md → notes/ai/asr/README.md
@@ -5,6 +5,7 @@ tags:
 
 # ASR
 
+- ASR - Automatic Speech Recognition
 - [FunASR](./funasr.md)
 - [Whisper](./whisper.md)
 - Kaldi

diff --git a/notes/ai/funasr.md → notes/ai/asr/funasr.md b/notes/ai/funasr.md → notes/ai/asr/funasr.md
@@ -36,30 +36,30 @@ bash run_server.sh \
 ## Protocol
 
 ```ts
-  interface OfflineRequestMessage {
-    mode: 'offline';
-    wav_name: string;
-    wav_format: string | 'pcm' | 'mp3' | 'mp4';
-    is_speaking: boolean; // false -> end-of-sentence point, e.g. a VAD cut point or the end of a wav
-    audio_fs?: number; // PCM sample rate
-    hotwords?: Record<string, number>; // hotwords
-    itn?: boolean; // defaults to true
-  }
+interface OfflineRequestMessage {
+  mode: 'offline';
+  wav_name: string;
+  wav_format: string | 'pcm' | 'mp3' | 'mp4';
+  is_speaking: boolean; // false -> end-of-sentence point, e.g. a VAD cut point or the end of a wav
+  audio_fs?: number; // PCM sample rate
+  hotwords?: Record<string, number>; // hotwords
+  itn?: boolean; // defaults to true
+}
 
-  interface ResponseMessage {
-    mode: 'offline';
-    wav_name: string;
-    text: string;
-    is_final: boolean;
-    timestamp?: number[][]; // timestamps "[[100,200], [200,500]]" (ms)
-    stamp_sents?: {
-      text_seg: string; // 正 是 因 为
-      punc: string; // ,
-      start: number;
-      end: number;
-      ts_list: number[][]; // [[430,670],[670,810],[810,1030],[1030,1130]]
-    }[];
-  }
+interface ResponseMessage {
+  mode: 'offline';
+  wav_name: string;
+  text: string;
+  is_final: boolean;
+  timestamp?: number[][]; // timestamps "[[100,200], [200,500]]" (ms)
+  stamp_sents?: {
+    text_seg: string; // 正 是 因 为
+    punc: string; // ,
+    start: number;
+    end: number;
+    ts_list: number[][]; // [[430,670],[670,810],[810,1030],[1030,1130]]
+  }[];
+}
 ```
 
 - mode

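As a rough illustration of the offline protocol above, the following Python sketch builds a request message and flattens `stamp_sents` from a response. The field names come from the `OfflineRequestMessage` and `ResponseMessage` interfaces in the diff; the helper names `build_offline_request` and `sentences`, the sample values, and the choice to join `text_seg` without spaces are assumptions made for illustration. The transport that carries these messages is not shown.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_offline_request(
    wav_name: str,
    wav_format: str = "pcm",
    audio_fs: Optional[int] = None,
    hotwords: Optional[Dict[str, int]] = None,
    itn: bool = True,
) -> str:
    """Serialize an OfflineRequestMessage-shaped dict to JSON (hypothetical helper)."""
    msg: Dict[str, Any] = {
        "mode": "offline",
        "wav_name": wav_name,
        "wav_format": wav_format,
        "is_speaking": True,  # set to False to mark the end of a sentence or wav
        "itn": itn,
    }
    # Optional fields are only sent when provided.
    if audio_fs is not None:
        msg["audio_fs"] = audio_fs
    if hotwords:
        msg["hotwords"] = hotwords
    return json.dumps(msg, ensure_ascii=False)


def sentences(response: Dict[str, Any]) -> List[Tuple[str, int, int]]:
    """Return (text, start, end) for each entry of stamp_sents, if present."""
    return [
        (s["text_seg"].replace(" ", "") + s["punc"], s["start"], s["end"])
        for s in response.get("stamp_sents", [])
    ]


# Hypothetical response, shaped like ResponseMessage.
sample = {
    "mode": "offline",
    "wav_name": "demo",
    "text": "正是因为,",
    "is_final": True,
    "stamp_sents": [
        {
            "text_seg": "正 是 因 为",
            "punc": ",",
            "start": 430,
            "end": 1130,
            "ts_list": [[430, 670], [670, 810], [810, 1030], [1030, 1130]],
        }
    ],
}

print(build_offline_request("demo", audio_fs=16000, hotwords={"阿里巴巴": 20}))
print(sentences(sample))
```
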
diff --git a/notes/ai/whisper.md → notes/ai/asr/whisper.md b/notes/ai/whisper.md → notes/ai/asr/whisper.md
diff --git a/notes/ai/dalle.md → notes/ai/gan/dalle.md b/notes/ai/dalle.md → notes/ai/gan/dalle.md
diff --git a/notes/ai/llm/ollama.md b/notes/ai/llm/ollama.md
@@ -29,8 +29,11 @@ title: ollama
 ```bash
 brew install ollama # macOS brew
 
-OLLAMA_FLASH_ATTENTION=1 ollama serve # start the server
-ollama run mistral                    # run the model
+# Start the server
+# OLLAMA_KV_CACHE_TYPE requires ollama 0.5+
+OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve
+
+ollama run mistral # run the model
 ollama list
 
 # https://hub.docker.com/r/ollama/ollama

diff --git a/notes/ai/ml/ml-awesome.md b/notes/ai/ml/ml-awesome.md
@@ -138,6 +138,9 @@ tags:
     - XCiT
     - DINO - Self-Supervised Vision Transformers
     - PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
+  - GOT - Generic Object Tracking
+    - [GOT-10k](http://got-10k.aitestunion.com/)
+    - [GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild](https://arxiv.org/abs/1810.11981)
 - audio/music/speech/voice/tts
   - [microsoft/muzic](https://github.com/microsoft/muzic)
   - [yl4579/StyleTTS2](https://github.com/yl4579/StyleTTS2)

diff --git a/notes/ai/ml/paddle.md b/notes/ai/ml/paddle.md
@@ -9,6 +9,7 @@ tags:
   - Apache-2.0
   - by Baidu
   - 飞桨 - PADDLE -> PArallel Distributed Deep LEarning
+  - General-purpose framework, but Paddle's main strengths are Chinese OCR and NLP
 - References
   - https://www.paddlepaddle.org.cn/
   - [PaddlePaddle/PaddleHub](https://github.com/PaddlePaddle/PaddleHub)
@@ -22,6 +23,7 @@ pip install paddlepaddle
 pip install paddlepaddle-gpu
 
 # Docker
+# Baidu registry image: registry.baidubce.com/paddlepaddle/paddle:3.0.0b1
 docker run --rm -it -v $PWD:/host --entrypoint /host --name paddle paddlepaddle/paddle /bin/bash
 ```
 
diff --git a/notes/ai/nlp/ocr/ocr-awesome.md b/notes/ai/nlp/ocr/ocr-awesome.md
diff --git a/notes/ai/ocr/README.md b/notes/ai/ocr/README.md
@@ -0,0 +1,47 @@
+---
+tags:
+  - Awesome
+---
+
+# OCR
+
+| abbr. | stand for                              | meaning                 |
+| ----- | -------------------------------------- | ----------------------- |
+| OCR   | Optical Character Recognition          | 光学字符识别            |
+| MFD   | Mathematical Formula Detection         | 数学公式检测            |
+| MFR   | Mathematical Formula Recognition       | 数学公式识别            |
+| CRNN  | Convolutional Recurrent Neural Network | 卷积循环神经网络        |
+| ALPR  | Automatic License Plate Recognition    | 自动车牌识别            |
+| ICR   | Intelligent Character Recognition      | 智能字符识别            |
+| OMR   | Optical Mark Recognition               | 光学标记识别            |
+| MICR  | Magnetic Ink Character Recognition     | 磁性墨水字符识别        |
+| HCR   | Handwritten Character Recognition      | 手写字符识别            |
+| LSTM  | Long Short-Term Memory                 | 长短期记忆网络          |
+| CNN   | Convolutional Neural Network           | 卷积神经网络            |
+| TrOCR | Transformer-based OCR                  | 基于 Transformer 的 OCR |
+| TATR  | Table Transformer                      | 表格转换器              |
+| DETR  | Detection Transformer                  | 检测转换器              |
+
+- Features
+  - Detection
+    - text + bounding box
+  - OCR
+    - text recognition
+  - Layout
+    - blocks, headings
+  - Reading Order
+    - reading order
+  - Table Recognition
+    - table recognition
+  - Multilingual
+- Domains
+  - document OCR
+  - printed text
+  - handwriting
+  - license plates
+  - image
+  - photo
+
+---
+
+- [Awesome](./ocr-awesome.md)
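
To make the "Reading Order" feature listed in the OCR README more concrete, here is a minimal Python sketch of a naive ordering heuristic: group detected boxes into rows by their top edge, then read each row left to right. The `Box` type, the `reading_order` function, and the `row_tol` parameter are all hypothetical. This only suits simple single-column pages; the toolkits listed in these notes may use quite different, model-based approaches.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def reading_order(boxes: List[Box], row_tol: float = 0.5) -> List[Box]:
    """Naive single-column ordering: group boxes into rows, then sort left to right."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.y):
        # A box joins the current row when its top edge lies within
        # row_tol * height of the first box in that row.
        if rows and abs(box.y - rows[-1][0].y) <= row_tol * rows[-1][0].h:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x)]


# Hypothetical detections, given out of order.
boxes = [
    Box("world", 60, 11, 40, 10),
    Box("next", 10, 30, 40, 10),
    Box("Hello", 10, 10, 40, 10),
]
print([b.text for b in reading_order(boxes)])
```
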
diff --git a/notes/ai/ocr/doclayout-yolo.md b/notes/ai/ocr/doclayout-yolo.md
@@ -0,0 +1,21 @@
+---
+tags:
+  - YOLO
+---
+
+# DocLayout-YOLO
+
+- [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
+  - AGPLv3, Python, YOLOv10
+  - https://huggingface.co/spaces/opendatalab/DocLayout-YOLO
+- classes
+  - title
+  - plain text
+  - abandon
+  - figure
+  - figure_caption
+  - table
+  - table_caption
+  - table_footnote
+  - isolate_formula
+  - formula_caption
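
The class list above can be used to post-filter layout detections. The sketch below is illustrative only: the detection dictionaries (`label`, `score`, `box`), the `keep_regions` helper, and the `min_score` threshold are hypothetical and do not reflect the DocLayout-YOLO API. It also assumes that `abandon` marks regions meant to be discarded, which the notes do not state.

```python
from typing import Any, Dict, List, Sequence

# Class names as listed in the notes for DocLayout-YOLO.
CLASSES = [
    "title", "plain text", "abandon", "figure", "figure_caption",
    "table", "table_caption", "table_footnote",
    "isolate_formula", "formula_caption",
]


def keep_regions(
    detections: List[Dict[str, Any]],
    drop: Sequence[str] = ("abandon",),
    min_score: float = 0.25,
) -> List[Dict[str, Any]]:
    """Drop unwanted classes and low-confidence detections (hypothetical helper)."""
    unknown = {d["label"] for d in detections} - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [
        d for d in detections
        if d["label"] not in drop and d["score"] >= min_score
    ]


# Hypothetical detections for one page.
sample = [
    {"label": "title", "score": 0.93, "box": (40, 30, 560, 70)},
    {"label": "abandon", "score": 0.88, "box": (40, 800, 560, 820)},
    {"label": "table", "score": 0.12, "box": (40, 300, 560, 500)},
]
print([d["label"] for d in keep_regions(sample)])
```
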
diff --git a/notes/ai/ocr/ocr-awesome.md b/notes/ai/ocr/ocr-awesome.md
@@ -0,0 +1,81 @@
+---
+tags:
+  - Awesome
+---
+
+# OCR Awesome
+
+- All in One OCR/OCR Toolkit
+  - [PaddleOCR](./paddleocr.md)
+    - Paddle
+    - by Baidu
+    - https://paddlejs.baidu.com/ocr
+    - [PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+    - [PaddlePaddle/Paddle.js](https://github.com/PaddlePaddle/Paddle.js)
+      - not maintained for a long time
+    - [hiroi-sora/PaddleOCR-json](https://github.com/hiroi-sora/PaddleOCR-json)
+      - offline, Windows, command-line tool that outputs JSON results
+    - [Evezerest/PPOCRLabel](https://github.com/Evezerest/PPOCRLabel)
+      - semi-automatic graphical annotation tool
+    - [PP-OCRv4](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/PP-OCRv4_introduction.md)
+  - [breezedeus/Pix2Text](https://github.com/breezedeus/Pix2Text)
+    - MIT
+    - maintained by a developer in China
+    - Simplified Chinese & English use CnOCR; other languages use EasyOCR
+    - p2t command line: https://pix2text.readthedocs.io/zh-cn/stable/command/
+    - macOS desktop app: [breezedeus/Pix2Text-Mac](https://github.com/breezedeus/Pix2Text-Mac)
+  - [RapidAI/RapidOCR](https://github.com/RapidAI/RapidOCR)
+    - Apache-2.0, Python
+    - based on ONNXRuntime, OpenVINO, PaddlePaddle
+    - PaddleOCR -> ONNXRuntime
+    - OCR, Layout, Table, Form, Receipt, Invoice
+  - [VikParuchuri/surya](https://github.com/VikParuchuri/surya)
+    - GPLv3, Python
+    - supports Detection, OCR, Layout, Reading Order, Table Recognition
+- Basic OCR / general-purpose OCR
+  - EasyOCR
+  - tesseract
+    - [naptha/tesseract.js](https://github.com/naptha/tesseract.js)
+      - Apache-2.0, JS
+  - [breezedeus/cnocr](https://github.com/breezedeus/cnocr)
+    - Apache-2.0
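+    - built on RapidOCR, integrating the latest PPOCRv4 OCR models
+  - [jingsongliujing/OnnxOCR](https://github.com/jingsongliujing/OnnxOCR)
+    - lightweight OCR refactored from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework
+- Table/Layout/Document
+  - [RapidAI/TableStructureRec](https://github.com/RapidAI/TableStructureRec)
+    - collection of table recognition algorithms
+    - wired_table_rec: recognition for tables with ruling lines
+    - lineless_table_rec: recognition for tables without ruling lines
+  - [RapidAI/RapidTable](https://github.com/RapidAI/RapidTable)
+    - Apache-2.0, Python, ONNX
+    - table recognition algorithm derived from PP-Structure; models converted to ONNX, inference via ONNXRuntime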
+  - [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
+    - AGPLv3, Python, YOLOv10
+    - https://huggingface.co/spaces/opendatalab/DocLayout-YOLO
+  - [AlibabaResearch/AdvancedLiterateMachinery](https://github.com/AlibabaResearch/AdvancedLiterateMachinery)
+    - Apache-2.0, Python, C++
+    - by Alibaba
+- [getomni-ai/zerox](https://github.com/getomni-ai/zerox)
+  - MIT
+  - PDF to Markdown
+  - uses OpenAI, Anthropic, AWS Bedrock
+- [katanaml/sparrow](https://github.com/katanaml/sparrow)
+  - GPLv3, Python
+  - Data processing with ML, LLM and Vision LLM
+- [mindee/doctr](https://github.com/mindee/doctr)
+  - Apache-2.0, Python, TensorFlow 2, PyTorch
+- [Walleclipse/ChineseAddress_OCR](https://github.com/Walleclipse/ChineseAddress_OCR)
+- [ooooverflow/chinese-ocr](https://github.com/ooooverflow/chinese-ocr)
+  - CRNN
+- macOS OCR Live Text
+  - recognizes text on images directly in Preview
+- [dynobo/normcap](https://github.com/dynobo/normcap)
+  - OCR powered screen-capture tool
+- [faustomorales/keras-ocr](https://github.com/faustomorales/keras-ocr)
+- [TDiblik/main-gate-alpr](https://github.com/TDiblik/main-gate-alpr)
+  - license plates
+  - https://news.ycombinator.com/item?id=37384327
+- https://github.com/kba/awesome-ocr
+- Commercial
+  - https://doc2x.noedgeai.com/
diff --git a/notes/ai/nlp/ocr/paddleocr.md → notes/ai/ocr/paddleocr.md b/notes/ai/nlp/ocr/paddleocr.md → notes/ai/ocr/paddleocr.md
@@ -13,6 +13,14 @@ title: PaddleOCR
   - https://gitee.com/duolabmeng666/paddlehub_ppocr/blob/master/Dockerfile
   - https://gitee.com/paddlepaddle/PaddleOCR/blob/release/2.6/deploy/docker/hubserving/cpu/Dockerfile
 
+```bash
+# registry.baidubce.com/paddlepaddle/paddle:3.0.0b1-jupyter
+# registry.baidubce.com/paddlepaddle/paddle:3.0.0b1
+docker run --rm -it \
+  -v $PWD:/paddle \
+  --name paddle registry.baidubce.com/paddlepaddle/paddle:3.0.0b1 /bin/bash
+```
+
 ```py
 from paddleocr import PaddleOCR, draw_ocr
 
@@ -37,3 +45,9 @@ im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc
 im_show = Image.fromarray(im_show)
 im_show.save('result.jpg')
 ```
+
+## PP-Structure
+
+- PP-Structure: document analysis
+- https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure
+- https://paddlepaddle.github.io/PaddleOCR/latest/ppstructure/overview.html
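
The `draw_ocr(image, boxes, txts, scores, ...)` call in the snippet above takes three parallel lists. The Python sketch below shows one way to derive them from a single page of OCR output, assuming each line has the shape `[box, (text, score)]` as in PaddleOCR 2.x examples; other versions may return a different structure. The `unpack` helper, the `min_score` filter, and the sample data are hypothetical, and the sketch does not import `paddleocr`.

```python
from typing import Any, List, Sequence, Tuple


def unpack(
    page: Sequence[Sequence[Any]], min_score: float = 0.5
) -> Tuple[List[Any], List[str], List[float]]:
    """Split one page of [box, (text, score)] lines into boxes, txts, scores."""
    boxes: List[Any] = []
    txts: List[str] = []
    scores: List[float] = []
    for box, (text, score) in page:
        # Skip low-confidence lines before drawing or exporting.
        if score >= min_score:
            boxes.append(box)
            txts.append(text)
            scores.append(score)
    return boxes, txts, scores


# Hypothetical page result with two recognized lines.
sample_page = [
    [[[0, 0], [100, 0], [100, 20], [0, 20]], ("hello", 0.98)],
    [[[0, 30], [100, 30], [100, 50], [0, 50]], ("w0rld", 0.31)],
]

boxes, txts, scores = unpack(sample_page)
print(txts, scores)
```
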