Supported functions

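| Speech recognition | Speech synthesis | Speaker verification | Speaker identification |
|--------------------|------------------|----------------------|------------------------|
| ✔️ | ✔️ | ✔️ | ✔️ |

| Spoken language identification | Audio tagging | Voice activity detection |
|--------------------------------|---------------|--------------------------|
| ✔️ | ✔️ | ✔️ |

| Keyword spotting | Add punctuation |
|------------------|-----------------|
| ✔️ | ✔️ |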

Supported platforms

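| Architecture | Android | iOS | Windows | macOS | Linux |
|--------------|---------|-----|---------|-------|-------|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ |
| x86 | ✔️ | | ✔️ | | |
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| arm32 | ✔️ | | | | ✔️ |
| riscv64 | | | | | ✔️ |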

Supported programming languages

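| C++ | C | Python | C# | Java | JavaScript |
|-----|---|--------|----|------|------------|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |

| Kotlin | Swift | Go | Dart | Rust |
|--------|-------|----|------|------|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |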

For Rust support, please see https://github.com/thewh1teagle/sherpa-rs

It also supports WebAssembly.

Introduction

This repository supports running the following functions locally:

  • Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
  • Text-to-speech (i.e., TTS)
  • Speaker identification
  • Speaker verification
  • Spoken language identification
  • Audio tagging
  • VAD (e.g., silero-vad)
  • Keyword spotting

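on the following platforms and operating systems:

  • Architectures: x64, x86, arm64, arm32, riscv64
  • Operating systems: Android, iOS, Windows, macOS, Linux
  • WebAssembly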

with the following APIs:

  • C++, C, Python, Go, C#
  • Java, Kotlin, JavaScript
  • Swift
  • Dart
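The difference between streaming and non-streaming speech-to-text mentioned above is mainly in how audio reaches the recognizer: a non-streaming recognizer receives a whole utterance at once, while a streaming recognizer receives short chunks as they are captured and can emit partial results along the way. The following is a library-agnostic sketch of that input pattern only; it does not use this repository's API, and the function name `chunk_waveform`, the 16 kHz sample rate, and the 100 ms chunk size are illustrative assumptions.

```python
from typing import Iterator, List


def chunk_waveform(
    samples: List[float], sample_rate: int, chunk_ms: int
) -> Iterator[List[float]]:
    """Yield consecutive chunk_ms-long slices of a mono waveform.

    Hypothetical helper: a streaming recognizer would be fed each slice
    as soon as it is captured. The final slice may be shorter.
    """
    chunk_len = sample_rate * chunk_ms // 1000  # samples per chunk
    if chunk_len <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


if __name__ == "__main__":
    sample_rate = 16000            # assumed 16 kHz mono input
    audio = [0.0] * sample_rate    # 1 second of silence as placeholder audio

    # Non-streaming: the whole utterance is handed over in one call.
    print("non-streaming:", len(audio), "samples in one call")

    # Streaming: the same audio arrives as 100 ms chunks.
    chunks = list(chunk_waveform(audio, sample_rate, chunk_ms=100))
    print("streaming:", len(chunks), "chunks of", len(chunks[0]), "samples")
```

Smaller chunks lower the delay before a partial result can appear, at the cost of more decoder calls per second of audio.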

Links for pre-built Android APKs

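| Description | URL | Users in China |
|-------------|-----|----------------|
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |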

Links for pre-built Flutter APPs

Real-time speech recognition

| Description | URL | Users in China |
|-------------|-----|----------------|
| Streaming speech recognition | Address | Click here |

Text-to-speech

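| Description | URL | Users in China |
|-------------|-----|----------------|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |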

Note: You need to build from source for iOS.

Links for pre-trained models

| Description | URL |
|-------------|-----|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See the multilingual Whisper ASR models under Speech recognition |
| Punctuation | Address |

Useful links

How to reach us

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.