Text Preprocessing Tools for Subtitles and Speech Synthesis

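This repository contains two Python scripts for processing ruby annotations (《furigana》) embedded in text.
Each script preprocesses the text for a different target: one for subtitles, the other for text-to-speech.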

📦 Setup

Python 3.7 or later is required. There are no third-party dependencies (only the standard re module is used).

python remove_annotation_word.py --input sample.txt --output sample_subtitle.txt
python remove_kanji_before_ruby.py --input sample.txt --output sample_voicevox.txt

📄 Example project layout

.
├── LICENSE
├── README.md (you are here)
├── remove_ananotation_words.py
├── remove_kanji_before_ruby.py
├── requirements.txt
├── sample_subtitle.txt
├── sample_voicevox.txt
└── sample.txt

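📄 Scripts

1. `remove_annotation_word.py` - for subtitles (removes ruby annotations)

Purpose: When the text is displayed as subtitles, readability takes priority, so each ruby annotation (the 《...》 part) is removed entirely and the base word is kept.
Example: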

お世嗣《よつぎ》と皇后《こうごう》は城へ向かった。
→ お世嗣と皇后は城へ向かった。
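
The conversion above can be sketched with a single regular expression. This is a minimal illustration, not the actual contents of the script; the names `RUBY_PATTERN` and `strip_ruby` are hypothetical, and it assumes annotations are never nested.

```python
import re

# Matches one ruby annotation: 《 ... 》 with no brackets nested inside.
# Assumption: annotations are flat, as in the example above.
RUBY_PATTERN = re.compile(r"《[^《》]*》")


def strip_ruby(text: str) -> str:
    """Remove every 《reading》 annotation and keep the base text."""
    return RUBY_PATTERN.sub("", text)


print(strip_ruby("お世嗣《よつぎ》と皇后《こうごう》は城へ向かった。"))
# → お世嗣と皇后は城へ向かった。
```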

▶️ Usage

python remove_annotation_word.py --input input.txt --output output.txt

Argument  Description

--input   Input text file (UTF-8)
--output  Output text file
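
For readers who want to see how the two flags could be wired up, here is a minimal sketch using the standard argparse module. The README only confirms the `--input` and `--output` flags; the use of argparse, the function names, and writing the output as UTF-8 are all assumptions made for illustration.

```python
import argparse
import re
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags described in the table above."""
    parser = argparse.ArgumentParser(
        description="Remove 《ruby》 annotations from a text file."
    )
    parser.add_argument("--input", required=True, help="input text file (UTF-8)")
    parser.add_argument("--output", required=True, help="output text file")
    return parser


def main(argv=None) -> None:
    """Read the input file, drop the annotations, write the result."""
    args = build_parser().parse_args(argv)
    text = Path(args.input).read_text(encoding="utf-8")
    # Assumption: the output is written as UTF-8, like the input.
    converted = re.sub(r"《[^《》]*》", "", text)
    Path(args.output).write_text(converted, encoding="utf-8")


# Small demonstration with temporary files instead of real paths.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_text("皇后《こうごう》は城へ向かった。\n", encoding="utf-8")
    main(["--input", str(src), "--output", str(dst)])
    print(dst.read_text(encoding="utf-8"), end="")
# → 皇后は城へ向かった。
```

Passing `argv` explicitly keeps `main` easy to call from tests; when it is left as `None`, argparse falls back to the real command line.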

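2. `remove_kanji_before_ruby.py` - for speech synthesis

Purpose: To get correct pronunciation from text-to-speech, only the reading inside each ruby annotation is kept. The kanji immediately before the annotation are removed so the engine cannot misread them.
Example: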

お世嗣《よつぎ》と皇后《こうごう》は城へ向かった。
→ およつぎとこうごうは城へ向かった。
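
The replacement above can be sketched as follows. This is a hedged illustration rather than the script's actual code: the names `KANJI_RUBY_PATTERN` and `keep_reading` are hypothetical, and the kanji range (U+4E00 to U+9FFF plus the iteration mark 々) is an assumption that covers the example but not every rare character.

```python
import re

# A run of kanji immediately followed by 《reading》.
# Assumption: "kanji" means U+4E00..U+9FFF plus 々 (U+3005).
KANJI_RUBY_PATTERN = re.compile(r"[\u4e00-\u9fff\u3005]+《([^《》]*)》")


def keep_reading(text: str) -> str:
    """Replace each kanji run and its annotation with the reading alone."""
    return KANJI_RUBY_PATTERN.sub(r"\1", text)


print(keep_reading("お世嗣《よつぎ》と皇后《こうごう》は城へ向かった。"))
# → およつぎとこうごうは城へ向かった。
```

Kanji without an annotation, such as 城 in the example, are left untouched because the pattern only matches a kanji run that is directly followed by 《.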

▶️ Usage

python remove_kanji_before_ruby.py --input input.txt --output output.txt

Argument  Description

--input   Input text file (UTF-8)
--output  Output text file

📜 License

This project is released under the MIT License.

🙋‍♂️ Author

For questions or requests, please use Issues or GitHub Discussions.

Repository: ty70/text_preprocessing_tools
