Skip to content

Commit

Permalink
Add qwen-audio style model training: using whisper + qwen2 (#1652)
Browse files Browse the repository at this point in the history
  • Loading branch information
yuekaizhang authored Jun 16, 2024
1 parent 3b40d9b commit 890eeec
Show file tree
Hide file tree
Showing 12 changed files with 2,324 additions and 0 deletions.
20 changes: 20 additions & 0 deletions egs/speech_llm/ASR_LLM/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@

# Introduction

This recipe includes scripts for training [Qwen-Audio](https://github.com/QwenLM/Qwen-Audio/tree/main) style model using multiple datasets.

<br>
<p align="center">
<img src="assets/framework.png" width="800"/>
<p>
<br>

[./RESULTS.md](./RESULTS.md) contains the latest results.

# ASR_LLM

The following table lists the folders for different tasks.

| | Speech Encoder | LLM | Comment |
|---------------------------------------|---------------------|--------------------|---------------------------------------------------|
| [whisper_llm_zh](./whisper_llm_zh) | Whisper | Qwen2 | [Using multiple Chinese datasets](https://github.com/k2-fsa/icefall/tree/master/egs/multi_zh-hans/ASR) |
62 changes: 62 additions & 0 deletions egs/speech_llm/ASR_LLM/RESULTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
## Results

### whisper_llm_zh finetuning results

| Training Dataset | Speech Encoder | LLM | Projector |Comment | CER |
| -------------------------| ----------------|------|--------------------------------------------------|-----|--|
| Aishell1 | whisper-large-v2-aishell1-ft, freeze| Qwen2-1.5B-Instruct, LoRA | Linear, 8x downsample| [yuekai/icefall_asr_aishell_whisper_qwen2_1.5B](https://huggingface.co/yuekai/icefall_asr_aishell_whisper_qwen2_1.5B) | Aishell1 Test 3.62% |
<!-- | Multi-hans-zh | whisper-large-v2-multi-hans-ft, freeze| Qwen2-1.5B-Instruct, LoRA | Linear, 8x downsample| WIP ||
| Multi-hans-zh | whisper-large-v2-multi-hans-ft, freeze| Qwen2-7B-Instruct, LoRA | Linear, 8x downsample| WIP || -->

Command for training is:
```bash
pip install -r whisper_llm_zh/requirements.txt

pip install huggingface_hub['cli']
mkdir -p models/whisper models/qwen

# For aishell fine-tuned whisper model
huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_aishell_whisper exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt
# For multi-hans fine-tuned whisper model
# huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_multi-hans-zh_whisper v1.1/whisper-large-v2-multi-hans-zh-epoch-3-avg-10.pt

# huggingface-clie download --local-dir models/qwen Qwen/Qwen2-7B-Instruct
huggingface-clie download --local-dir models/qwen Qwen/Qwen2-1.5B-Instruct

torchrun --nproc_per_node 8 ./whisper_llm_zh/train.py \
--max-duration 200 \
--exp-dir ./whisper_llm_zh/exp_test \
--speech-encoder-path-or-name models/whisper/exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt \
--llm-path-or-name Qwen/Qwen2-1.5B-Instruct \
--manifest-dir data/fbank \
--deepspeed \
--deepspeed_config ./whisper_llm_zh/ds_config_zero1.json \
--use-flash-attn True \
--use-lora True --unfreeze-llm True
```

Command for decoding using fine-tuned models:
```bash
mkdir -p models/whisper models/qwen models/checkpoint
huggingface-cli download --local-dir models/checkpoint yuekai/icefall_asr_aishell_whisper_qwen2_1.5B

# For aishell fine-tuned whisper model
huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_aishell_whisper exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt
# For multi-hans fine-tuned whisper model
# huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_multi-hans-zh_whisper v1.1/whisper-large-v2-multi-hans-zh-epoch-3-avg-10.pt

huggingface-clie download --local-dir models/qwen Qwen/Qwen2-7B-Instruct

mkdir -p whisper_llm_zh/exp_aishell_whisper_qwen2_1.5B
ln -s models/checkpoint/epoch-10-avg-5.pt whisper_llm_zh/exp_aishell_whisper_qwen2_1.5B/epoch-999.pt

python3 ./whisper_llm_zh/decode.py \
--max-duration 80 \
--exp-dir whisper_llm_zh/exp_aishell_whisper_qwen2_1.5B \
--speech-encoder-path-or-name models/whisper/exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt \
--llm-path-or-name models/qwen \
--epoch 999 --avg 1 \
--manifest-dir data/fbank \
--use-flash-attn True \
--use-lora True --dataset aishell
```
Binary file added egs/speech_llm/ASR_LLM/assets/framework.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
46 changes: 46 additions & 0 deletions egs/speech_llm/ASR_LLM/prepare.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#!/usr/bin/env bash

# fix segmentation fault reported in https://github.com/k2-fsa/icefall/issues/674
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

set -eou pipefail

stage=0
stop_stage=0
# All files generated by this script are saved in "data".
# You can safely remove "data" and rerun this script to regenerate it.
mkdir -p data

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}


if [ $stage -le 0 ] && [ $stop_stage -ge 0 ]; then
log "stage 0: Download whisper-large-v2 aishell 1 fbank feature from huggingface"

# pip install huggingface_hub['cli']
# for aishell 1
huggingface-cli download --local-dir data yuekai/aishell_whisper_fbank_lhotse

fi

if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then
log "stage 1: Download whisper-large-v2 multi-hans-zh fbank feature from huggingface"

# for multi-hans-zh
huggingface-cli download --local-dir data/fbank yuekai/wenetspeech_whisper_fbank_lhotse
huggingface-cli download --local-dir data/fbank yuekai/multi_hans_zh_whisper_fbank_lhotse
huggingface-cli download --local-dir data/fbank yuekai/alimeeting_aishell4_training_whisper_fbank_lhotse
fi

if [ $stage -le 2 ] && [ $stop_stage -ge 2 ]; then
log "stage 2: Download whisper-large-v2 speechio test sets fbank feature from huggingface"

# for speechio test sets
mkdir data_speechio
huggingface-cli download --local-dir data_speechio yuekai/icefall_asr_speechio
mv data_speechio/fbank/* data/fbank
fi
1 change: 1 addition & 0 deletions egs/speech_llm/ASR_LLM/whisper_llm_zh/asr_datamodule.py
Loading

0 comments on commit 890eeec

Please sign in to comment.