Long videos are transcribed out of order #38

Open
Vince4644 opened this issue Dec 18, 2024 · 2 comments
Labels
good first issue Good for newcomers

Comments

@Vince4644

screenshot

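Test input: /BV1cvk7YwEzA, video length 20:28
Test result: the .txt file is produced normally.
Problem description: Long videos are transcribed out of order. After the Whisper model loads, the log shows 【正在转换文本...正在转换第1/27个音频... 25.mp3】 ("Converting text... converting audio 1/27... 25.mp3"). Processing does not start from the first slice, 1.mp3, so the output .txt file begins with the content of 25.mp3. The remaining slices are then processed neither in ascending nor in reverse order, but in a scrambled order.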
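The symptom is consistent with how os.listdir behaves: the Python documentation states that it returns entries in arbitrary order. The sketch below is only an illustration, not the project's code. It creates hypothetical empty files named 1.mp3 through 27.mp3 in a temporary directory and shows that a plain string sort would not fix the order either, while a numeric sort does.

```python
import os
import tempfile


def slice_orders(count=27):
    """Create `count` empty files named 1.mp3..count.mp3 in a temporary
    directory and return (lexicographic order, numeric order)."""
    with tempfile.TemporaryDirectory() as tmp:
        for n in range(1, count + 1):
            open(os.path.join(tmp, f"{n}.mp3"), "w").close()
        names = os.listdir(tmp)  # order is arbitrary per the Python docs
    lexicographic = sorted(names)
    numeric = sorted(names, key=lambda name: int(os.path.splitext(name)[0]))
    return lexicographic, numeric


lexicographic, numeric = slice_orders()
print(lexicographic[:4])  # ['1.mp3', '10.mp3', '11.mp3', '12.mp3']
print(numeric[:4])        # ['1.mp3', '2.mp3', '3.mp3', '4.mp3']
```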

@Vince4644
Author

Suggested change to speech2text.py:
import whisper
import os

whisper_model = None

def is_cuda_available():
    return whisper.torch.cuda.is_available()

def load_whisper(model="tiny"):
    global whisper_model
    whisper_model = whisper.load_model(model, device="cuda" if is_cuda_available() else "cpu")
    print("Whisper模型:"+model)

def run_analysis(filename, model="tiny", prompt="以下是普通话的句子。"):
    global whisper_model
    print("正在加载Whisper模型...")

    # Read the audio slices for this video
    audio_dir = f"audio/slice/{filename}"
    audio_list = os.listdir(audio_dir)

    # Keep only the .mp3 files and sort them by the number in the file name
    audio_list = [fn for fn in audio_list if fn.endswith('.mp3')]
    audio_list.sort(key=lambda x: int(os.path.splitext(x)[0]))

    print("加载Whisper模型成功!")

    # Create the outputs folder
    os.makedirs("outputs", exist_ok=True)
    print("正在转换文本...")

    i = 1
    for fn in audio_list:
        print(f"正在转换第{i}/{len(audio_list)}个音频... {fn}")
        # Transcribe the audio slice
        result = whisper_model.transcribe(os.path.join(audio_dir, fn), initial_prompt=prompt)
        print("".join([segment["text"] for segment in result["segments"] if segment is not None]))

        # Append this slice's text to the output file
        with open(f"outputs/{filename}.txt", "a", encoding="utf-8") as f:
            f.write("".join([segment["text"] for segment in result["segments"] if segment is not None]))
            f.write("\n")
        i += 1
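To check the ordering logic without loading Whisper, the loop can be exercised with a stand-in transcriber. The following is a minimal sketch under that assumption: transcribe_in_order and its transcribe parameter are hypothetical names, not part of the project, and the demo files are empty placeholders.

```python
import os
import tempfile


def transcribe_in_order(audio_dir, transcribe, out_path):
    """Call `transcribe(path)` on each numbered .mp3 slice in numeric order
    and append one line of text per slice to `out_path`."""
    names = [fn for fn in os.listdir(audio_dir) if fn.endswith(".mp3")]
    names.sort(key=lambda fn: int(os.path.splitext(fn)[0]))
    with open(out_path, "a", encoding="utf-8") as f:
        for i, fn in enumerate(names, start=1):
            print(f"{i}/{len(names)}: {fn}")
            f.write(transcribe(os.path.join(audio_dir, fn)) + "\n")
    return names


# Demo with a stand-in for Whisper that just echoes the file name.
with tempfile.TemporaryDirectory() as tmp:
    for n in (25, 1, 10, 2):
        open(os.path.join(tmp, f"{n}.mp3"), "w").close()
    out_path = os.path.join(tmp, "out.txt")
    order = transcribe_in_order(tmp, os.path.basename, out_path)
    with open(out_path, encoding="utf-8") as f:
        written = f.read().splitlines()

print(order)  # ['1.mp3', '2.mp3', '10.mp3', '25.mp3']
```

With the real model, the stand-in would be replaced by a function that calls whisper_model.transcribe on the path and joins the "text" of the returned segments, as the suggested run_analysis does.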

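--------------------------------------------
Notes on the changes:
Filtering and sorting the audio files:
audio_list = [fn for fn in audio_list if fn.endswith('.mp3')]
audio_list.sort(key=lambda x: int(os.path.splitext(x)[0]))
Filtering: first keep only the files ending in .mp3, so that non-audio files are not processed.
Sorting: call sort with a lambda that extracts the numeric part of the file name. os.path.splitext(x)[0] returns the file name without its extension, and int() converts it to an integer, so the sort is numeric rather than lexicographic.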
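One caveat with the int() key: it raises ValueError if the folder ever contains an .mp3 whose name is not a plain number. A more defensive variant might look like the sketch below. numbered_mp3s is a hypothetical helper, not part of the project, and the file names in the example are made up.

```python
import os


def numbered_mp3s(names):
    """Return the names of the form '<integer>.mp3', sorted by that integer.

    Hypothetical helper, not part of the project. Names with a non-numeric
    stem or another extension are skipped instead of raising ValueError.
    """
    numbered = []
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".mp3" and stem.isdecimal():
            numbered.append((int(stem), name))
    return [name for _, name in sorted(numbered)]


print(numbered_mp3s(["25.mp3", "1.mp3", "cover.jpg", "10.mp3", "2.mp3", "notes.mp3"]))
# ['1.mp3', '2.mp3', '10.mp3', '25.mp3']
```

Whether silently skipping unexpected files is preferable to failing loudly depends on how the slices are produced, so treat this as one option rather than a required change.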

Path joining:
result = whisper_model.transcribe(os.path.join(audio_dir, fn), initial_prompt=prompt)
os.path.join is used to build the path, which improves readability and cross-platform compatibility.
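As a small illustration of the path-joining point, the sketch below builds the same slice path with os.path.join and with pathlib. It assumes, purely for the example, that the slice folder is named after the video ID from the report; the thread does not confirm that.

```python
import os
from pathlib import Path

# Assumed folder name for illustration: the video ID from the report.
audio_dir = os.path.join("audio", "slice", "BV1cvk7YwEzA")

# os.path.join inserts the separator that is correct for the current OS.
joined = os.path.join(audio_dir, "1.mp3")

# pathlib builds the same path with the / operator.
same = Path("audio") / "slice" / "BV1cvk7YwEzA" / "1.mp3"

print(joined)
print(Path(joined) == same)
```

Either form avoids hard-coding "/" or "\\" in the string. If a Path object is used, converting it with str() before passing it on is the safe choice for libraries that expect a string path.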

@lanbinleo added the "good first issue" (Good for newcomers) label on Dec 20, 2024
@kamenomi-dev

Hello? This code change has indentation problems. Could you fix it, please?


3 participants