[WIP] feat: add channel check on the audio #626

Open: wants to merge 2 commits into base: main

MENGZHEGENG
Collaborator

PR Goal?

Add a check on the number of channels in the provided audio. If any audio file has more than 2 channels (which can be problematic), warn the user and exit the program. Otherwise, keep the single channel as is, or average the two channels if the audio is stereo.
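
The policy described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this PR: the name `downmix_to_mono` and the constant `MAX_CHANNELS` are hypothetical, the audio is represented as plain Python lists (one list of samples per channel) rather than tensors, and the sketch raises `ValueError` where the wizard would presumably report the problem and exit.

```python
from typing import List, Sequence

# Hypothetical limit taken from the PR description: more than 2 channels is rejected.
MAX_CHANNELS = 2


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Return a mono signal from 1 or 2 channels, or raise for anything else.

    `channels` holds one sequence of samples per channel.
    """
    n_channels = len(channels)
    if n_channels == 0:
        raise ValueError("audio has no channels")
    if n_channels > MAX_CHANNELS:
        # The caller decides how to surface this, e.g. print a message and exit.
        raise ValueError(
            f"audio has {n_channels} channels; at most {MAX_CHANNELS} are supported"
        )
    if n_channels == 1:
        # Mono input: keep the only channel unchanged.
        return list(channels[0])
    # Stereo input: average the two channels sample by sample.
    left, right = channels
    return [(l + r) / 2 for l, r in zip(left, right)]
```

Raising an exception rather than exiting inside the helper keeps the check easy to test; the calling code can catch the error and stop the program with a readable message.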

Fixes?

Fixes #351

Feedback sought?

Priority?

Tests added?

TODO

How to test?

Confidence?

Version change?

Related PRs?


semanticdiff-com bot commented Jan 20, 2025

Review changes with SemanticDiff

Changed files: everyvoice/wizard/basic.py (status: 0% smaller)

@MENGZHEGENG MENGZHEGENG changed the title feat: add channel check on the audio [WIP] feat: add channel check on the audio Jan 20, 2025
@joanise
Member

joanise commented Jan 20, 2025

Question about this test, counting channels: according to https://docs.python.org/3/library/wave.html, wave and wave.open() only support "uncompressed PCM encoded wave files".

@roedoejet Is EveryVoice already only compatible with such wav files, or could using wave.open() break some existing audio format flexibility?

I know that wav2vec2aligner can take .m4a audio files and align them, I've tried that before with success, but I don't know if they're supported anywhere else.
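
The concern about `wave.open()` can be reproduced with the standard library alone. The sketch below is an illustration, not code from this PR: the helper names `pcm_wav_channel_count` and `demo` are hypothetical, and the "non-WAV" file is just arbitrary bytes saved with an `.m4a` extension, standing in for any input that is not an uncompressed PCM WAV file.

```python
import os
import tempfile
import wave
from typing import Optional, Tuple


def pcm_wav_channel_count(path: str) -> Optional[int]:
    """Return the channel count, or None if the wave module cannot parse the file."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getnchannels()
    except (wave.Error, EOFError):
        # wave only understands uncompressed PCM WAV; other input fails here.
        return None


def demo() -> Tuple[Optional[int], Optional[int]]:
    """Write one stereo PCM WAV and one non-WAV file, then count channels in both."""
    with tempfile.TemporaryDirectory() as tmp:
        stereo_path = os.path.join(tmp, "stereo.wav")
        with wave.open(stereo_path, "wb") as out:
            out.setnchannels(2)
            out.setsampwidth(2)  # 16-bit samples
            out.setframerate(16000)
            out.writeframes(b"\x00\x00" * 2 * 100)  # 100 silent stereo frames

        other_path = os.path.join(tmp, "clip.m4a")
        with open(other_path, "wb") as out:
            out.write(b"not a RIFF container at all")

        return pcm_wav_channel_count(stereo_path), pcm_wav_channel_count(other_path)
```

A check built this way would treat every file that `wave` cannot parse as an error, so any non-PCM format that currently works elsewhere in the pipeline would be rejected at this step.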

@roedoejet
Member

> Question about this test, counting channels: according to https://docs.python.org/3/library/wave.html, wave and wave.open() only support "uncompressed PCM encoded wave files".
>
> @roedoejet Is EveryVoice already only compatible with such wav files, or could using wave.open() break some existing audio format flexibility?

I'm not 100% sure, but we use torchaudio.load elsewhere in EveryVoice to load and save audio, so I think we should use that here too.
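
A channel count based on `torchaudio.load` could look roughly like the sketch below. It is a guess at the shape of such a check, not the change proposed in this PR: `count_channels` and its `load` parameter are hypothetical, and the code relies on `torchaudio.load` returning a `(waveform, sample_rate)` pair whose waveform has shape `(channels, frames)`, which is its default channels-first layout. The optional `load` argument exists only so the function can be exercised without torchaudio installed.

```python
from typing import Any, Callable, Optional, Tuple

# A loader takes a path and returns (waveform, sample_rate).
Loader = Callable[[str], Tuple[Any, int]]


def count_channels(path: str, load: Optional[Loader] = None) -> int:
    """Return the number of channels in the audio file at `path`.

    By default the file is read with torchaudio.load; a different loader
    can be passed in (hypothetical hook, mainly useful for tests).
    """
    if load is None:
        # Imported lazily so the module can be imported without torchaudio.
        import torchaudio

        load = torchaudio.load
    waveform, _sample_rate = load(path)
    # With the default channels-first layout, dimension 0 is the channel axis.
    return int(waveform.shape[0])
```

Because this goes through the same loader as the rest of the pipeline, the check would accept whatever formats the installed torchaudio backend can decode, rather than only uncompressed PCM WAV.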
