Compute Requirements and Execution Time #61
Comments
Your file is likely too big; on 24 GB of VRAM the best I could do was 30-second chunks. Try this script as your orchestrator and start with 10-second chunks:

```bash
#!/bin/bash
set -e  # Exit immediately if a command exits with a non-zero status.

# Input file and output directories
input_file="/workspace/2band.wav"
output_dir="/workspace/chunks"            # assumed path (not shown in the original comment)
final_output_dir="/workspace/processed"   # assumed path (not shown in the original comment)

# Create output directories
mkdir -p "$output_dir" "$final_output_dir"

# Set chunk size to 30 seconds
chunk_size=30
echo "Processing audio in $chunk_size-second chunks"

# Get the total duration of the input file
duration=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$input_file")

# Process audio in chunks (the loop body was lost in the original comment;
# this is a reconstruction -- verify the audiosr flags with `audiosr --help`)
for i in $(seq 0 "$chunk_size" "${duration%.*}"); do
    chunk_file="$output_dir/chunk_$i.wav"
    ffmpeg -y -i "$input_file" -ss "$i" -t "$chunk_size" "$chunk_file"
    audiosr -i "$chunk_file" -s "$final_output_dir"
    # Ensure each result matches the chunk_*_processed.wav pattern expected
    # below; AudioSR's own output naming may differ, so rename if needed.
done

echo "All chunks processed. Concatenating final output..."

# Prepare a list of processed files
ls "$final_output_dir"/chunk_*_processed.wav | sort -V \
    | sed 's/^/file /' > "$final_output_dir/file_list.txt"

# Concatenate processed chunks
ffmpeg -f concat -safe 0 -i "$final_output_dir/file_list.txt" \
    -c copy "$final_output_dir/output_upscaled.wav"

# Check if the final output file was created
if [ ! -f "$final_output_dir/output_upscaled.wav" ]; then
    echo "Error: final output file was not created." >&2
    exit 1
fi

# Clean up
rm "$final_output_dir"/chunk_*_processed.wav "$final_output_dir/file_list.txt"
echo "Processing complete. Final output file: $final_output_dir/output_upscaled.wav"
```
If the audio is segmented and then re-synthesized, will it affect the quality?
I have no idea, but intuitively I do not think so. You can experiment and find out.
I didn't get a chance to try this yet, but based on my experiments with other such models, it does affect the overall tone, and sometimes even the quality.
Yes, I tested it, and the result is that it changes the timbre of the voice, making it sound deeper.
An intriguing observation: at what length does this phenomenon begin to manifest? Does it occur in clips under 15 seconds, or perhaps around 30 seconds? If this issue is widely experienced, there must be a specific threshold where it becomes apparent, suggesting that chunking might be safe for clips below that duration. Applio, for instance, seems to utilise a similar tool with extensive chunking. If memory serves, the tool on Replicate also employs chunking. Most tools performing tasks like audio super-resolution, particularly those grounded in deep learning, rely on chunking to process audio in segments. This approach is necessary because processing entire long audio clips in one go can be computationally prohibitive or may result in degradation due to the challenges of maintaining coherence over extended durations.
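One common way chunk-based pipelines reduce boundary artifacts is to process overlapping chunks and crossfade the overlap regions when stitching. A minimal sketch of that generic technique (not necessarily what AudioSR or Applio does internally), assuming chunks are NumPy sample arrays that share an `overlap`-sample region:

```python
# Overlap-and-crossfade chunk merging: linearly blend the shared region
# between consecutive chunks instead of hard-cutting at the boundary.
import numpy as np

def crossfade_concat(chunks, overlap):
    """Join processed chunks, crossfading over `overlap` samples."""
    out = chunks[0]
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    for chunk in chunks[1:]:
        # Blend the tail of the accumulated signal with the head of the next chunk
        blended = out[-overlap:] * fade_out + chunk[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], blended, chunk[overlap:]])
    return out

# Example: two 1-second "chunks" at 16 kHz with a 0.1-second overlap
sr, overlap = 16000, 1600
merged = crossfade_concat([np.ones(sr), np.ones(sr)], overlap)
print(len(merged))  # 2*16000 - 1600 = 30400
```

Because the fade-in and fade-out gains sum to one, a constant signal passes through the blend unchanged, which is why crossfading avoids level dips at chunk seams.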
My conclusion is that no matter the length of the audio, this issue will occur.
Without chunking, the GPU fails with out-of-memory. Test file: MP3, 2:04 long, 8 kbit/s, 22 kHz, 1.6 MB total.
T4 GPU, same file, 30-second chunks (~300 KB each): used 12 GB of GPU RAM.
I've created a script in Colab; you can try the library there. Note that as of 2024 the library is mono-only, by the way.
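Since the library is mono-only, stereo input needs to be downmixed first. A stdlib-only sketch, assuming 16-bit PCM WAV input (the file paths are placeholders):

```python
# Downmix a 16-bit PCM stereo WAV to mono by averaging the two channels.
import array
import wave

def stereo_to_mono(src_path, dst_path):
    with wave.open(src_path, "rb") as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2, "expects 16-bit stereo"
        rate = w.getframerate()
        # Samples are interleaved L/R 16-bit integers
        frames = array.array("h", w.readframes(w.getnframes()))
    # Average each L/R pair into one mono sample
    mono = array.array("h", ((frames[i] + frames[i + 1]) // 2
                             for i in range(0, len(frames), 2)))
    with wave.open(dst_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(mono.tobytes())
```

The same downmix can also be done with `ffmpeg -i in.wav -ac 1 out.wav` if ffmpeg is already in your pipeline.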
It failed utterly to run on 16 GB of RAM. Can anyone add the compute requirements and the execution time for AudioSR? I'm looking for a benchmark across GPU vs. CPU vs. audio duration.
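In the absence of published numbers, a small harness like the one below can collect per-duration wall-clock timings. The command template is an assumption; substitute your actual AudioSR invocation (verify flags with `audiosr --help`):

```python
# Times an arbitrary shell command over test clips of increasing duration.
import subprocess
import time

def benchmark(cmd_template, durations, make_clip):
    """Return {duration_seconds: wall_clock_seconds} for each test clip."""
    results = {}
    for d in durations:
        clip = make_clip(d)  # caller produces a d-second test file and returns its path
        start = time.perf_counter()
        # cmd_template is hypothetical, e.g. "audiosr -i {clip}" -- adjust to your CLI
        subprocess.run(cmd_template.format(clip=clip), shell=True, check=True)
        results[d] = time.perf_counter() - start
    return results

# Smoke test with a no-op command standing in for the real model call
timings = benchmark("true {clip}", [10, 30], lambda d: f"/tmp/clip_{d}.wav")
print(sorted(timings))  # [10, 30]
```

Running this on both a GPU and a CPU box for, say, 10/30/60-second clips would give exactly the GPU vs. CPU vs. duration table this issue asks for.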