RapidClip is an ongoing project aimed at automating the creation of short videos, ideal for platforms like YouTube Shorts, Instagram Reels, TikTok, and Kwai. The goal is to enable the system to generate complete videos from a provided topic, combining narration, background music, dynamic images, visual effects, and synchronized subtitles.
🇧🇷 For a Portuguese version of this README, see README.pt-br.md.
- Automatic Content Creation: Generate personalized scripts based on the provided topic.
- Audio Narration: Transform the script into high-quality narration.
- Audio Reprocessing: Reprocess audio files that exceed a specified duration to ensure compatibility with platform constraints.
- Subtitle Generation: Generate subtitles with improved alignment and segmentation:
  - Tokenizes the transcript text while preserving punctuation.
  - Aligns words with their respective timestamps and punctuation.
  - Creates readable, synchronized subtitles with character and word limits per line.
- Multi-Language Support: Enable content creation, narration, and subtitles in multiple languages.
- Background Music Integration: Select local soundtracks to enrich the video.
- Relevant Images: Automatically generate images to illustrate the content.
- Visual Effects and Transitions: Apply zoom, animations, and smooth cuts.
- Complete Rendering: Create the final video ready for publication.
Before running RapidClip, make sure to configure the required environment variables. Use the `.env.example` file as a template and create a `.env` file with the following variables:

```env
OPENAI_API_KEY=your-openai-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```
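A minimal sketch of how these variables might be validated at startup, using only the standard library (`require_env` is an illustrative helper name, not necessarily the project's actual code):

```python
import os

def require_env(name: str) -> str:
    """Return the value of a required environment variable, failing fast if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example: validate both keys before the pipeline starts, e.g.
# openai_key = require_env("OPENAI_API_KEY")
# elevenlabs_key = require_env("ELEVENLABS_API_KEY")
```

Failing fast here gives a clear error message instead of an obscure API failure midway through rendering.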
After configuring the environment variables, you can run RapidClip using the following command:
```shell
python src/main.py --theme "Curiosities of History (a single curiosity)" --language "pt-BR" --voice_id "CstacWqMhJQlnfLPxRG4" --max_duration 60
```
- `--theme`: The theme of the script to be created.
- `--language`: The language for the script and narration.
- `--voice_id`: The ID of the voice to be used for narration (compatible with ElevenLabs).
- `--max_duration`: The maximum allowed duration for the audio (in seconds).
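These options map naturally onto `argparse`; a sketch of such a parser (option names taken from the command above, the `required`/`default` choices are assumptions, not the project's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Command-line interface matching the options documented above."""
    parser = argparse.ArgumentParser(description="Generate a short video from a theme.")
    parser.add_argument("--theme", required=True,
                        help="Theme of the script to be created")
    parser.add_argument("--language", default="en-US",
                        help="Language for the script and narration")
    parser.add_argument("--voice_id", required=True,
                        help="ElevenLabs voice ID used for narration")
    parser.add_argument("--max_duration", type=int, default=60,
                        help="Maximum allowed audio duration in seconds")
    return parser

# Example invocation with explicit arguments:
args = build_parser().parse_args(
    ["--theme", "Curiosities of History", "--voice_id", "CstacWqMhJQlnfLPxRG4"]
)
```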
The generated files will be saved in the `output/` folder, including:

- An audio file (`.mp3`) containing the narration.
- A subtitle file (`.srt`) synchronized with the audio.
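For reference, SRT files number each cue and use `HH:MM:SS,mmm` timestamps. A small helper for producing that format (hypothetical names, shown for illustration only):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```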
The subtitle generation process ensures improved alignment and readability:
- Tokenization with Punctuation: The complete transcript text is tokenized into words and punctuation, preserving the original order.
- Word-Punctuation Alignment: Each word is aligned with its corresponding token, ensuring punctuation is correctly placed.
- Cue Segmentation: Subtitles are divided into smaller segments (cues) based on word and character limits per line, maintaining synchronization with audio timestamps.
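The steps above could be sketched roughly as follows (the limits and helper names are illustrative assumptions, not RapidClip's actual implementation):

```python
import re

def tokenize(text: str) -> list[str]:
    """Step 1: split the transcript into words and punctuation, preserving order."""
    return re.findall(r"\w+|[^\w\s]", text)

def segment_cues(words, max_words=7, max_chars=42):
    """Step 3: group timed words into cues under word/character limits.

    `words` is a list of (word, start, end) tuples, e.g. produced by
    aligning tokens with word-level timestamps (step 2).
    """
    cues, current = [], []
    for word, start, end in words:
        candidate = " ".join(w for w, _, _ in current) + " " + word
        if current and (len(current) >= max_words or len(candidate) > max_chars):
            # Flush the current cue, keeping its first start and last end time.
            cues.append((current[0][1], current[-1][2],
                         " ".join(w for w, _, _ in current)))
            current = []
        current.append((word, start, end))
    if current:
        cues.append((current[0][1], current[-1][2],
                     " ".join(w for w, _, _ in current)))
    return cues
```

Keeping each cue's start and end anchored to the timestamps of its first and last words is what preserves synchronization after segmentation.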
RapidClip is in its initial development phase. Features are being implemented and tested to ensure an efficient and intuitive workflow.
- Structure the pipeline for creating scripts, narration, and generating images.
- Implement visual effects and transitions between images.
- Ensure precise synchronization of audio, images, and subtitles.
- Optimize final rendering to ensure compatibility with short video platforms.
- Expand support for audio processing, including reprocessing long audio files and handling user-defined duration limits.
If you'd like to collaborate on the project, follow these steps:
- Fork the repository.
- Create a branch for your feature or bug fix:

  ```shell
  git checkout -b my-contribution
  ```
- Make your changes and submit a pull request detailing your modifications.
We'd love your help to make RapidClip even better!
This project is licensed under the MIT license. This means you are free to use, modify, and distribute it, provided the original license is included in the code. See the LICENSE file for more details.