The application processes audio and video files to extract transcribed text and saves it locally. The text is not converted to vector embeddings or stored in a vector database. Users can optionally summarize or translate the extracted text; the selected text is passed to the model, which generates the summarized or translated output.
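A minimal sketch of the extraction step is shown below, assuming the open-source Whisper package for speech-to-text; the section does not name the transcription tool, so the model choice, file names, and output path are illustrative.

```python
# Minimal sketch of the transcription step, assuming the open-source
# openai-whisper package; the project's actual tooling is not specified here.
import whisper

def transcribe_to_file(media_path: str, output_path: str) -> str:
    """Transcribe an audio/video file and save the text locally."""
    model = whisper.load_model("base")      # small, CPU-friendly checkpoint
    result = model.transcribe(media_path)   # Whisper extracts the audio track from video via ffmpeg
    text = result["text"].strip()
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(text)
    return text

# Hypothetical usage:
# text = transcribe_to_file("meeting.mp4", "meeting_transcript.txt")
```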
The project involved the following subtasks:
- Task 1: Extracting text from audio and video files.
- Task 2: Creating a vector DB for storing embeddings (ChromaDB); see the sketch after this list.
- Task 3: Developing a summarizer and translator using open-source Large Language Models (LLMs); a sketch follows the vector DB example below.
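For Task 2, the section names ChromaDB as the vector store. The following is a minimal sketch using the ChromaDB Python client with its default embedding function; the persistence path, collection name, and document chunking are assumptions, not details from the project.

```python
# Minimal sketch of Task 2, assuming the ChromaDB Python client;
# collection name, path, and documents are illustrative.
import chromadb

client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection(name="transcripts")

# Store a transcript chunk; ChromaDB embeds the text with its default
# embedding function unless one is supplied explicitly.
collection.add(
    documents=["First chunk of the extracted transcript..."],
    ids=["transcript-0001"],
)

# Retrieve the chunks most similar to a query.
results = collection.query(query_texts=["summary of the meeting"], n_results=3)
print(results["documents"])
```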
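For Task 3, the specific open-source LLMs are not named, so the sketch below uses Hugging Face `transformers` pipelines with commonly available checkpoints (BART for summarization, an Opus-MT English-to-French model for translation) purely as placeholders.

```python
# Minimal sketch of Task 3, assuming Hugging Face transformers pipelines with
# open-source checkpoints; the project's actual model choices are not named here.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

def summarize(text: str) -> str:
    """Return a short summary of the extracted transcript text."""
    return summarizer(text, max_length=150, min_length=30, do_sample=False)[0]["summary_text"]

def translate(text: str) -> str:
    """Translate the extracted transcript text (English to French in this sketch)."""
    return translator(text)[0]["translation_text"]
```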