Add Audio Transcription Functionality #94

Open
gromdimon opened this issue Jan 19, 2025 · 0 comments

gromdimon commented Jan 19, 2025

Is your feature request related to a problem? Please describe.

Nevron currently lacks the ability to process audio files or voice inputs. This limits its usability for scenarios where users may want to provide voice notes, meeting recordings, or podcasts for analysis, memory updates, or decision-making workflows.


Describe the solution you'd like

Add functionality to transcribe audio files into text using a reliable speech-to-text solution. This feature will enable Nevron to process voice-based inputs and use the transcriptions in workflows.

Proposed Implementation Steps:

  1. Audio File Support:

    • Support common audio file formats such as .mp3, .wav, and .flac.
  2. Integration with Speech-to-Text Services:

    • Use an external library or API for transcription:
      • Whisper API.
      • AssemblyAI.
    • Allow users to choose the transcription backend through settings.py.
  3. Integration with Workflows:

    • Add a new execution tool that exposes audio transcription to workflows.
  4. Error Handling:

    • Handle errors such as:
      • Unsupported file formats.
      • Poor audio quality leading to incomplete transcriptions.
      • API errors during transcription.
    • Log detailed error messages for debugging.
  5. Configuration Options:

    • Add configuration options to settings.py, including:
      • Maximum file size.
      • Transcription backend and API keys.
      • Language settings for transcription.
  6. Unit Tests:

    • Write unit tests to validate audio transcription functionality using sample audio files:
      • Clear audio with text output verification.
      • Poor quality audio with expected errors.
      • Unsupported file formats.
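
The file-format, size-limit, and configuration points above could be sketched roughly as follows. This is a minimal illustration, not existing Nevron code: `TranscriptionSettings`, its field names and defaults, the exception classes, and `validate_audio_file` are all hypothetical names chosen for this example.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple

logger = logging.getLogger(__name__)


class TranscriptionError(Exception):
    """Base class for transcription failures (hypothetical)."""


class UnsupportedFormatError(TranscriptionError):
    """Raised when the file extension is not in the allowed set."""


class FileTooLargeError(TranscriptionError):
    """Raised when the file exceeds the configured size limit."""


@dataclass(frozen=True)
class TranscriptionSettings:
    # Assumed option names and defaults; the real settings.py may differ.
    backend: str = "whisper"
    language: str = "en"
    max_file_size_bytes: int = 25 * 1024 * 1024
    supported_extensions: Tuple[str, ...] = (".mp3", ".wav", ".flac")


def validate_audio_file(path: Path, settings: TranscriptionSettings) -> Path:
    """Return the path if it passes the format and size checks, else raise."""
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")

    # Compare extensions case-insensitively so "note.WAV" is accepted.
    suffix = path.suffix.lower()
    if suffix not in settings.supported_extensions:
        logger.error("Rejected %s: unsupported extension %r", path, suffix)
        raise UnsupportedFormatError(f"Unsupported audio format: {suffix or '(none)'}")

    size = path.stat().st_size
    if size > settings.max_file_size_bytes:
        logger.error("Rejected %s: %d bytes exceeds limit", path, size)
        raise FileTooLargeError(
            f"File is {size} bytes; limit is {settings.max_file_size_bytes} bytes"
        )
    return path
```

Keeping the checks in one function that raises typed errors would let the unit tests in step 6 assert on the exception class rather than on log text.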

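One possible shape for the user-selectable backend in step 2 is a small registry keyed by the name stored in settings. The sketch below is an assumption about the design, not a description of Nevron: `TranscriptionBackend`, `EchoBackend`, `register_backend`, and `get_backend` are hypothetical names, and the real Whisper or AssemblyAI client calls are deliberately left out.

```python
from pathlib import Path
from typing import Callable, Dict, Protocol


class TranscriptionBackend(Protocol):
    """Interface every backend would implement (hypothetical)."""

    def transcribe(self, audio_path: Path, language: str) -> str:
        ...


class EchoBackend:
    """Stand-in backend for tests; makes no network calls."""

    def transcribe(self, audio_path: Path, language: str) -> str:
        return f"[{language}] transcript of {audio_path.name}"


# Maps a backend name from settings to a factory that builds the backend.
_BACKENDS: Dict[str, Callable[[], TranscriptionBackend]] = {"echo": EchoBackend}


def register_backend(name: str, factory: Callable[[], TranscriptionBackend]) -> None:
    """Add a backend under a case-insensitive name."""
    _BACKENDS[name.lower()] = factory


def get_backend(name: str) -> TranscriptionBackend:
    """Build the backend selected in settings, or fail with a clear message."""
    try:
        factory = _BACKENDS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown transcription backend {name!r}; available: {sorted(_BACKENDS)}"
        ) from None
    return factory()


if __name__ == "__main__":
    backend = get_backend("echo")
    print(backend.transcribe(Path("meeting.wav"), "en"))
```

Wrappers around the Whisper API and AssemblyAI would be registered the same way, which keeps the execution tool independent of any single provider and lets tests run against the stand-in backend.
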
Additional Context

  • Before transcription, we need to check that the audio file is safe (for example, that it does not contain malware).
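
As a first, limited step toward that check, the file header can be compared against the claimed extension. The sketch below is only a content sanity check and not a malware scan, which would need a dedicated scanner; `sniff_audio_format` and `extension_matches_content` are hypothetical helper names, and the MP3 frame-sync test is a loose heuristic.

```python
from pathlib import Path
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the format from the leading bytes; return None if unrecognized."""
    # WAV: RIFF container whose form type at offset 8 is "WAVE".
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    # FLAC streams start with the marker "fLaC".
    if header[:4] == b"fLaC":
        return ".flac"
    # MP3 files often start with an ID3v2 tag...
    if header[:3] == b"ID3":
        return ".mp3"
    # ...or directly with an MPEG audio frame sync (11 set bits).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return ".mp3"
    return None


def extension_matches_content(path: Path) -> bool:
    """True if the file's leading bytes agree with its extension."""
    with path.open("rb") as fh:
        header = fh.read(12)
    return sniff_audio_format(header) == path.suffix.lower()
```

A file that fails this check, such as an executable renamed to `.mp3`, could be rejected before it is sent to any transcription backend.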
@gromdimon gromdimon added the feature New feature or request label Jan 19, 2025
@gromdimon gromdimon added this to the v0.3.0 milestone Jan 19, 2025
@gromdimon gromdimon modified the milestones: v0.3.0, v0.2.2 Feb 11, 2025