
[DMP 2024]: Voice API #128

suzinyou opened this issue Mar 15, 2024 · 22 comments
suzinyou commented Mar 15, 2024

Ticket Contents

Description

Ask A Question is a free and open-source tool created to help non-profit organizations, governments in developing nations, and social sector organizations utilize Large Language Models for responding to citizen inquiries in their native languages.

Create a new voice response API: it will allow users to send questions to and receive responses from AAQ using voice notes. This will increase the accessibility of AAQ for users for whom speaking/listening is easier than writing/reading.

Goals & Mid-Point Milestone

Goals

By mid-point

  • Develop an API endpoint in AAQ for sending queries in text and receiving responses in voice (text-to-speech, TTS). The first iteration may use an external TTS API
  • Develop a TTS service for AAQ using an open-source model that can replace an external TTS API

By project end

  • Develop an API endpoint for sending questions as voice notes and receiving responses as voice notes (speech-to-text, text-to-speech)
  • Integrate the TTS service into AAQ infrastructure on AWS
  • Publish a short blog post on AAQ website about the changes

For every goal listed, there will be a few rounds of design-feedback-implementation with support from the mentors and wider AAQ team.

Setup/Installation

AAQ contribution guide is here: https://idinsight.github.io/aaq-core/develop/contributing/

You will be given access to our testing environment on AWS.

Expected Outcome

  1. AAQ users can query the voice endpoints with voice questions and/or receive voice responses. This can be seamlessly integrated into AAQ’s chat flow manager of choice, Typebot.io.
  2. AAQ users have an option to use an open-source TTS/STT model instead of an external API.

Acceptance Criteria

No response

Implementation Details

You will build the APIs in our core_backend component, which is written in Python using FastAPI.

Our database is PostgreSQL + pgvector for managing document embeddings (contents) as well as other transactional data.
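To make the retrieval step concrete, here is a minimal sketch of the kind of pgvector similarity query such a setup typically uses. The table and column names (`contents`, `embedding`, etc.) are hypothetical illustrations, not AAQ's actual schema:

```python
# Sketch of building a pgvector cosine-similarity query for retrieving the
# contents most relevant to a query embedding. Names are hypothetical.
def build_similarity_query(
    table: str = "contents",
    vector_col: str = "embedding",
    top_k: int = 5,
) -> str:
    # "<=>" is pgvector's cosine-distance operator; ordering ascending by
    # distance returns the most similar rows first. The embedding itself is
    # passed as a bound parameter, never interpolated into the SQL string.
    return (
        f"SELECT content_id, content_text FROM {table} "
        f"ORDER BY {vector_col} <=> %(query_embedding)s "
        f"LIMIT {top_k};"
    )
```

The query string would then be executed through a driver such as psycopg with the embedding supplied as the `query_embedding` parameter.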

You will make the TTS/STT service that serves open-source models as platform-agnostic as possible, which typically means using Docker, but the integration will target AWS, since our demo environment runs there. You will be able to lead the architecture design for this service. Of course, our mentors and the wider AAQ team will be available to support and think it through together.

Mockups/Wireframes

No response

Product Name

Ask A Question

Organisation Name

IDinsight

Domain

Open Source Library

Tech Skills Needed

AWS, Database, Python

Mentor(s)

@amiraliemami @lickem22 are Data Scientists at IDinsight!

Category

API, Backend, Database, Deployment, AI

@MustafaAkolawala self-assigned this Jun 20, 2024
lickem22 commented Jun 20, 2024

Weekly Goals

Week 1

  • Go through the codebase and familiarise yourself with it
  • Start first implementation of the TTS endpoint with an external API
  • Finish technical design of the TTS API

Week 2

  • Finish technical design of the STT API endpoint
  • Progress implementation of TTS with an external API
  • Start implementation of the STT endpoint with an external API

Week 3

  • Progress implementation of TTS with an external API
  • R&D on which inhouse TTS and STT source model to use

Week 4

  • Finish implementation of TTS with an external API
  • Start implementation of STT with external API

Week 5

  • Continue implementation of STT with external API
  • Start integrating STT with GCP

Week 6

  • Raise PR for STT implementation with external API
  • Research integrating Bhashini

Week 7:

  • Improve GA integration and tests
  • Start working on GCP integration

Week 8:

  • Finish implementation of internal STT
  • Raise PR for GHA workflow and separate STT tests
  • Raise PR for GCP integration in speech workflow

Week 9

  • Merge STT tests PR
  • Merge GCP integration PR
  • Finish the Speech-to-Speech design
  • Start implementing the Speech-to-Speech workflow.

Week 10

  • Merge the new Speech-to-Speech endpoint implementation.
  • Finish external TTS and STT implementation
  • Support on Turn.io workflow.

Week 11

  • Merge external TTS and STT implementation
  • Start implementing internal TTS
  • Work on documentation

Week 12

  • Finish implementation and raise PR
  • Finish implementation of docs and raise PR
  • Write blogpost for voice component

MustafaAkolawala commented Jun 22, 2024

Weekly Learnings & Updates

Week 1

  • Understanding production codebase structure and best practices.
  • Learning about automated workflows and CI/CD processes.
  • Mastering modular and clean code writing techniques.
  • Creating technical designs for system architecture.
  • Developing skills for effective team collaboration and communication in a professional software development environment

Week 2

  • How to use pytest and write my own tests.
  • End-to-End Testing of functionalities

Week 3

  • Learnt how to use Vosk as an external API for STT
  • Learnt how to preprocess MP3/WAV files and convert them to mel-spectrograms for further processing
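The mel-spectrogram conversion mentioned above boils down to projecting a linear-frequency spectrogram through a bank of triangular mel-spaced filters. The following is a minimal NumPy sketch of constructing such a filterbank (parameter values are illustrative defaults, not the project's actual settings):

```python
# Minimal sketch: building a mel filterbank with NumPy. Applying it to the
# magnitude spectrogram of an STFT yields a mel-spectrogram. Parameter
# defaults here (16 kHz, 512-point FFT, 40 bands) are illustrative only.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr: int = 16000, n_fft: int = 512, n_mels: int = 40):
    # Filter centres are spaced evenly on the mel scale, then mapped back
    # to Hz and quantised to FFT bin indices.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope of triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope of triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb
```

In practice a library such as librosa provides this (plus the STFT) ready-made; the sketch just shows what the transformation does.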

Week 4

  • Learned about Hugging Face and how it could effectively assist in integrating Whisper.
  • Researched Bhashini and its potential integration for Indic languages
  • Studied the creation of separate Docker containers and their collaborative use to build a more efficient pipeline. Also learned about multi-stage builds

Week 5

  • Learnt how to create my own images and Dockerfiles and use shared volumes across Docker containers to persist data during runtime.
  • Learnt to create FastAPI apps and to receive and handle multipart/form data.

Week 6

  • Learnt how to use monkeypatching and MagicMock to mock and patch external dependencies and I/O when writing unit and functional tests.
  • Studied integrating GCP Cloud Storage buckets for STT and TTS MP3 file storage.
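The mocking pattern mentioned above looks roughly like this with the standard library's `unittest.mock`. `SpeechClient` and `handle_voice_note` are illustrative names, not AAQ's actual code:

```python
# Sketch of mocking an external STT dependency in a unit test using
# unittest.mock.MagicMock. All names here are hypothetical illustrations.
from unittest.mock import MagicMock

class SpeechClient:
    def transcribe(self, audio: bytes) -> str:
        raise RuntimeError("would call a real STT service")

def handle_voice_note(client: SpeechClient, audio: bytes) -> str:
    # Normalise the transcript before it is passed on to the rest of
    # the query pipeline.
    return client.transcribe(audio).strip().lower()

# In a test, replace the external dependency with a MagicMock so no
# network or model call ever happens:
mock_client = MagicMock()
mock_client.transcribe.return_value = "  Hello World  "
result = handle_voice_note(mock_client, b"\x00\x01")
```

The same idea extends to patching module-level clients in place with `unittest.mock.patch` or pytest's `monkeypatch` fixture.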

Week 7

  • Learnt how to write Makefiles to automate the execution of tests.
  • Learnt how to create GitHub Actions workflows for my own CI/CD pipeline that executes unit tests whenever a commit is pushed to the speech_api directory.

Week 8

  • Studied Vertex AI and the LiteLLM proxy to integrate Google STT and TTS external APIs for demo day
  • Gained insights into the UX design of endpoints for the speech workflow

Week 9

  • Read Google Cloud TTS and STT documentation
  • Learnt how to manage different media types and convert between them
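Media-type conversion for voice notes is commonly delegated to ffmpeg. As an illustration (assuming ffmpeg as the converter; the helper and its defaults are hypothetical), here is a sketch that builds, but does not run, a command normalising an incoming voice note to mono 16 kHz WAV for STT:

```python
# Sketch: build an ffmpeg argument list to convert e.g. an OGG/Opus voice
# note to mono 16 kHz WAV. Only constructs the command; actually running it
# (e.g. via subprocess.run) requires ffmpeg to be installed.
def ffmpeg_convert_cmd(src: str, dst: str, sample_rate: int = 16000) -> list:
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,                 # input file
        "-ar", str(sample_rate),   # resample to the target rate
        "-ac", "1",                # downmix to a single (mono) channel
        dst,                       # output path; format inferred from extension
    ]
```

Most STT models expect mono 16 kHz input, which is why those two flags appear in nearly every such pipeline.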

Week 10

  • Researched chat managers such as Glific, Typebot, and Turn.io
  • Learnt how to build voice chat flows on various chatbots to connect to AAQ

Week 11

  • Researched more internal TTS models for specific use cases
  • Learnt to write documentation for AAQ using MkDocs
  • Learnt to write blog posts

@MadhalasaSJ

Hi @amiraliemami, my name is Madhalasa, and I’ve recently completed my B.E. in AI & ML from RNSIT, Bangalore. I’ve done an internship at Infosys Springboard as an AI Intern and have skills in databases and Python. As a fresher, I’m eager to contribute to this project. Is there a preferred method for communicating with the mentors?
