An AI-powered reading practice application that helps students improve their reading fluency and pronunciation through speech recognition and feedback.
- Story Generation: AI-generated age-appropriate stories based on custom topics
- Speech-to-Text Assessment: Record your reading and get detailed feedback
- Text-to-Speech: Listen to proper pronunciation (when available)
- Reading Analysis: Compare your reading with the original text and get specific improvement suggestions
This application demonstrates an "agentic" approach by coordinating multiple specialized AI components:
- The Storyteller Agent (LLM): Uses Google's Gemini model to create unique, grade-appropriate reading passages
- The Orator Agent (Text-to-Speech): Converts text to natural-sounding speech using Facebook's MMS-TTS
- The Scribe Agent (Speech-to-Text): Uses OpenAI's Whisper model to transcribe spoken words
- The Analyst Agent: Compares original text with your reading to provide detailed feedback
- The Conductor: The Gradio interface orchestrates all components
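The coordination described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: `ReadingSession` and the stub agents are hypothetical names, and the stubs stand in for Gemini, Whisper, and the analysis step so the example runs without API keys or model downloads. The Orator Agent is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReadingSession:
    """Wires the agents together; every callable is a hypothetical stand-in."""
    storyteller: Callable[[str, int], str]  # (topic, grade) -> story text
    scribe: Callable[[str], str]            # audio file path -> transcript
    analyst: Callable[[str, str], str]      # (story, transcript) -> feedback

    def run(self, topic: str, grade: int, audio_path: str) -> dict:
        # Each step feeds the next, mirroring the agent list above.
        story = self.storyteller(topic, grade)
        transcript = self.scribe(audio_path)
        feedback = self.analyst(story, transcript)
        return {"story": story, "transcript": transcript, "feedback": feedback}


# Stub agents (assumptions for illustration only).
def fake_storyteller(topic: str, grade: int) -> str:
    return f"A grade {grade} story about {topic}."


def fake_scribe(audio_path: str) -> str:
    return "A grade 3 story about cats."


def fake_analyst(story: str, transcript: str) -> str:
    return "Perfect match!" if story == transcript else "Some words differed."


session = ReadingSession(fake_storyteller, fake_scribe, fake_analyst)
result = session.run("cats", 3, "recording.wav")
print(result["feedback"])
```

Passing the agents in as callables keeps each one replaceable, so a real model call can be swapped in for a stub without touching the orchestration logic.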
- Clone the repository:

      git clone https://github.com/yourusername/reading-practice-buddy.git
      cd reading-practice-buddy

- Create a virtual environment:

      python -m venv read_along
      source read_along/bin/activate # On macOS/Linux

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables: Create a .env file and add your Google API key:

      GOOGLE_API_KEY=your_google_api_key_here

- Run the application:

      python app1.py

- Open your browser and go to the URL shown in the terminal (typically http://127.0.0.1:7860)
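For the environment-variable step, the sketch below shows one way an application could pick up `GOOGLE_API_KEY` from a `.env` file using only the standard library. How `app1.py` actually loads the key is not stated here, so `load_env_file` and `get_google_api_key` are hypothetical helpers; a project like this may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_google_api_key(path: str = ".env") -> str:
    """Prefer a variable already set in the environment, then the file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file.")
    return key
```

Failing early with a clear message is usually friendlier than letting the first story request fail with an authentication error.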
- Story Setup: Enter a name (optional), select grade level, and provide a story topic
- Generate Story: Click "Generate Story Text" to create a custom reading passage
- Listen (Optional): Generate audio to hear proper pronunciation
- Record: Use the microphone to record yourself reading the passage
- Assessment: Get detailed feedback on your reading accuracy
- Python 3.8+
- Google Generative AI API key
- Internet connection for AI services
- Gradio: Web interface framework
- Google Gemini: Story generation
- OpenAI Whisper: Speech-to-text transcription
- Facebook MMS-TTS: Text-to-speech synthesis
- Python difflib: Text comparison and analysis
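To illustrate the text comparison step, here is a minimal sketch of word-level analysis with `difflib.SequenceMatcher`. It is an assumption about how such feedback could be computed, not the project's actual implementation; `tokenize`, `compare_reading`, and the report keys are hypothetical names.

```python
import difflib
import re


def tokenize(text: str) -> list:
    """Lowercase the text and keep only word-like tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compare_reading(original: str, spoken: str) -> dict:
    """Align the expected words with the transcript and classify differences."""
    expected, heard = tokenize(original), tokenize(spoken)
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    report = {"missed": [], "added": [], "substituted": []}
    matched = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        elif tag == "delete":      # words in the story that were not read
            report["missed"].extend(expected[i1:i2])
        elif tag == "insert":      # extra words that are not in the story
            report["added"].extend(heard[j1:j2])
        elif tag == "replace":     # words read as something else
            report["substituted"].append(
                (" ".join(expected[i1:i2]), " ".join(heard[j1:j2]))
            )
    # Fraction of the story's words that were read correctly.
    report["accuracy"] = matched / len(expected) if expected else 0.0
    return report


report = compare_reading(
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumped over the dog",
)
print(report)
```

For this sample, the report lists "lazy" as missed and "jumps" as read as "jumped", with 7 of the 9 story words matched. Comparing word lists rather than raw characters keeps the feedback at the level a student can act on.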
This project was created as a demonstration for an AI hackathon. Feel free to fork and improve!
MIT License - feel free to use and modify as needed.