An AI avatar system designed for real-time, human-like interaction at ultra-low latency, built on optimized inference pipelines and OpenVINO model quantization.
Creating the most human-like AI interaction possible through:
- Ultra-low latency responses
- Real-time multimodal processing
- Optimized inference pipelines
- Natural conversational flow
- Visual stimuli
- Interactive AI Avatar
  - Real-time facial expressions and animations
  - Text-to-Speech with lip-sync
  - Natural voice interaction
  - Contextual conversation memory
- Multi-Modal Communication
  - Text chat interface
  - Voice input/output
  - Animated avatar responses
  - Real-time WebSocket communication
- Advanced AI Processing
  - Context-aware responses using vector similarity
  - Multiple LLM support (OpenAI GPT, Qwen)
  - Conversation history tracking
  - Dynamic facial expressions and animations
| Model Type | Implementation | Optimization | Details |
|---|---|---|---|
| LLM | Qwen 2.5, Llama 3.1 8B | AMX Build | llama.cpp built with AMX support |
| ASR | WhisperCPP | OpenVINO + INT8 | Tiny model optimized for real-time transcription |
| TTS | MeloTTS | OpenVINO | Optimized speech synthesis model |
| Object Detection | YOLO | OpenVINO INT8 | Custom-trained model |
| VAD | Silero | Browser ONNX | Real-time voice activity detection |
| Embeddings | MiniLM-L6-v2 | OpenVINO INT8 | Binary embeddings for conversation context |
| Lip Sync | Custom model | OVMS | Optimized for real-time performance |
Processing Time Improvements:
- Short audio: 1.984s → 0.007s (283x faster)
- Medium audio: 3.280s → 0.009s (364x faster)
- Long audio: 5.550s → 0.013s (427x faster)

Accuracy Maintenance:
- Short audio: 10.1 vs. 9.8 mouth cues
- Medium audio: 21.2 vs. 20.1 mouth cues
- Long audio: 38.7 vs. 38.3 mouth cues
Models are deployed on OVMS for efficient serving; a sketch of a request to a served model follows.
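For illustration, OVMS exposes a TensorFlow-Serving-style REST API; a call to one of the served models might look like the sketch below. The host, port, model name, input field, and bearer-token auth are placeholders for illustration, not values taken from this repo.

```js
// Hypothetical request to an OVMS-served model (TFS-style REST API).
// Host, port, model name, and input layout are illustrative only.
const res = await fetch('http://ovms-host:9000/v1/models/lipsync:predict', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Assumes a proxy that checks MODEL_SERVER_AUTH_TOKEN (see env vars below).
    Authorization: `Bearer ${process.env.MODEL_SERVER_AUTH_TOKEN}`,
  },
  body: JSON.stringify({ instances: [{ audio_features: [/* ... */] }] }),
});
const { predictions } = await res.json();
```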
Frontend:
- React + Vite for UI
- Three.js for 3D rendering
- Browser-based VAD using ONNX Runtime (see the sketch after this list)
- WebSocket for real-time communication
- Custom audio processing pipeline
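One common way to run Silero VAD in the browser on top of ONNX Runtime is the @ricky0123/vad-web package; whether this repo uses that exact package is an assumption, but the shape of the integration looks like this:

```js
import { MicVAD } from '@ricky0123/vad-web';

// Runs the Silero VAD ONNX model against the microphone stream.
const vad = await MicVAD.new({
  onSpeechStart: () => console.log('speech started'),
  onSpeechEnd: (audio) => {
    // `audio` is a Float32Array of the finished utterance,
    // ready to hand to the audio pipeline for transcription.
    sendForTranscription(audio); // hypothetical helper
  },
});
vad.start();
```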
Backend:
- Node.js/Express server
- WebSocket server for real-time communication (see the sketch after this list)
- MongoDB for conversation persistence
- OpenVINO Model Server (OVMS) deployment
- Custom binary vector storage
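A skeletal version of that server stack, using Express with the `ws` package (the response pipeline is a hypothetical stand-in):

```js
import express from 'express';
import { createServer } from 'http';
import { WebSocketServer } from 'ws';

const app = express();
const server = createServer(app);
const wss = new WebSocketServer({ server });

wss.on('connection', (socket) => {
  socket.on('message', async (raw) => {
    const msg = JSON.parse(raw.toString());
    // Message types follow the protocol documented below.
    if (msg.type === 'chat' || msg.type === 'audio') {
      const messages = await generateAvatarResponse(msg); // hypothetical pipeline
      socket.send(JSON.stringify({ type: 'response', messages }));
    }
  });
});

server.listen(3000); // port is illustrative
```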
- Built with LangChain for structured, maintainable LLM interactions
- Custom prompt templates for consistent avatar behavior
- Function calling for specific avatar expressions and animations
- Chain-based architecture for modular processing
- JSON output parsing with Zod schema validation
- Predefined response structure for:
  - Text content
  - Facial expressions
  - Animation sequences
  - Avatar behaviors
- Type-safe responses throughout the system
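As a sketch of what that schema could look like with Zod and LangChain's StructuredOutputParser (the field names mirror the WebSocket payload shown later; everything else is illustrative):

```js
import { z } from 'zod';
import { StructuredOutputParser } from 'langchain/output_parsers';

// One avatar message, mirroring the server → client payload documented below.
const AvatarMessage = z.object({
  text: z.string(),
  facialExpression: z.string(), // e.g. 'smile'
  animation: z.string(),        // e.g. 'Talking1'
});

const parser = StructuredOutputParser.fromZodSchema(
  z.object({ messages: z.array(AvatarMessage) })
);

// parser.getFormatInstructions() goes into the prompt template;
// parser.parse(llmOutput) then returns a validated, type-safe object.
```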
- Vector similarity search for conversation history
- Binary embeddings for efficient storage and retrieval
- Dynamic context integration in prompts
- Relevance-based response generation
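The payoff of binary embeddings is that similarity search reduces to Hamming distance over packed bits, which is cheap to store and compute. A sketch of the idea (not the repo's actual storage code):

```js
// Similarity of two binary embeddings packed as Uint8Array (8 bits/byte):
// 1 minus the fraction of differing bits (normalized Hamming distance).
function hammingSimilarity(a, b) {
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    let xor = a[i] ^ b[i];
    while (xor) {          // count set bits in this byte
      differing += xor & 1;
      xor >>= 1;
    }
  }
  return 1 - differing / (a.length * 8);
}

// Rank stored turns by similarity to the query embedding, keep the top 5.
const relevant = history
  .map((turn) => ({ turn, score: hammingSimilarity(queryBits, turn.embedding) }))
  .sort((x, y) => y.score - x.score)
  .slice(0, 5);
```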
- Clone and Setup

  ```bash
  git clone https://github.com/ankithreddypati/Kubo-app.git

  # Frontend Setup
  cd frontend
  npm install
  npm start

  # Backend Setup
  cd ../backend
  npm install
  npm run dev
  ```
- Environment Configuration

  ```bash
  cp .env.local .env
  ```

  Required variables:

  ```env
  OPENAI_API_KEY=your_key
  MONGODB_URI=your_mongodb
  BINARY_VECTOR_SERVICE_URL=vector_service_url
  MELOTTS_URL=melotts_url
  LIPSYNC_MODEL_URL=lipsync_url
  MODEL_SERVER_AUTH_TOKEN=auth_token
  ```
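Since the project already uses Zod for schema validation, a startup check like this can fail fast on a missing variable (a hypothetical helper, not part of the repo):

```js
import 'dotenv/config'; // assumes dotenv; the repo may load env differently
import { z } from 'zod';

// Mirrors the required variables listed above.
const EnvSchema = z.object({
  OPENAI_API_KEY: z.string().min(1),
  MONGODB_URI: z.string().min(1),
  BINARY_VECTOR_SERVICE_URL: z.string().min(1),
  MELOTTS_URL: z.string().min(1),
  LIPSYNC_MODEL_URL: z.string().min(1),
  MODEL_SERVER_AUTH_TOKEN: z.string().min(1),
});

// Throws with a readable message if anything is missing.
export const env = EnvSchema.parse(process.env);
```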
Client → Server:

```js
// Text Input
{
  type: 'chat',
  message: 'Hello!'
}

// Voice Input
{
  type: 'audio',
  transcription: 'Hello!'
}

// Config Update
{
  type: 'config',
  model: 'qwen',
  tts: {
    language: 'EN',
    accent: 'EN-BR',
    speed: 1.2
  }
}
```
Server → Client:

```js
{
  type: 'response',
  messages: [{
    text: 'Hi! How can I help?',
    facialExpression: 'smile',
    animation: 'Talking1',
    audio: 'base64_audio',
    lipsync: {
      frameData: [...],
      timing: [...]
    }
  }]
}
```
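Putting the protocol together, a minimal browser client might look like this (the URL and audio MIME type are assumptions):

```js
const ws = new WebSocket('ws://localhost:3000'); // placeholder URL

ws.addEventListener('open', () => {
  ws.send(JSON.stringify({ type: 'chat', message: 'Hello!' }));
});

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.type !== 'response') return;

  for (const msg of data.messages) {
    // Drive the avatar: expression and animation come straight from the payload.
    console.log(msg.text, msg.facialExpression, msg.animation);
    // Play the base64-encoded TTS audio (MIME type assumed).
    new Audio(`data:audio/mp3;base64,${msg.audio}`).play();
    // msg.lipsync.frameData and msg.lipsync.timing drive the mouth shapes.
  }
});
```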
Current status: Beta
- Performance Optimization
  - Integration with OptILM for smaller LLMs
  - Further model quantization
  - Enhanced caching strategies
- Feature Development
  - Custom avatar integration system
  - Advanced analytics dashboard
  - Voice emotion detection
  - Multi-language optimization
- Infrastructure
  - Code cleanup and documentation
  - Enhanced error handling
  - Automated deployment pipeline
  - Performance monitoring system
This project was initially developed for a hackathon and is now open for contributions. Key areas:
- Code refactoring and testing
- LLM performance optimization
- Model compression techniques
- Real-time processing improvements
- Documentation
Check out the app at: https://kubocodeserver-ankithreddy137-dev.apps.cluster.intel.sandbox1234.opentlc.com/codeserver/proxy/5174/
Built with OpenVINO optimization for real-time AI interactions
- llama.cpp AMX Build Guide
- Binary Embeddings Background
- VAD Implementation
- OpenVINO Documentation
- MeloTTS OpenVINO
- LangChain chat history