A comprehensive document processing and chat system that combines local AI processing with premium cloud TTS for the ultimate document intelligence experience. Upload any document format, extract content with OCR, and have natural conversations powered by local AI models with professional voice synthesis.
- PDF, DOCX, DOC - Word documents and PDFs with full text extraction
- TXT, MD, HTML - Text and markup formats
- CSV, XLSX, XLS - Spreadsheets and data files with table understanding
- JSON, XML - Structured data formats
- RTF - Rich text format support
- Images - JPG, PNG, TIFF, BMP, WebP with advanced OCR
- Local LLM (Ollama) - Private document analysis with tinyllama model
- ChromaDB Vector Store - Advanced semantic search and similarity matching
- Smart Text Processing - Automatic content cleaning and optimization
- Context-Aware Responses - AI maintains conversation history and document understanding
- Complete Document Privacy - All document processing happens locally
- ElevenLabs TTS - Professional, natural-sounding voice synthesis
- Conversational Voices - 5 optimized voices for different contexts:
  - Rachel - Warm, natural female voice (perfect for conversations)
  - Adam - Deep, engaging male voice (professional content)
  - Antoni - Clear, articulate voice (document narration)
  - Sam - Casual, friendly voice (relaxed interactions)
  - Bella - Expressive, dynamic voice (engaging content)
- Advanced Voice Controls - Stability, similarity boost, expression tuning
- Smart Request Management - Automatic rate limiting and queue handling
- Auto-Play Support - Seamless voice responses for chat messages
- Document-Specific Knowledge - AI understands your document content
- Real-Time Streaming - Fast response generation
- Smart Commands - Extract dates, people, places, action items
- Context Preservation - Maintains conversation flow across sessions
- Multi-Format Understanding - Adapts responses based on document type
- Hierarchical Organization - Folder structure with nested categories
- Advanced Search - Semantic search across all documents
- Tag System - Custom tagging and categorization
- Processing Analytics - Track usage and performance metrics
- Real-Time Status - Live processing updates and health monitoring
- Responsive Design - Optimized for desktop, tablet, and mobile
- Glass Morphism UI - Beautiful, modern interface design
- Dark/Light Themes - Customizable appearance
- Real-Time Updates - Live document processing and chat updates
- Progressive Enhancement - Works offline with cached content
- Next.js 15.1.7 - React framework with App Router and RSC
- TypeScript 5.x - Full type safety and IntelliSense
- Tailwind CSS 3.4.17 - Utility-first styling with custom components
- Framer Motion - Smooth animations and transitions
- Radix UI - Accessible, unstyled component primitives
- Ollama (tinyllama) - Local LLM for document analysis
- LangChain - AI application framework and prompt management
- ChromaDB - Vector database for semantic search
- ElevenLabs API - Premium text-to-speech synthesis
- Pinecone - Cloud vector database for embeddings
- Clerk - User authentication and session management
- Firebase - Document metadata and user data storage
- Hybrid Document Store - In-memory + persistent storage strategy
- Mammoth.js - Advanced DOCX processing and conversion
- PDF-Parse - PDF text extraction and metadata
- Sharp - High-performance image processing
- XLSX - Excel file parsing and data extraction
- Cheerio - HTML parsing and content extraction
node --version # v18+ required
npm --version # v9+ required
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows - Download from: https://ollama.com/download/windows
# Using pip (recommended)
pip install chromadb
# Using conda
conda install -c conda-forge chromadb
# Using Docker
docker run -p 8000:8000 chromadb/chroma
# Clone the repository
git clone https://github.com/your-username/ai-challenge.git
cd ai-challenge
# Install all dependencies
npm install
# Set up environment variables
cp .env.local.example .env.local
# Edit .env.local with your configuration
Create `.env.local` with the following configuration:
# Authentication (Required)
CLERK_SECRET_KEY=your_clerk_secret_key
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
# Firebase Configuration (Required)
FIREBASE_PROJECT_ID=your_firebase_project_id
FIREBASE_PRIVATE_KEY="your_firebase_private_key"
FIREBASE_CLIENT_EMAIL=your_firebase_client_email
# AI Configuration (Required)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=tinyllama:latest
# Vector Database (Required)
PINECONE_API_KEY=your_pinecone_api_key
# Premium TTS (Required for voice features)
ELEVENLABS_API_KEY=your_elevenlabs_api_key
# Optional: Payment Integration
STRIPE_SECRET_KEY=your_stripe_secret_key
NEXT_PUBLIC_SCHEMATIC_PUBLISHABLE_KEY=your_schematic_key
Open 3 separate terminals:
# Terminal 1: Start Ollama Service
ollama serve
# Terminal 2: Start ChromaDB Server
chroma run --host localhost --port 8000
# Terminal 3: Start Next.js Application
npm run dev
# Install the recommended model
ollama pull tinyllama:latest
# Verify installation
ollama list
ollama run tinyllama:latest
- Main Application: http://localhost:3001
- Dashboard: http://localhost:3001/dashboard
- System Health: http://localhost:3001/api/system-status
- TTS Test: http://localhost:3001/api/tts
- Navigate to Upload: Go to `/dashboard/upload`
- Select Documents: Drag & drop or browse for files
- Choose Options: Enable OCR for scanned documents
- Monitor Progress: Watch real-time processing status
- Verify Completion: Ensure "completed" status before chatting
- Access Document: Go to `/dashboard/files/[document-id]`
- Enable Voice: Click the speaker icon for TTS responses
- Natural Conversation: Ask questions in plain English
- Smart Commands:
  - "Summarize this document"
  - "Extract all dates mentioned"
  - "Find people and organizations"
  - "What are the key action items?"
  - "Compare this with my other documents"
- Enable TTS: Click speaker button in chat interface
- Choose Voice: Select from 5 conversational voices
- Adjust Settings: Fine-tune stability, similarity, expression
- Auto-Play: Enable automatic voice responses
- Advanced Controls: Access streaming and quality settings
- Folders: Create hierarchical folder structures
- Tags: Add custom tags for categorization
- Search: Use semantic search across all documents
- Filtering: Filter by date, type, tags, or content
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Next.js App   │    │     Ollama      │    │    ChromaDB     │
│   (Frontend)    │◄──►│   (Local AI)    │    │ (Vector Store)  │
│   Port: 3001    │    │   Port: 11434   │    │   Port: 8000    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│      Clerk      │    │   ElevenLabs    │    │    Pinecone     │
│     (Auth)      │    │  (Premium TTS)  │    │  (Embeddings)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Firebase     │    │    Document     │    │  Hybrid Store   │
│   (Metadata)    │    │   Processing    │    │   (In-Memory)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- Upload → Document parsing and text extraction
- Processing → Content chunking and embedding generation
- Storage → Vector storage in ChromaDB + metadata in Firebase
- Query → User question → Vector similarity search
- AI Response → Context-aware response via Ollama
- TTS → Natural voice synthesis via ElevenLabs
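The Query step above ranks stored chunks by vector similarity. A minimal TypeScript sketch of that ranking (cosine similarity over embeddings — illustrative only, not ChromaDB's internal scoring):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against a query embedding, highest similarity first.
function topKChunks(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  k = 3
): string[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.text);
}
```

The top-k chunk texts are what get passed to Ollama as context for the AI Response step.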
ai-challenge/
├── app/                          # Next.js App Router
│   ├── api/                      # API Routes
│   │   ├── tts/                  # Text-to-speech endpoints
│   │   ├── pinecone/             # Vector database operations
│   │   ├── upload-document/      # Document upload handling
│   │   ├── system-status/        # Health monitoring
│   │   └── realtime/             # Real-time updates
│   ├── dashboard/                # Main application interface
│   │   ├── files/[id]/           # Document viewer and chat
│   │   ├── upload/               # Document upload interface
│   │   └── page.tsx              # Dashboard home
│   └── globals.css               # Global styles
├── components/                   # React Components
│   ├── ui/                       # Shadcn/ui components
│   ├── PDFChatInterface.tsx      # Main chat interface
│   ├── UniversalDocumentViewer.tsx # Document display
│   ├── EnhancedTTSControls.tsx   # Voice controls
│   └── PineconeDocumentPage.tsx  # Document management
├── lib/                          # Core Libraries
│   ├── hybrid-tts-service.ts     # ElevenLabs TTS integration
│   ├── elevenlabs-client.ts      # ElevenLabs API wrapper
│   ├── hybrid-chat-service.ts    # AI conversation logic
│   ├── hybrid-document-store.ts  # Document storage layer
│   ├── pinecone-client.ts        # Vector database client
│   ├── ollama-client.ts          # Local AI integration
│   └── pinecone-embeddings.ts    # Embedding management
├── styles/                       # Tailwind CSS
├── public/                       # Static assets
└── Configuration files           # Next.js, TypeScript, etc.
POST /api/upload-document
Content-Type: multipart/form-data
Body: file (any supported format)
Response: { id, fileName, status, processingTime }
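A client call to this endpoint can be sketched as follows; the `uploadDocument` helper and its injectable `fetchImpl` parameter are illustrative conveniences, not part of the codebase:

```typescript
// Hypothetical client helper for the upload endpoint above.
// fetchImpl is injectable so the call can be exercised without a running server.
async function uploadDocument(
  file: Blob,
  fileName: string,
  fetchImpl: typeof fetch = fetch
): Promise<{ id: string; fileName: string; status: string }> {
  const form = new FormData();
  form.append("file", file, fileName);
  const res = await fetchImpl("/api/upload-document", {
    method: "POST",
    body: form, // the runtime sets the multipart/form-data boundary automatically
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json();
}
```

Note that no `Content-Type` header is set by hand: passing a `FormData` body lets fetch generate the correct multipart boundary.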
GET /api/files/{userId}_{timestamp}_{hash}?includeContent=true
Response: { document, content, metadata }
POST /api/pinecone/embed
Content-Type: application/json
{
"text": "document content",
"documentId": "doc_id",
"fileName": "document.pdf"
}
POST /api/pinecone/chat
Content-Type: application/json
{
"question": "What is this document about?",
"documentId": "doc_id",
"fileName": "document.pdf",
"history": [...]
}
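A sketch of calling the chat endpoint from TypeScript; the `askDocument` helper, the `ChatTurn` shape, and the `answer` response field are assumptions for illustration:

```typescript
// Shape of one conversation turn (assumed; adjust to the real history format).
interface ChatTurn { role: "user" | "assistant"; content: string }

// Hypothetical client for the chat endpoint above; fetchImpl injected for testing.
async function askDocument(
  question: string,
  documentId: string,
  fileName: string,
  history: ChatTurn[] = [],
  fetchImpl: typeof fetch = fetch
): Promise<string> {
  const res = await fetchImpl("/api/pinecone/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question, documentId, fileName, history }),
  });
  if (!res.ok) throw new Error(`Chat failed: ${res.status}`);
  const data = await res.json();
  return data.answer; // response field name is an assumption
}
```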
POST /api/tts
Content-Type: application/json
{
"text": "Hello, this is a test",
"voice_id": "EXAVITQu4vr4xnSDxMaL",
"stability": 0.75,
"similarity_boost": 0.85,
"style": 0.2
}
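Calling the synthesis endpoint and playing the result can be sketched as below; the helper name, default voice choice, and the audio payload assumption are illustrative:

```typescript
// Hypothetical helper: request synthesized speech and return it as a playable Blob.
async function synthesizeSpeech(
  text: string,
  voiceId = "EXAVITQu4vr4xnSDxMaL", // Rachel (conversational), per the voice table
  fetchImpl: typeof fetch = fetch
): Promise<Blob> {
  const res = await fetchImpl("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text,
      voice_id: voiceId,
      stability: 0.75,
      similarity_boost: 0.85,
      style: 0.2,
    }),
  });
  if (!res.ok) throw new Error(`TTS failed: ${res.status}`);
  return res.blob(); // assumed to be an audio payload suitable for an <audio> element
}
```

In the browser, the returned Blob can be played via `new Audio(URL.createObjectURL(blob)).play()`.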
GET /api/tts
Response: {
provider: "elevenlabs",
available: true,
conversationalVoices: {...}
}
GET /api/system-status
Response: {
ollama: { available, models },
chromadb: { available, collections },
elevenlabs: { available, voices, usage }
}
- RAM: 8GB+ recommended (16GB for optimal performance)
- Storage: SSD recommended (faster vector operations)
- CPU: Multi-core processor (AI inference benefits from more cores)
- Network: Stable internet for ElevenLabs TTS (optional for local-only mode)
# Use tinyllama for fastest responses
ollama pull tinyllama:latest
# Monitor model memory usage
ollama ps
# Optimize model parameters
curl http://localhost:11434/api/chat -X POST -d '{
"model": "tinyllama:latest",
"options": {
"temperature": 0.7,
"top_p": 0.9,
"num_predict": 256
}
}'
- Chunk Size: 1000 characters optimal for balance of context and speed
- Overlap: 200 characters for better context preservation
- Batch Processing: Process multiple documents in parallel
- Memory Management: Clear unused embeddings periodically
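The chunk-size and overlap settings above amount to a sliding window over the document text; a minimal sketch:

```typescript
// Split text into fixed-size chunks with overlap, so context spanning a
// chunk boundary appears in both neighbors (e.g. 1000 chars with 200 overlap).
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Real pipelines usually prefer splitting on sentence or paragraph boundaries near the window edge; this fixed-offset version only illustrates the size/overlap arithmetic.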
- Request Queuing: Automatic handling of concurrent request limits
- Voice Caching: Reuse voice settings for consistent performance
- Streaming: Use streaming mode for longer texts
- Rate Limiting: Built-in 100ms delay between requests
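The queue-plus-delay behavior described above follows a common promise-chaining pattern; a minimal sketch (not the actual hybrid-tts-service.ts implementation):

```typescript
// Minimal request queue: runs tasks one at a time, waiting a fixed delay
// after each task before the next one starts (e.g. 100ms between TTS calls).
class RequestQueue {
  private chain: Promise<unknown> = Promise.resolve();
  constructor(private delayMs = 100) {}

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.chain
      .then(() => task())
      .then(async (value) => {
        await new Promise((r) => setTimeout(r, this.delayMs)); // cooldown
        return value;
      });
    // Keep the chain alive even if a task rejects.
    this.chain = result.catch(() => undefined);
    return result;
  }
}
```

Usage: `const q = new RequestQueue(100); q.enqueue(() => callTtsApi(text));` — callers await the returned promise as usual, while the queue spaces out the underlying requests.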
- ✅ Local AI Processing - Documents analyzed on your machine
- ✅ Secure Authentication - Clerk-based user management
- ✅ Encrypted Storage - Firebase security rules and encryption
- ✅ API Security - Rate limiting and input validation
- ✅ Data Isolation - User-specific document access controls
- Environment Variables - Secure credential management
- CORS Protection - Restricted cross-origin requests
- Input Sanitization - All user inputs validated and cleaned
- Error Handling - No sensitive information in error messages
- Audit Logging - Track document access and processing
- GDPR Ready - User data deletion and export capabilities
- SOC 2 Compatible - Security controls and monitoring
- Enterprise Security - Role-based access control foundation
# Check Ollama status
ollama list
# Start Ollama if not running
ollama serve
# Verify model installation
ollama pull tinyllama:latest
ollama run tinyllama:latest
# Test API directly
curl http://localhost:11434/api/version
# Check ChromaDB status
curl http://localhost:8000/api/v1/heartbeat
# Start ChromaDB
chroma run --host localhost --port 8000
# Alternative with Docker
docker run -p 8000:8000 chromadb/chroma
# Check for port conflicts
lsof -i :8000
# Check TTS system status
curl http://localhost:3001/api/tts
# System automatically handles:
# - Request queuing (max 2 concurrent)
# - Exponential backoff retry (1s, 2s, 5s)
# - User-friendly error messages
# Verify API key
echo $ELEVENLABS_API_KEY
- ✅ File Format: Ensure a supported format (PDF, DOCX, TXT, etc.)
- ✅ File Size: Keep under 25MB for optimal performance
- ✅ Encoding: Use UTF-8 encoding for text files
- ✅ Permissions: Verify file read permissions
- ✅ Service Health: Check `/api/system-status`
# Complete system health check
curl http://localhost:3001/api/system-status | jq
# Test individual components
curl http://localhost:11434/api/version # Ollama
curl http://localhost:8000/api/v1/heartbeat # ChromaDB
curl http://localhost:3001/api/tts # TTS System
# View application logs
npm run dev # Check console output
# Clear application cache
rm -rf .next
npm run dev
# Build for production
npm run build
# Start production server
npm start
# Or deploy to Vercel
npx vercel --prod
# Ollama production setup (ollama serve reads its bind address from OLLAMA_HOST)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# ChromaDB production setup
chroma run --host 0.0.0.0 --port 8000 --log-level INFO
# Environment variables for production
NODE_ENV=production
OLLAMA_BASE_URL=http://your-ollama-server:11434
# Dockerfile example
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci  # dev dependencies are required for `npm run build`
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
- Load Balancing: Multiple Next.js instances behind reverse proxy
- Database Scaling: ChromaDB cluster setup for high availability
- Caching: Redis for session and response caching
- CDN: Static asset optimization and global distribution
- Fork & Clone: Create your own copy of the repository
- Branch: Create a feature branch: `git checkout -b feature/amazing-feature`
- Develop: Make changes following code style guidelines
- Test: Ensure all functionality works as expected
- Lint: Run `npm run lint` and fix any issues
- Commit: Use conventional commits: `git commit -m 'feat: add amazing feature'`
- Pull Request: Submit a PR with a detailed description
- TypeScript: Strict mode enabled, full type coverage
- ESLint: Next.js configuration with custom rules
- Prettier: Consistent code formatting
- Conventional Commits: Standardized commit messages
- Component Structure: Functional components with hooks
- API Design: RESTful endpoints with proper error handling
- Unit Tests: Test utility functions and API endpoints
- Integration Tests: Test document processing workflows
- UI Tests: Verify component functionality across devices
- Performance Tests: Monitor AI response times and memory usage
| Format | Extension | Processing | OCR Support |
|---|---|---|---|
| PDF | `.pdf` | ✅ Text + Images | ✅ Scanned PDFs |
| Word | `.docx`, `.doc` | ✅ Full formatting | ❌ N/A |
| Excel | `.xlsx`, `.xls` | ✅ All sheets | ❌ N/A |
| Text | `.txt`, `.md` | ✅ UTF-8 | ❌ N/A |
| Web | `.html`, `.xml` | ✅ Clean extraction | ❌ N/A |
| Data | `.json`, `.csv` | ✅ Structured parsing | ❌ N/A |
| Images | `.jpg`, `.png`, `.tiff` | ✅ OCR processing | ✅ Text extraction |
| Model | Size | Speed | Quality | Memory | Use Case |
|---|---|---|---|---|---|
| `tinyllama:latest` | 637MB | ⚡ Fast | Good | 2GB | Recommended - Production |
| `gemma2:2b` | 1.6GB | Fast | High | 4GB | Balanced performance |
| `qwen2.5:3b` | 1.9GB | Medium | High | 6GB | Advanced analysis |
| `llama3.2` | 4.7GB | Medium | Highest | 8GB | Complex reasoning |
| Voice | ID | Gender | Style | Best For |
|---|---|---|---|---|
| Rachel | `EXAVITQu4vr4xnSDxMaL` | Female | Conversational | Chat responses |
| Adam | `pNInz6obpgDQGcFmaJgB` | Male | Professional | Document reading |
| Antoni | `ErXwobaYiN019PkySvjV` | Male | Clear | Technical content |
| Sam | `yoZ06aMxZJJ28mfd3POQ` | Male | Casual | Friendly interactions |
| Bella | `EXAVITQu4vr4xnSDxMaL` | Female | Expressive | Dynamic content |
- ElevenLabs premium TTS integration
- Advanced rate limiting and error handling
- Conversational voice optimization
- Hybrid document storage system
- Real-time processing status updates
- Enhanced UI with glass morphism design
- Smart request queuing for API limits
- Multi-Document Chat - Cross-document conversations and analysis
- Advanced OCR - Table extraction and handwriting recognition
- Voice Input - Speech-to-text for hands-free interaction
- Mobile App - React Native companion application
- API Gateway - RESTful API for third-party integrations
- Collaborative Workspaces - Team document sharing and chat
- Document Comparison - Side-by-side analysis and diff views
- Advanced Analytics - Usage insights and optimization suggestions
- Multi-language Support - UI localization and model support
- Enterprise SSO - SAML and OIDC integration
- Workflow Automation - Zapier and webhook integrations
This project is licensed under the MIT License - see the LICENSE file for complete details.
- Ollama: Apache License 2.0
- ChromaDB: Apache License 2.0
- ElevenLabs: Commercial API service
- Next.js: MIT License
- All npm dependencies: Various open-source licenses
- Ollama - Local AI model inference engine
- ChromaDB - Vector database for semantic search
- ElevenLabs - Premium text-to-speech synthesis
- Next.js - React framework and deployment platform
- Pinecone - Managed vector database service
- Clerk - Authentication and user management
- Firebase - Document metadata storage
- LangChain - AI application framework
- Tailwind CSS - Utility-first CSS framework
- Radix UI - Accessible component primitives
- Documentation: This README and inline code comments
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Technical Support: Check the troubleshooting section first
- Pull Requests: Welcome! Please follow the contribution guidelines
- Documentation: Help improve docs and examples
- Testing: Report bugs and help with quality assurance
- Translation: Assist with internationalization efforts
Get Started • Documentation • Report Bug • Request Feature
⭐ Star this repo if you find it useful! ⭐