A comprehensive document processing and chat system that combines local AI processing with premium cloud TTS for the ultimate document intelligence experience. Upload any document format, extract content with OCR, and have natural conversations powered by local AI models with professional voice synthesis.
- PDF, DOCX, DOC - Word documents and PDFs with full text extraction
- TXT, MD, HTML - Text and markup formats
- CSV, XLSX, XLS - Spreadsheets and data files with table understanding
- JSON, XML - Structured data formats
- RTF - Rich text format support
- Images - JPG, PNG, TIFF, BMP, WebP with advanced OCR
- Local LLM (Ollama) - Private document analysis with tinyllama model
- ChromaDB Vector Store - Advanced semantic search and similarity matching
- Smart Text Processing - Automatic content cleaning and optimization
- Context-Aware Responses - AI maintains conversation history and document understanding
- Complete Document Privacy - All document processing happens locally
- ElevenLabs TTS - Professional, natural-sounding voice synthesis
- Conversational Voices - 5 optimized voices for different contexts:
  - Rachel - Warm, natural female voice (perfect for conversations)
  - Adam - Deep, engaging male voice (professional content)
  - Antoni - Clear, articulate voice (document narration)
  - Sam - Casual, friendly voice (relaxed interactions)
  - Bella - Expressive, dynamic voice (engaging content)
- Advanced Voice Controls - Stability, similarity boost, expression tuning
- Smart Request Management - Automatic rate limiting and queue handling
- Auto-Play Support - Seamless voice responses for chat messages
- Document-Specific Knowledge - AI understands your document content
- Real-Time Streaming - Fast response generation
- Smart Commands - Extract dates, people, places, action items
- Context Preservation - Maintains conversation flow across sessions
- Multi-Format Understanding - Adapts responses based on document type
- Hierarchical Organization - Folder structure with nested categories
- Advanced Search - Semantic search across all documents
- Tag System - Custom tagging and categorization
- Processing Analytics - Track usage and performance metrics
- Real-Time Status - Live processing updates and health monitoring
- Responsive Design - Optimized for desktop, tablet, and mobile
- Glass Morphism UI - Beautiful, modern interface design
- Dark/Light Themes - Customizable appearance
- Real-Time Updates - Live document processing and chat updates
- Progressive Enhancement - Works offline with cached content
- Next.js 15.1.7 - React framework with App Router and RSC
- TypeScript 5.x - Full type safety and IntelliSense
- Tailwind CSS 3.4.17 - Utility-first styling with custom components
- Framer Motion - Smooth animations and transitions
- Radix UI - Accessible, unstyled component primitives
- Ollama (tinyllama) - Local LLM for document analysis
- LangChain - AI application framework and prompt management
- ChromaDB - Vector database for semantic search
- ElevenLabs API - Premium text-to-speech synthesis
- Pinecone - Cloud vector database for embeddings
- Clerk - User authentication and session management
- Firebase - Document metadata and user data storage
- Hybrid Document Store - In-memory + persistent storage strategy
- Mammoth.js - Advanced DOCX processing and conversion
- PDF-Parse - PDF text extraction and metadata
- Sharp - High-performance image processing
- XLSX - Excel file parsing and data extraction
- Cheerio - HTML parsing and content extraction
node --version # v18+ required
npm --version # v9+ required
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows - Download from: https://ollama.com/download/windows
# Using pip (recommended)
pip install chromadb
# Using conda
conda install -c conda-forge chromadb
# Using Docker
docker run -p 8000:8000 chromadb/chroma
# Clone the repository
git clone https://github.com/your-username/ai-challenge.git
cd ai-challenge
# Install all dependencies
npm install
# Set up environment variables
cp .env.local.example .env.local
# Edit .env.local with your configuration
Create `.env.local` with the following configuration:
# Authentication (Required)
CLERK_SECRET_KEY=your_clerk_secret_key
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
# Firebase Configuration (Required)
FIREBASE_PROJECT_ID=your_firebase_project_id
FIREBASE_PRIVATE_KEY="your_firebase_private_key"
FIREBASE_CLIENT_EMAIL=your_firebase_client_email
# AI Configuration (Required)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=tinyllama:latest
# Vector Database (Required)
PINECONE_API_KEY=your_pinecone_api_key
# Premium TTS (Required for voice features)
ELEVENLABS_API_KEY=your_elevenlabs_api_key
# Optional: Payment Integration
STRIPE_SECRET_KEY=your_stripe_secret_key
NEXT_PUBLIC_SCHEMATIC_PUBLISHABLE_KEY=your_schematic_key
Open 3 separate terminals:
# Terminal 1: Start Ollama Service
ollama serve
# Terminal 2: Start ChromaDB Server
chroma run --host localhost --port 8000
# Terminal 3: Start Next.js Application
npm run dev
# Install the recommended model
ollama pull tinyllama:latest
# Verify installation
ollama list
ollama run tinyllama:latest
- Main Application: http://localhost:3001
- Dashboard: http://localhost:3001/dashboard
- System Health: http://localhost:3001/api/system-status
- TTS Test: http://localhost:3001/api/tts
- Navigate to Upload: Go to `/dashboard/upload`
- Select Documents: Drag & drop or browse for files
- Choose Options: Enable OCR for scanned documents
- Monitor Progress: Watch real-time processing status
- Verify Completion: Ensure "completed" status before chatting
- Access Document: Go to `/dashboard/files/[document-id]`
- Enable Voice: Click the speaker icon for TTS responses
- Natural Conversation: Ask questions in plain English
- Smart Commands:
  - "Summarize this document"
  - "Extract all dates mentioned"
  - "Find people and organizations"
  - "What are the key action items?"
  - "Compare this with my other documents"
- Enable TTS: Click speaker button in chat interface
- Choose Voice: Select from 5 conversational voices
- Adjust Settings: Fine-tune stability, similarity, expression
- Auto-Play: Enable automatic voice responses
- Advanced Controls: Access streaming and quality settings
- Folders: Create hierarchical folder structures
- Tags: Add custom tags for categorization
- Search: Use semantic search across all documents
- Filtering: Filter by date, type, tags, or content
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Next.js App   │    │     Ollama      │    │    ChromaDB     │
│   (Frontend)    │◄──►│   (Local AI)    │    │ (Vector Store)  │
│   Port: 3001    │    │   Port: 11434   │    │   Port: 8000    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│      Clerk      │    │   ElevenLabs    │    │    Pinecone     │
│     (Auth)      │    │  (Premium TTS)  │    │  (Embeddings)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Firebase     │    │    Document     │    │  Hybrid Store   │
│   (Metadata)    │    │   Processing    │    │   (In-Memory)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- Upload → Document parsing and text extraction
- Processing → Content chunking and embedding generation
- Storage → Vector storage in ChromaDB + metadata in Firebase
- Query → User question → Vector similarity search
- AI Response → Context-aware response via Ollama
- TTS → Natural voice synthesis via ElevenLabs
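The Query step above ranks stored chunks by vector similarity. A minimal TypeScript sketch of that ranking (cosine similarity over embeddings — illustrative only, not ChromaDB's internal scoring):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against a query embedding, highest similarity first.
function topKChunks(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  k = 3
): string[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.text);
}
```

The top-k chunk texts are what get passed to Ollama as context for the AI Response step.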
ai-challenge/
├── app/                          # Next.js App Router
│   ├── api/                      # API Routes
│   │   ├── tts/                  # Text-to-speech endpoints
│   │   ├── pinecone/             # Vector database operations
│   │   ├── upload-document/      # Document upload handling
│   │   ├── system-status/        # Health monitoring
│   │   └── realtime/             # Real-time updates
│   ├── dashboard/                # Main application interface
│   │   ├── files/[id]/           # Document viewer and chat
│   │   ├── upload/               # Document upload interface
│   │   └── page.tsx              # Dashboard home
│   └── globals.css               # Global styles
├── components/                   # React Components
│   ├── ui/                       # Shadcn/ui components
│   ├── PDFChatInterface.tsx      # Main chat interface
│   ├── UniversalDocumentViewer.tsx # Document display
│   ├── EnhancedTTSControls.tsx   # Voice controls
│   └── PineconeDocumentPage.tsx  # Document management
├── lib/                          # Core Libraries
│   ├── hybrid-tts-service.ts     # ElevenLabs TTS integration
│   ├── elevenlabs-client.ts      # ElevenLabs API wrapper
│   ├── hybrid-chat-service.ts    # AI conversation logic
│   ├── hybrid-document-store.ts  # Document storage layer
│   ├── pinecone-client.ts        # Vector database client
│   ├── ollama-client.ts          # Local AI integration
│   └── pinecone-embeddings.ts    # Embedding management
├── styles/                       # Tailwind CSS
├── public/                       # Static assets
└── Configuration files           # Next.js, TypeScript, etc.
POST /api/upload-document
Content-Type: multipart/form-data
Body: file (any supported format)
Response: { id, fileName, status, processingTime }
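A client call to this endpoint can be sketched as follows; the `uploadDocument` helper and its injectable `fetchImpl` parameter are illustrative conveniences, not part of the codebase:

```typescript
// Hypothetical client helper for the upload endpoint above.
// fetchImpl is injectable so the call can be exercised without a running server.
async function uploadDocument(
  file: Blob,
  fileName: string,
  fetchImpl: typeof fetch = fetch
): Promise<{ id: string; fileName: string; status: string }> {
  const form = new FormData();
  form.append("file", file, fileName);
  const res = await fetchImpl("/api/upload-document", {
    method: "POST",
    body: form, // the runtime sets the multipart/form-data boundary automatically
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json();
}
```

Note that no `Content-Type` header is set by hand: passing a `FormData` body lets fetch generate the correct multipart boundary.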
GET /api/files/{userId}_{timestamp}_{hash}?includeContent=true
Response: { document, content, metadata }
POST /api/pinecone/embed
Content-Type: application/json
{
"text": "document content",
"documentId": "doc_id",
"fileName": "document.pdf"
}
POST /api/pinecone/chat
Content-Type: application/json
{
"question": "What is this document about?",
"documentId": "doc_id",
"fileName": "document.pdf",
"history": [...]
}
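A sketch of calling the chat endpoint from TypeScript; the `askDocument` helper, the `ChatTurn` shape, and the `answer` response field are assumptions for illustration:

```typescript
// Shape of one conversation turn (assumed; adjust to the real history format).
interface ChatTurn { role: "user" | "assistant"; content: string }

// Hypothetical client for the chat endpoint above; fetchImpl injected for testing.
async function askDocument(
  question: string,
  documentId: string,
  fileName: string,
  history: ChatTurn[] = [],
  fetchImpl: typeof fetch = fetch
): Promise<string> {
  const res = await fetchImpl("/api/pinecone/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question, documentId, fileName, history }),
  });
  if (!res.ok) throw new Error(`Chat failed: ${res.status}`);
  const data = await res.json();
  return data.answer; // response field name is an assumption
}
```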
POST /api/tts
Content-Type: application/json
{
"text": "Hello, this is a test",
"voice_id": "EXAVITQu4vr4xnSDxMaL",
"stability": 0.75,
"similarity_boost": 0.85,
"style": 0.2
}
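Calling the synthesis endpoint and playing the result can be sketched as below; the helper name, default voice choice, and the audio payload assumption are illustrative:

```typescript
// Hypothetical helper: request synthesized speech and return it as a playable Blob.
async function synthesizeSpeech(
  text: string,
  voiceId = "EXAVITQu4vr4xnSDxMaL", // Rachel (conversational), per the voice table
  fetchImpl: typeof fetch = fetch
): Promise<Blob> {
  const res = await fetchImpl("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text,
      voice_id: voiceId,
      stability: 0.75,
      similarity_boost: 0.85,
      style: 0.2,
    }),
  });
  if (!res.ok) throw new Error(`TTS failed: ${res.status}`);
  return res.blob(); // assumed to be an audio payload suitable for an <audio> element
}
```

In the browser, the returned Blob can be played via `new Audio(URL.createObjectURL(blob)).play()`.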
GET /api/tts
Response: {
provider: "elevenlabs",
available: true,
conversationalVoices: {...}
}
GET /api/system-status
Response: {
ollama: { available, models },
chromadb: { available, collections },
elevenlabs: { available, voices, usage }
}
- RAM: 8GB+ recommended (16GB for optimal performance)
- Storage: SSD recommended (faster vector operations)
- CPU: Multi-core processor (AI inference benefits from more cores)
- Network: Stable internet for ElevenLabs TTS (optional for local-only mode)
# Use tinyllama for fastest responses
ollama pull tinyllama:latest
# Monitor model memory usage
ollama ps
# Optimize model parameters
curl http://localhost:11434/api/chat -X POST -d '{
"model": "tinyllama:latest",
"options": {
"temperature": 0.7,
"top_p": 0.9,
"num_predict": 256
}
}'
- Chunk Size: 1000 characters optimal for balance of context and speed
- Overlap: 200 characters for better context preservation
- Batch Processing: Process multiple documents in parallel
- Memory Management: Clear unused embeddings periodically
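The chunk-size and overlap settings above amount to a sliding window over the document text; a minimal sketch:

```typescript
// Split text into fixed-size chunks with overlap, so context spanning a
// chunk boundary appears in both neighbors (e.g. 1000 chars with 200 overlap).
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Real pipelines usually prefer splitting on sentence or paragraph boundaries near the window edge; this fixed-offset version only illustrates the size/overlap arithmetic.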
- Request Queuing: Automatic handling of concurrent request limits
- Voice Caching: Reuse voice settings for consistent performance
- Streaming: Use streaming mode for longer texts
- Rate Limiting: Built-in 100ms delay between requests
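The queue-plus-delay behavior described above follows a common promise-chaining pattern; a minimal sketch (not the actual hybrid-tts-service.ts implementation):

```typescript
// Minimal request queue: runs tasks one at a time, waiting a fixed delay
// after each task before the next one starts (e.g. 100ms between TTS calls).
class RequestQueue {
  private chain: Promise<unknown> = Promise.resolve();
  constructor(private delayMs = 100) {}

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const result = this.chain
      .then(() => task())
      .then(async (value) => {
        await new Promise((r) => setTimeout(r, this.delayMs)); // cooldown
        return value;
      });
    // Keep the chain alive even if a task rejects.
    this.chain = result.catch(() => undefined);
    return result;
  }
}
```

Usage: `const q = new RequestQueue(100); q.enqueue(() => callTtsApi(text));` — callers await the returned promise as usual, while the queue spaces out the underlying requests.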
- ✅ Local AI Processing - Documents analyzed on your machine
- ✅ Secure Authentication - Clerk-based user management
- ✅ Encrypted Storage - Firebase security rules and encryption
- ✅ API Security - Rate limiting and input validation
- ✅ Data Isolation - User-specific document access controls
- Environment Variables - Secure credential management
- CORS Protection - Restricted cross-origin requests
- Input Sanitization - All user inputs validated and cleaned
- Error Handling - No sensitive information in error messages
- Audit Logging - Track document access and processing
- GDPR Ready - User data deletion and export capabilities
- SOC 2 Compatible - Security controls and monitoring
- Enterprise Security - Role-based access control foundation
# Check Ollama status
ollama list
# Start Ollama if not running
ollama serve
# Verify model installation
ollama pull tinyllama:latest
ollama run tinyllama:latest
# Test API directly
curl http://localhost:11434/api/version
# Check ChromaDB status
curl http://localhost:8000/api/v1/heartbeat
# Start ChromaDB
chroma run --host localhost --port 8000
# Alternative with Docker
docker run -p 8000:8000 chromadb/chroma
# Check for port conflicts
lsof -i :8000
# Check TTS system status
curl http://localhost:3001/api/tts
# System automatically handles:
# - Request queuing (max 2 concurrent)
# - Exponential backoff retry (1s, 2s, 5s)
# - User-friendly error messages
# Verify API key
echo $ELEVENLABS_API_KEY
- ✅ File Format: Ensure a supported format (PDF, DOCX, TXT, etc.)
- ✅ File Size: Keep under 25MB for optimal performance
- ✅ Encoding: Use UTF-8 encoding for text files
- ✅ Permissions: Verify file read permissions
- ✅ Service Health: Check `/api/system-status`
# Complete system health check
curl http://localhost:3001/api/system-status | jq
# Test individual components
curl http://localhost:11434/api/version # Ollama
curl http://localhost:8000/api/v1/heartbeat # ChromaDB
curl http://localhost:3001/api/tts # TTS System
# View application logs
npm run dev # Check console output
# Clear application cache
rm -rf .next
npm run dev
# Build for production
npm run build
# Start production server
npm start
# Or deploy to Vercel
npx vercel --prod
# Ollama production setup (ollama serve reads its bind address from OLLAMA_HOST)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
# ChromaDB production setup
chroma run --host 0.0.0.0 --port 8000 --log-level INFO
# Environment variables for production
NODE_ENV=production
OLLAMA_BASE_URL=http://your-ollama-server:11434
# Dockerfile example
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci  # dev dependencies are required for `npm run build`
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
- Load Balancing: Multiple Next.js instances behind reverse proxy
- Database Scaling: ChromaDB cluster setup for high availability
- Caching: Redis for session and response caching
- CDN: Static asset optimization and global distribution
- Fork & Clone: Create your own copy of the repository
- Branch: Create a feature branch: `git checkout -b feature/amazing-feature`
- Develop: Make changes following code style guidelines
- Test: Ensure all functionality works as expected
- Lint: Run `npm run lint` and fix any issues
- Commit: Use conventional commits: `git commit -m 'feat: add amazing feature'`
- Pull Request: Submit a PR with a detailed description
- TypeScript: Strict mode enabled, full type coverage
- ESLint: Next.js configuration with custom rules
- Prettier: Consistent code formatting
- Conventional Commits: Standardized commit messages
- Component Structure: Functional components with hooks
- API Design: RESTful endpoints with proper error handling
- Unit Tests: Test utility functions and API endpoints
- Integration Tests: Test document processing workflows
- UI Tests: Verify component functionality across devices
- Performance Tests: Monitor AI response times and memory usage
| Format | Extension | Processing | OCR Support |
|---|---|---|---|
| PDF | `.pdf` | ✅ Text + Images | ✅ Scanned PDFs |
| Word | `.docx`, `.doc` | ✅ Full formatting | ❌ N/A |
| Excel | `.xlsx`, `.xls` | ✅ All sheets | ❌ N/A |
| Text | `.txt`, `.md` | ✅ UTF-8 | ❌ N/A |
| Web | `.html`, `.xml` | ✅ Clean extraction | ❌ N/A |
| Data | `.json`, `.csv` | ✅ Structured parsing | ❌ N/A |
| Images | `.jpg`, `.png`, `.tiff` | ✅ OCR processing | ✅ Text extraction |
| Model | Size | Speed | Quality | Memory | Use Case |
|---|---|---|---|---|---|
| `tinyllama:latest` | 637MB | ⚡ Fast | Good | 2GB | Recommended - Production |
| `gemma2:2b` | 1.6GB | Fast | High | 4GB | Balanced performance |
| `qwen2.5:3b` | 1.9GB | Medium | High | 6GB | Advanced analysis |
| `llama3.2` | 4.7GB | Medium | Highest | 8GB | Complex reasoning |
| Voice | ID | Gender | Style | Best For |
|---|---|---|---|---|
| Rachel | `EXAVITQu4vr4xnSDxMaL` | Female | Conversational | Chat responses |
| Adam | `pNInz6obpgDQGcFmaJgB` | Male | Professional | Document reading |
| Antoni | `ErXwobaYiN019PkySvjV` | Male | Clear | Technical content |
| Sam | `yoZ06aMxZJJ28mfd3POQ` | Male | Casual | Friendly interactions |
| Bella | `EXAVITQu4vr4xnSDxMaL` | Female | Expressive | Dynamic content |
- ElevenLabs premium TTS integration
- Advanced rate limiting and error handling
- Conversational voice optimization
- Hybrid document storage system
- Real-time processing status updates
- Enhanced UI with glass morphism design
- Smart request queuing for API limits
- Multi-Document Chat - Cross-document conversations and analysis
- Advanced OCR - Table extraction and handwriting recognition
- Voice Input - Speech-to-text for hands-free interaction
- Mobile App - React Native companion application
- API Gateway - RESTful API for third-party integrations
- Collaborative Workspaces - Team document sharing and chat
- Document Comparison - Side-by-side analysis and diff views
- Advanced Analytics - Usage insights and optimization suggestions
- Multi-language Support - UI localization and model support
- Enterprise SSO - SAML and OIDC integration
- Workflow Automation - Zapier and webhook integrations
This project is licensed under the MIT License - see the LICENSE file for complete details.
- Ollama: Apache License 2.0
- ChromaDB: Apache License 2.0
- ElevenLabs: Commercial API service
- Next.js: MIT License
- All npm dependencies: Various open-source licenses
- Ollama - Local AI model inference engine
- ChromaDB - Vector database for semantic search
- ElevenLabs - Premium text-to-speech synthesis
- Next.js - React framework and deployment platform
- Pinecone - Managed vector database service
- Clerk - Authentication and user management
- Firebase - Document metadata storage
- LangChain - AI application framework
- Tailwind CSS - Utility-first CSS framework
- Radix UI - Accessible component primitives
- Documentation: This README and inline code comments
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Technical Support: Check the troubleshooting section first
- Pull Requests: Welcome! Please follow the contribution guidelines
- Documentation: Help improve docs and examples
- Testing: Report bugs and help with quality assurance
- Translation: Assist with internationalization efforts
Get Started • Documentation • Report Bug • Request Feature
⭐ Star this repo if you find it useful! ⭐