Web-DeepSearch API

A powerful FastAPI-based web research agent that searches multiple web sources, scrapes content from relevant pages, and synthesizes comprehensive answers using AI. This tool eliminates the need to manually browse through multiple websites by returning a single synthesized answer drawn from across the web.

🚀 Features

  • Multi-Source Research: Searches and scrapes content from multiple web sources simultaneously
  • AI-Powered Synthesis: Uses advanced AI to synthesize information from scraped sources into coherent answers
  • Unlimited Sources: No artificial limits on the number of sites to search - scrapes as many relevant sources as possible
  • Smart Filtering: Automatically filters out low-quality and blocked domains
  • FastAPI Backend: High-performance REST API with automatic documentation
  • Health Monitoring: Built-in /ping endpoint for health checks

πŸ› οΈ Installation & Setup

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)

Step 1: Clone the Repository

git clone <repository-url>
cd Web-DeepSearch

Step 2: Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Run the Application

Local Development

# Run with uvicorn for development
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Production with Vercel

This project is configured for Vercel deployment. Simply push to your GitHub repository connected to Vercel.

📑 API Endpoints

1. Query Endpoint

POST /query/

Submit a search query and receive a synthesized answer from multiple web sources.

Request Body:

{
  "query": "What are the latest developments in quantum computing?"
}

Response:

{
  "answer": "A comprehensive synthesized answer based on multiple web sources...",
  "sources_used": [
    "https://example.com/article1",
    "https://example.com/article2",
    "https://example.com/article3"
  ]
}
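The request and response shapes above map naturally onto Pydantic models, which the project structure places in app/model.py. The following is only a minimal sketch under that assumption; the class and field names are illustrative and may not match the repository exactly.

# Hypothetical sketch of app/model.py; class names are illustrative.
from typing import List
from pydantic import BaseModel

class QueryRequest(BaseModel):
    query: str  # The research question to answer

class QueryResponse(BaseModel):
    answer: str               # Synthesized answer text
    sources_used: List[str]   # URLs of the pages that informed the answer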

2. Health Check

GET /ping

Simple health check endpoint.

Response:

{
  "status": "ok",
  "message": "pong"
}

3. Root Endpoint

GET /

Welcome message with API information.

Response:

{
  "message": "Welcome to the Multi-Source Research Agent API!"
}
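For orientation, here is a minimal sketch of how these three endpoints could be wired up in app/main.py with FastAPI. The imported model names and the run_research helper are assumptions for illustration; the actual logic lives in app/main.py and app/agent.py.

# Hypothetical sketch of app/main.py; helper and model names are assumed.
from fastapi import FastAPI
from app.model import QueryRequest, QueryResponse  # assumed model names
from app.agent import run_research                 # assumed synthesis entry point

app = FastAPI(title="Multi-Source Research Agent API")

@app.get("/")
def root():
    return {"message": "Welcome to the Multi-Source Research Agent API!"}

@app.get("/ping")
def ping():
    return {"status": "ok", "message": "pong"}

@app.post("/query/", response_model=QueryResponse)
def query(request: QueryRequest):
    # Search, scrape, and synthesize; returns the answer plus source URLs.
    return run_research(request.query)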

🧪 Testing the API

Using curl

# Health check
curl http://localhost:8000/ping

# Query endpoint
curl -X POST "http://localhost:8000/query/" \
     -H "Content-Type: application/json" \
     -d '{"query": "What is machine learning?"}'

Using Python requests

import requests

response = requests.post(
    "http://localhost:8000/query/",
    json={"query": "Latest AI breakthroughs 2024"}
)
print(response.json())

πŸ“ Project Structure

Web-DeepSearch/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI application and endpoints
│   ├── model.py         # Pydantic models for request/response
│   ├── search_client.py # Web search and scraping logic
│   ├── scraper.py       # Web scraping utilities
│   ├── agent.py         # AI synthesis logic
│   └── config.py        # Configuration settings
├── requirements.txt     # Python dependencies
├── vercel.json          # Vercel deployment configuration
├── .gitignore           # Git ignore rules (includes venv/)
└── README.md            # This file

🔧 Configuration

Environment Variables

Create a .env file in the root directory:

# Optional: Add your API keys here if needed
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
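A common way to pick these variables up in app/config.py is python-dotenv plus os.getenv. This is only a sketch under that assumption, not necessarily how the repository's config.py is written.

# Hypothetical sketch of app/config.py using python-dotenv (assumed dependency).
import os
from dotenv import load_dotenv

load_dotenv()  # Read variables from the .env file into the process environment

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")        # None if not set
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")  # None if not set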

Domain Filtering

The application automatically filters out certain domains (social media, video platforms, etc.) to ensure quality results. You can modify the DOMAIN_BLOCKLIST in app/search_client.py to customize this.
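As an illustration of the idea (not the repository's exact code), filtering candidate URLs against a blocklist might look like this; the domains listed are examples only.

# Hypothetical sketch of blocklist filtering; DOMAIN_BLOCKLIST contents are examples.
from urllib.parse import urlparse

DOMAIN_BLOCKLIST = {"facebook.com", "twitter.com", "youtube.com"}

def is_allowed(url: str) -> bool:
    # Strip an optional "www." prefix before comparing against the blocklist.
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host not in DOMAIN_BLOCKLIST

urls = ["https://www.youtube.com/watch?v=abc", "https://example.com/article"]
print([u for u in urls if is_allowed(u)])  # ['https://example.com/article']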

🚀 Deployment Options

Vercel (Recommended)

  1. Push your code to GitHub
  2. Import the repository in Vercel
  3. Deploy automatically with zero configuration

Docker

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
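To build and run the image locally (standard Docker commands; the image name is arbitrary):

docker build -t web-deepsearch .
docker run -p 8000:8000 --env-file .env web-deepsearch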

Heroku

heroku create your-app-name
git push heroku main
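Heroku detects a Python app from requirements.txt but still needs to be told how to start the web process, typically via a Procfile in the repository root. One is not listed in the project structure above, so treat this as an assumption; a minimal Procfile would be:

web: uvicorn app.main:app --host 0.0.0.0 --port $PORT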

πŸ“ Usage Examples

Research Assistant

Perfect for:

  • Academic research across multiple sources
  • Market analysis and competitive intelligence
  • News aggregation and summary
  • Technical documentation synthesis
  • Fact-checking across multiple sources

Integration Examples

// Frontend integration
const researchQuery = async (query) => {
  const response = await fetch('/query/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query })
  });
  return response.json();
};

⚠️ Important Notes

  • Rate Limiting: Be mindful of API rate limits when making frequent requests; a simple client-side retry/backoff sketch follows this list
  • Content Quality: The AI synthesis quality depends on the scraped content quality
  • Network Dependency: Requires active internet connection for web scraping
  • Blocked Domains: Some websites may block scraping - these are automatically skipped
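If you call the API in a loop, a small retry-with-backoff wrapper keeps transient failures and rate limits from aborting a batch. This is a generic client-side sketch; the endpoint URL, status handling, and timings are illustrative and not part of the API itself.

# Generic client-side retry helper; URL and timings are illustrative.
import time
import requests

def query_with_retry(query: str, retries: int = 3, base_delay: float = 2.0) -> dict:
    for attempt in range(retries):
        try:
            response = requests.post(
                "http://localhost:8000/query/",
                json={"query": query},
                timeout=60,
            )
            if response.status_code == 429:          # Rate limited: back off and retry
                time.sleep(base_delay * (2 ** attempt))
                continue
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            time.sleep(base_delay * (2 ** attempt))  # Network hiccup: back off and retry
    raise RuntimeError("Query failed after retries")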

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test thoroughly
  4. Submit a pull request with clear description

📄 License

This project is open source and available under the MIT License.

🔗 Links


For support or questions, please open an issue in the GitHub repository.
