A powerful FastAPI-based web research agent that searches multiple web sources, scrapes content from relevant pages, and synthesizes comprehensive answers using AI. This tool eliminates the need to manually browse through multiple websites by providing synthesized answers from across the web.
- Multi-Source Research: Searches and scrapes content from multiple web sources simultaneously
- AI-Powered Synthesis: Uses advanced AI to synthesize information from scraped sources into coherent answers
- Unlimited Sources: No artificial limits on the number of sites to search - scrapes as many relevant sources as possible
- Smart Filtering: Automatically filters out low-quality and blocked domains
- FastAPI Backend: High-performance REST API with automatic documentation
- Health Monitoring: Built-in `/ping` endpoint for health checks
- Python 3.8 or higher
- pip (Python package manager)
```bash
git clone <repository-url>
cd Web-DeepSearch
```
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
```bash
# Run with uvicorn for development
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
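Once the server is running, FastAPI serves interactive API documentation by default at `http://localhost:8000/docs` (Swagger UI) and `http://localhost:8000/redoc`.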
This project is configured for Vercel deployment. Simply push to your GitHub repository connected to Vercel.
`POST /query/`
Submit a search query and receive a synthesized answer from multiple web sources.
Request Body:
```json
{
  "query": "What are the latest developments in quantum computing?"
}
```
Response:
```json
{
  "answer": "A comprehensive synthesized answer based on multiple web sources...",
  "sources_used": [
    "https://example.com/article1",
    "https://example.com/article2",
    "https://example.com/article3"
  ]
}
```
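The request and response shapes above map naturally onto Pydantic models. A minimal sketch of what `app/model.py` might contain — the class names and field names here are assumptions, and the actual models in the repository may differ:

```python
from typing import List

from pydantic import BaseModel


class QueryRequest(BaseModel):
    """Incoming body for POST /query/ (hypothetical name)."""
    query: str


class QueryResponse(BaseModel):
    """Synthesized answer plus the URLs that informed it (hypothetical name)."""
    answer: str
    sources_used: List[str]
```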
`GET /ping`
Simple health check endpoint.
Response:
```json
{
  "status": "ok",
  "message": "pong"
}
```
`GET /`
Welcome message with API information.
Response:
```json
{
  "message": "Welcome to the Multi-Source Research Agent API!"
}
```
```bash
# Health check
curl http://localhost:8000/ping

# Query endpoint
curl -X POST "http://localhost:8000/query/" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is machine learning?"}'
```
```python
import requests

response = requests.post(
    "http://localhost:8000/query/",
    json={"query": "Latest AI breakthroughs 2024"},
)
print(response.json())
```
```
Web-DeepSearch/
├── app/
│   ├── __init__.py
│   ├── main.py            # FastAPI application and endpoints
│   ├── model.py           # Pydantic models for request/response
│   ├── search_client.py   # Web search and scraping logic
│   ├── scraper.py         # Web scraping utilities
│   ├── agent.py           # AI synthesis logic
│   └── config.py          # Configuration settings
├── requirements.txt       # Python dependencies
├── vercel.json            # Vercel deployment configuration
├── .gitignore             # Git ignore rules (includes venv/)
└── README.md              # This file
```
Create a `.env` file in the root directory:

```env
# Optional: Add your API keys here if needed
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
```
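A minimal sketch of how `app/config.py` could read these variables — this assumes `python-dotenv` is installed, and the repository's actual config code may differ:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is a dependency

# Read key=value pairs from .env into the process environment
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
```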
The application automatically filters out certain domains (social media, video platforms, etc.) to ensure quality results. You can modify the `DOMAIN_BLOCKLIST` in `app/search_client.py` to customize this.
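For illustration only, the blocklist check could look like the sketch below — the entries and the `is_blocked` helper are hypothetical; only the `DOMAIN_BLOCKLIST` name comes from the repository:

```python
from urllib.parse import urlparse

# Hypothetical example entries; the real list lives in app/search_client.py
DOMAIN_BLOCKLIST = {
    "facebook.com",
    "twitter.com",
    "youtube.com",
    "tiktok.com",
}


def is_blocked(url: str) -> bool:
    """Return True if the URL's host matches a blocklisted domain."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in DOMAIN_BLOCKLIST)
```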
- Push your code to GitHub
- Import the repository in Vercel
- Deploy automatically with zero configuration (a sample `vercel.json` is sketched below for reference)
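A `vercel.json` for a FastAPI app commonly looks like this sketch; the repository's actual file may differ:

```json
{
  "builds": [{ "src": "app/main.py", "use": "@vercel/python" }],
  "routes": [{ "src": "/(.*)", "dest": "app/main.py" }]
}
```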
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
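Build and run the image with `docker build -t web-deepsearch .` followed by `docker run -p 8000:8000 web-deepsearch` (the image name is illustrative).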
```bash
heroku create your-app-name
git push heroku main
```
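Heroku also expects a `Procfile` in the project root, for example `web: uvicorn app.main:app --host 0.0.0.0 --port $PORT`, so the app binds to the port Heroku assigns.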
Perfect for:
- Academic research across multiple sources
- Market analysis and competitive intelligence
- News aggregation and summary
- Technical documentation synthesis
- Fact-checking across multiple sources
```javascript
// Frontend integration
const researchQuery = async (query) => {
  const response = await fetch('/query/', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query })
  });
  return response.json();
};
```
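If the frontend is served from a different origin than the API, the browser will enforce CORS; FastAPI's `CORSMiddleware` can allow such requests. A minimal sketch — whether `app/main.py` already configures this is not confirmed, and the origin shown is a placeholder:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow a hypothetical frontend origin to call the API from the browser
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],  # assumption: replace with your domain
    allow_methods=["POST", "GET"],
    allow_headers=["Content-Type"],
)
```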
- Rate Limiting: Be mindful of API rate limits when making frequent requests (a retry sketch follows this list)
- Content Quality: The AI synthesis quality depends on the scraped content quality
- Network Dependency: Requires active internet connection for web scraping
- Blocked Domains: Some websites may block scraping - these are automatically skipped
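A minimal client-side sketch of retry with exponential backoff for the rate-limiting note above — the endpoint URL and status handling are assumptions, not part of the API contract:

```python
import time

import requests


def query_with_retry(query: str, retries: int = 3) -> dict:
    """POST to /query/, backing off on 429/5xx responses (illustrative)."""
    for attempt in range(retries):
        resp = requests.post(
            "http://localhost:8000/query/",
            json={"query": query},
            timeout=60,
        )
        if resp.status_code != 429 and resp.status_code < 500:
            resp.raise_for_status()  # surface other 4xx errors immediately
            return resp.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
    raise RuntimeError("query failed after retries")
```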
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and test thoroughly
- Submit a pull request with clear description
This project is open source and available under the MIT License.
For support or questions, please open an issue in the GitHub repository.