Woven-Web/cohere

Cohere - Event Discovery Platform (Scraper)

Overview

This is the event scraping service for the Cohere platform. It uses LLM-assisted scraping to extract event information from various online sources and stores it in a structured format.

Features

  • Automated event scraping from configured sources
  • LLM-powered content extraction
  • Intelligent deduplication
  • Scheduled scraping with configurable intervals
  • Admin API for managing scraping sources
  • Detailed logging and monitoring
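
As an illustration of the deduplication feature, here is a minimal sketch of one common approach: fingerprinting each event by a hash of its normalized key fields. The field names (`title`, `start`, `venue`) and both function names are assumptions made for this example, not the project's actual schema or implementation.

```python
import hashlib
import re
import unicodedata

# Hypothetical event fields used for matching; the real schema may differ.
KEY_FIELDS = ("title", "start", "venue")


def normalize(text: str) -> str:
    """Casefold and collapse punctuation/whitespace so near-identical strings match."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"[\W_]+", " ", text).strip()


def event_fingerprint(event: dict) -> str:
    """Return a stable SHA-256 hash of the event's normalized key fields."""
    key = "|".join(normalize(str(event[field])) for field in KEY_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(events: list[dict]) -> list[dict]:
    """Keep the first event seen for each fingerprint, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for event in events:
        fingerprint = event_fingerprint(event)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(event)
    return unique
```

Exact-match fingerprints only catch duplicates that differ in case, spacing, or punctuation. Listings that word the same event differently would need fuzzy or LLM-based matching on top.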

Tech Stack

  • Language: Python 3.11+
  • Framework: FastAPI
  • Scraping: Playwright
  • LLM Integration: LangChain
  • Database: Supabase
  • Task Scheduling: Schedule

Getting Started

Prerequisites

  • Python 3.11 or higher
  • Virtual environment tool (venv)
  • Playwright browser dependencies

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/cohere.git
    cd cohere
  2. Create and activate virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install Playwright browsers:

    playwright install
  5. Create a .env file:

    cp .env.example .env

    Update the environment variables with your credentials.

  6. Start the service:

    uvicorn src.main:app --reload

Project Structure

src/
├── scrapers/      # Scraping implementations
├── models/        # Data models and schemas
├── services/      # Business logic and external services
├── utils/         # Utility functions
└── main.py        # Application entry point

tests/             # Test files

Available Commands

  • Start development server: uvicorn src.main:app --reload
  • Run tests: pytest
  • Run linting: flake8
  • Run type checking: mypy .
  • Format code: black . && isort .

API Documentation

When the service is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
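
FastAPI also serves the underlying OpenAPI schema as JSON at `/openapi.json` by default. As a rough sketch, assuming the default host and port shown above and that the schema route has not been disabled, the following lists the routes the running service exposes. The helper names `fetch_schema` and `list_routes` are made up for this example.

```python
import json
from urllib.request import urlopen

# HTTP methods that can appear as operation keys in an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema from a running instance of the service."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path then method."""
    routes = []
    for path, path_item in schema.get("paths", {}).items():
        for key in path_item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes, key=lambda route: (route[1], route[0]))
```

With the development server running, `list_routes(fetch_schema())` returns the available endpoints without opening a browser.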

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
