This is the event scraping service for the Cohere platform. It combines browser-based scraping with LLM-powered extraction to pull event information from various online sources and store it in a structured format.
- Automated event scraping from configured sources
- LLM-powered content extraction
- Intelligent deduplication
- Scheduled scraping with configurable intervals
- Admin API for managing scraping sources
- Detailed logging and monitoring
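The deduplication step can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the `dedupe_key` helper, its normalization rules, and the event fields used are all assumptions.

```python
import hashlib
import re

def dedupe_key(title: str, start_date: str, venue: str) -> str:
    """Build a stable fingerprint for an event (hypothetical helper).

    Normalizes whitespace and case so the same event scraped from two
    sources with minor formatting differences collapses to one key.
    """
    parts = [re.sub(r"\s+", " ", p).strip().lower() for p in (title, start_date, venue)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def deduplicate(events: list[dict]) -> list[dict]:
    """Keep the first occurrence of each fingerprint, drop the rest."""
    seen: set[str] = set()
    unique = []
    for event in events:
        key = dedupe_key(event["title"], event["start_date"], event.get("venue", ""))
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

Keeping the first occurrence means the earliest-scraped copy of an event wins; a real implementation might instead merge fields from duplicates.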
- Language: Python 3.11+
- Framework: FastAPI
- Scraping: Playwright
- LLM Integration: LangChain
- Database: Supabase
- Task Scheduling: Schedule
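The service uses the `schedule` library for its configurable scraping intervals; as a stdlib-only illustration of the same interval-loop pattern, a sketch might look like this (the `run_every` helper is hypothetical):

```python
import threading

def run_every(interval_seconds: float, job, stop_event: threading.Event) -> threading.Thread:
    """Run `job` repeatedly at a fixed interval until `stop_event` is set.

    Stdlib-only sketch of interval scheduling; the service itself relies
    on the `schedule` library rather than this helper.
    """
    def loop():
        # Event.wait doubles as a sleep that can be interrupted by stop_event
        while not stop_event.wait(interval_seconds):
            job()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Using a daemon thread keeps the scheduler from blocking shutdown; the FastAPI process can set `stop_event` in its shutdown handler to end the loop cleanly.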
- Python 3.11 or higher
- Virtual environment tool (venv)
- Playwright browser dependencies
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/cohere.git
  cd cohere-scraper
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install Playwright browsers:

  ```bash
  playwright install
  ```

- Create a `.env` file:

  ```bash
  cp .env.example .env
  ```

  Update the environment variables with your credentials.
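A sketch of how the service might read those credentials at startup. The variable names here are illustrative assumptions; check `.env.example` for the names the service actually expects.

```python
import os

# Illustrative names only -- the real required variables live in .env.example
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_KEY")

def load_config() -> dict:
    """Read required settings from the environment, failing fast if any are absent."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast at startup surfaces a missing credential immediately rather than as an opaque error on the first database call.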
- Start the service:

  ```bash
  uvicorn src.main:app --reload
  ```
```
src/
├── scrapers/   # Scraping implementations
├── models/     # Data models and schemas
├── services/   # Business logic and external services
├── utils/      # Utility functions
└── main.py     # Application entry point
tests/          # Test files
```
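The layout above suggests a plugin-style design in which each source gets its own scraper under `src/scrapers/`. A hypothetical sketch of the interface such a scraper might implement (all names here are assumptions, not the repository's actual classes):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Event:
    """Minimal event record; the real schema in src/models/ will differ."""
    title: str
    start_date: str
    url: str

class BaseScraper(ABC):
    """Hypothetical base class for scrapers in src/scrapers/."""

    @abstractmethod
    def fetch(self, source_url: str) -> str:
        """Return raw page content for a source (e.g. via Playwright)."""

    @abstractmethod
    def extract(self, raw_content: str) -> list[Event]:
        """Turn raw content into structured events (e.g. via an LLM chain)."""

    def scrape(self, source_url: str) -> list[Event]:
        """Template method tying fetch and extract together."""
        return self.extract(self.fetch(source_url))
```

Splitting `fetch` from `extract` lets tests stub the network side and exercise extraction logic against saved HTML fixtures.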
- Start development server: `uvicorn src.main:app --reload`
- Run tests: `pytest`
- Run linting: `flake8`
- Run type checking: `mypy .`
- Format code: `black . && isort .`
When the service is running, visit:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.