Supacrawler's ultralight engine for scraping and crawling the web. Written in Go for maximum performance and concurrency. The open-source engine powering Supacrawler.com.
A standalone HTTP service for scraping, mapping, crawling, and screenshots. It runs a web API with a background worker (Redis + Asynq). Routes match the existing Supacrawler SDKs under /v1.
Why open source? We believe powerful web scraping technology should be accessible to everyone. Whether you're a solo developer, a startup, or an enterprise, you shouldn't have to choose between quality and affordability. Read our open source announcement →
Option A: Docker Compose
curl -O https://raw.githubusercontent.com/supacrawler/supacrawler/main/docker-compose.yml
docker compose up
Option B: Manual Docker
docker run -d --name redis -p 6379:6379 redis:7-alpine
docker run --rm -p 8081:8081 \
  -e REDIS_ADDR=host.docker.internal:6379 \
  ghcr.io/supacrawler/supacrawler:latest
For advanced users who prefer native binaries:
- Download from releases page
- Install dependencies: Redis + Node.js + Playwright v1.49.1
- Run: ./supacrawler --redis-addr=127.0.0.1:6379
Note: Docker is recommended for easier setup. See complete local development guide →
Dependencies:
- Redis - for job queuing and background processing
- Playwright - for JavaScript rendering and screenshots
# 1. Make sure Redis is running
brew services start redis
# OR: docker run -d --name redis -p 6379:6379 redis:7-alpine
# 2. Start Supacrawler
supacrawler --redis-addr=127.0.0.1:6379
What you'll see:
🕷️ Supacrawler Engine
├─ Server: http://127.0.0.1:8081
├─ Health: http://127.0.0.1:8081/v1/health  
└─ API Docs: http://127.0.0.1:8081/docs
# Health check
curl http://localhost:8081/v1/health
# Scrape a webpage
curl "http://localhost:8081/v1/scrape?url=https://example.com&format=markdown"
# Take a screenshot
curl -X POST http://localhost:8081/v1/screenshots \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","full_page":true}'This is supacrawler's core functionality - modern web scraping requires JS rendering.
One-line install handles this automatically. For manual installs:
# Install Playwright (requires Node.js)
npm install -g playwright
playwright install chromium --with-deps
Without Playwright:
- ❌ Screenshots fail completely
- ❌ SPAs return empty content
With Docker: Everything works out of the box (Playwright included).
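For manual installs, a quick sanity check that the toolchain is in place (the playwright CLI is available after the global npm install above):
# Verify Node.js and the Playwright CLI are on your PATH
node --version
playwright --version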
Learn more about JavaScript rendering →
You can configure Supacrawler using environment variables or a .env file. Copy .env.example to .env and modify as needed.
- HTTP_ADDR - Server address (default: :8081)
- REDIS_ADDR - Redis address (default: 127.0.0.1:6379)
- DATA_DIR - Data directory (default: ./data)
- REDIS_PASSWORD - Redis password (if required)
- SUPABASE_URL - Supabase project URL (for cloud storage)
- SUPABASE_SERVICE_KEY - Supabase service key
- SUPABASE_STORAGE_BUCKET - Storage bucket name (default: screenshots)
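As an example, a minimal .env might look like this (values are illustrative; the Supabase entries are only needed for cloud storage):
# .env (example values)
HTTP_ADDR=:8081
REDIS_ADDR=127.0.0.1:6379
DATA_DIR=./data
# Optional: upload screenshots to Supabase storage
# SUPABASE_URL=http://127.0.0.1:64321
# SUPABASE_SERVICE_KEY=<service_key>
# SUPABASE_STORAGE_BUCKET=screenshots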
New to Supacrawler? Read our comprehensive development guide → or browse tutorials →
git clone https://github.com/supacrawler/supacrawler.git
cd supacrawler
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
# Set environment variables (or use .env file)
export REDIS_ADDR=127.0.0.1:6379
export HTTP_ADDR=:8081
export DATA_DIR=./data
# Optional: enable Supabase storage upload/sign
export SUPABASE_URL=http://127.0.0.1:64321
export SUPABASE_SERVICE_KEY=<service_key>
export SUPABASE_STORAGE_BUCKET=screenshots
# Ensure Redis is running
brew services start redis
# OR: docker run -d --name redis -p 6379:6379 redis:7-alpine
# Run the server
go mod tidy
go run ./cmd/main.go
# Install Air for hot reloading
go install github.com/air-verse/air@latest
# Set environment variables (same as above)
export REDIS_ADDR=127.0.0.1:6379
export HTTP_ADDR=:8081
export DATA_DIR=./data
# Run with hot reload
air
For the best development experience with automatic code reloading:
# Start all services with hot reload enabled
docker compose -f docker-compose.dev.yml up --build
# Or run in detached mode
docker compose -f docker-compose.dev.yml up --build -d
# View logs
docker compose -f docker-compose.dev.yml logs -f supacrawler-dev
# Stop services
docker compose -f docker-compose.dev.yml down
What you get:
- ✅ Automatic code reloading on file changes (via Air)
- ✅ Source code mounted as volumes
- ✅ Redis included and configured
- ✅ No need to rebuild on code changes
How it works: The docker-compose.dev.yml uses Dockerfile.dev which includes Air for hot reloading. Your local source code is mounted into the container, so any changes you make are immediately detected and the server automatically restarts.
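To see it in action, tail the dev service logs and then save a change to any Go file (cmd/main.go is just an example; Air watches the whole mounted source tree):
# Tail the dev service logs in one terminal
docker compose -f docker-compose.dev.yml logs -f supacrawler-dev
# In another terminal, edit and save any Go file (e.g. cmd/main.go);
# Air rebuilds and restarts the server. Confirm it came back up:
curl http://localhost:8081/v1/health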
For manual Docker builds without hot reload:
# Start Redis
docker run -d --name redis -p 6379:6379 redis:7-alpine
# Build and run scraper
docker build -t supacrawler:dev .
docker run --rm \
  -p 8081:8081 \
  -e REDIS_ADDR=host.docker.internal:6379 \
  -e HTTP_ADDR=":8081" \
  -e DATA_DIR="/app/data" \
  -e SUPABASE_URL="http://host.docker.internal:64321" \
  -e SUPABASE_SERVICE_KEY="<service_key>" \
  -e SUPABASE_STORAGE_BUCKET="screenshots" \
  -v "$(pwd)/data:/app/data" \
  --name supacrawler \
  supacrawler:dev
# Docker setup
./scripts/run.sh
# Hot reload setup
./scripts/run.sh --reload
Base URL: http://localhost:8081/v1
Complete API documentation: docs.supacrawler.com
curl -s http://localhost:8081/internal/health
# Scrape page (markdown format, links always included)
curl -s "http://localhost:8081/v1/scrape?url=https://supacrawler.com"# Create crawl job
curl -s -X POST http://localhost:8081/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://supacrawler.com",
    "type": "crawl",
    "format": "markdown",
    "depth": 2,
    "link_limit": 20,
    "include_subdomains": true,
    "include_html": false
  }'
# Get job status
curl -s http://localhost:8081/v1/crawl/<job_id>
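Crawl jobs run in the background, so in practice you poll the status endpoint until the job finishes. A minimal polling sketch, assuming the job response includes a status field with a completed value (check docs.supacrawler.com for the exact schema) and that jq is installed:
JOB_ID=<job_id>  # replace with the id returned when the job was created
while true; do
  # NOTE: the "status" field and "completed" value are assumptions; see the API docs
  STATUS=$(curl -s "http://localhost:8081/v1/crawl/$JOB_ID" | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 2
done
# Create screenshot job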
curl -s -X POST http://localhost:8081/v1/screenshots \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://supacrawler.com",
    "full_page": true,
    "format": "png",
    "width": 1366,
    "height": 768
  }'
# Get screenshot
curl -s "http://localhost:8081/v1/screenshots?job_id=<job_id>"
# Synchronous screenshot (stream to file)
curl -s -X POST http://localhost:8081/v1/screenshots \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://supacrawler.com","full_page":true,"format":"png","stream":true}' \
  --output example.png
- If SUPABASE_URL and SUPABASE_SERVICE_KEY are set, images are uploaded to SUPABASE_STORAGE_BUCKET and a signed URL is returned.
- Otherwise, files are written under DATA_DIR/screenshots and served via /files/screenshots/<name>.
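With local storage, for example, you can fetch a saved screenshot straight from the files route (substitute the actual filename reported by the job):
# Download a locally stored screenshot
curl -s "http://localhost:8081/files/screenshots/<name>" --output screenshot.png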
Use the official SDKs to integrate with your applications:
import { SupacrawlerClient } from '@supacrawler/js'
const client = new SupacrawlerClient({ 
  apiKey: 'anything', 
  baseUrl: 'http://localhost:8081/v1' 
})
const result = await client.scrape({ 
  url: 'https://supacrawler.com', 
  format: 'markdown' 
})
from supacrawler import SupacrawlerClient
client = SupacrawlerClient(
  api_key='anything', 
  base_url='http://localhost:8081/v1'
)
result = client.scrape({ 
  'url': 'https://supacrawler.com', 
  'format': 'markdown' 
})
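To install the SDKs first (package names inferred from the imports above; verify against the SDK repos):
# Install the JS and Python SDKs (names assumed from the imports above)
npm install @supacrawler/js
pip install supacrawler
Tutorials & Guides: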
- HTTP_ADDR - Server address (default: :8081)
- REDIS_ADDR - Redis address (default: 127.0.0.1:6379)
- REDIS_PASSWORD - Redis password (optional)
- DATA_DIR - Data directory (default: ./data)
- SUPABASE_URL - Supabase project URL (optional)
- SUPABASE_SERVICE_KEY - Supabase service key (optional)
- SUPABASE_STORAGE_BUCKET - Supabase storage bucket name (optional)
We welcome contributions! Please see our development setup above to get started.
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and test locally
- Submit a pull request
Community Resources:
- Contributing guidelines
- Development blog posts with technical deep dives
- Issue tracker for bugs and features
- Discussions for questions and ideas
Licensed under the Apache License 2.0. See LICENSE for details.