Firecrawl Content Migrator

A powerful web scraping and content migration tool built with Next.js, TypeScript, and the Firecrawl API. Extract structured data from any website and export it in formats ready for your CMS, database, or data pipeline.

Repository: github.com/mendableai/firecrawl-migrator

What it does

Scrapes content from websites and exports structured data. Map a site, select URLs, define what data to extract, and export as CSV.

Key Features

  • Map website structure to discover all pages
  • Select specific URLs to scrape
  • Define custom fields (title, date, content, etc.)
  • Export data as CSV
  • Batch process multiple pages at once

Use Cases

  • Blog Migration: Extract posts, metadata, and content from any blog platform
  • E-commerce Data: Scrape product information, prices, and descriptions
  • News Archives: Collect articles with dates, authors, and categories
  • Documentation Sites: Extract technical documentation with proper structure
  • Content Audits: Analyze and export existing website content

Prerequisites

  • Node.js 18+ and npm
  • Firecrawl API key (get one at firecrawl.dev)

Quick Start

1. Clone the repository

git clone https://github.com/mendableai/firecrawl-migrator.git
cd firecrawl-migrator

2. Install dependencies

npm install

3. Get your Firecrawl API key

  • Sign up at firecrawl.dev
  • Navigate to your dashboard
  • Copy your API key

4. Configure environment variables

Create a .env.local file in the root directory:

touch .env.local

Add your Firecrawl API key to the file:

FIRECRAWL_API_KEY=fc-YOUR_ACTUAL_API_KEY_HERE

5. Run the development server

npm run dev

6. Open the application

Visit http://localhost:3000 in your browser

You should see the Firecrawl Content Migrator interface. If you see an API key error, double-check your .env.local file.

How It Works

Step 1: Map Website Structure

Enter a URL to analyze the website's structure. The mapping operation discovers all available pages and organizes them in a hierarchical tree view.
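The hierarchical tree view can be pictured as grouping discovered URLs by path segment. The sketch below is a hypothetical helper (not the tool's actual code) showing one way a flat list of mapped URLs might be turned into such a tree:

```typescript
// Build a nested tree from a flat list of URLs, grouping by path segment.
// Hypothetical sketch of how mapped URLs could be organized for display.
type TreeNode = { name: string; children: Map<string, TreeNode>; isPage: boolean };

function buildTree(urls: string[]): TreeNode {
  const root: TreeNode = { name: "/", children: new Map(), isPage: false };
  for (const url of urls) {
    const segments = new URL(url).pathname.split("/").filter(Boolean);
    let node = root;
    for (const seg of segments) {
      if (!node.children.has(seg)) {
        node.children.set(seg, { name: seg, children: new Map(), isPage: false });
      }
      node = node.children.get(seg)!;
    }
    node.isPage = true; // the final segment corresponds to an actual page
  }
  return root;
}
```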

Step 2: Select Content

Browse the interactive tree and select which pages to scrape. Use the built-in filters to:

  • Select all pages in a directory
  • Filter by URL patterns
  • Exclude categories, tags, or pagination
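The filtering above can be sketched as a small predicate over the mapped URL list. This is a hypothetical illustration (the function name and default exclude patterns are assumptions, not the app's real code):

```typescript
// Filter a list of discovered URLs before scraping: keep URLs matching an
// optional include pattern, and drop common non-content paths such as
// category, tag, and pagination pages. Hypothetical sketch.
function filterUrls(
  urls: string[],
  include?: RegExp,
  excludes: RegExp[] = [/\/category\//, /\/tag\//, /\/page\/\d+/]
): string[] {
  return urls.filter(
    (url) =>
      (!include || include.test(url)) &&
      !excludes.some((pattern) => pattern.test(url))
  );
}
```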

Step 3: Define Schema

Configure what data to extract from each page:

  • Default fields: title, date, content
  • Add custom fields: author, category, price, tags, etc.
  • Auto-detection: The tool can analyze pages and suggest fields
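A field schema could be represented as a simple list of field definitions, roughly like the sketch below. The shape is an assumption for illustration; the app's internal representation may differ:

```typescript
// A field definition as the schema step might represent it (hypothetical).
interface FieldDef {
  name: string;        // column name in the exported data
  description: string; // hint given to the extractor for this field
  required: boolean;
}

// Example schema for a blog migration, mixing default and custom fields.
const blogSchema: FieldDef[] = [
  { name: "title", description: "The post title", required: true },
  { name: "date", description: "Publication date", required: true },
  { name: "author", description: "Author's display name", required: false },
  { name: "content", description: "Full article body", required: true },
];
```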

Step 4: Extract & Export

Start the batch scraping process to extract structured data from all selected pages. Export results as:

  • CSV for spreadsheets and databases
  • JSON for APIs and applications
  • Custom formats for specific CMS platforms
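The CSV export boils down to serializing extracted records with proper quoting. A minimal sketch in the RFC 4180 style (hypothetical helper, not the app's actual exporter):

```typescript
// Serialize extracted records to CSV, quoting any field that contains a
// comma, a double quote, or a newline, and doubling embedded quotes.
function toCsv(rows: Record<string, string>[], columns: string[]): string {
  const escape = (value: string): string =>
    /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
  const header = columns.map(escape).join(",");
  const lines = rows.map((row) =>
    columns.map((col) => escape(row[col] ?? "")).join(",")
  );
  return [header, ...lines].join("\n");
}
```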

Troubleshooting

If you see "Firecrawl API key not configured":

  • Make sure you created the .env.local file
  • Check that your API key starts with fc-
  • Restart the development server
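The checks above can be expressed as a small validation helper. This is a hypothetical sketch mirroring the troubleshooting advice, not the app's actual error handling:

```typescript
// Sanity-check the Firecrawl API key: it must be present and, per the
// troubleshooting notes above, start with "fc-". Hypothetical helper.
function validateApiKey(key: string | undefined): string {
  if (!key) {
    throw new Error("Firecrawl API key not configured: set FIRECRAWL_API_KEY in .env.local");
  }
  if (!key.startsWith("fc-")) {
    throw new Error("Firecrawl API keys start with 'fc-'; check your .env.local");
  }
  return key;
}
```

In a Next.js app this would typically be called server-side with `process.env.FIRECRAWL_API_KEY`, after restarting the dev server so the env file is re-read.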

Development

Build for production

npm run build
npm start

Run linting

npm run lint

Contributing

Contributions are welcome. Fork the repository, make your changes, and submit a pull request.

License

MIT License - see LICENSE file for details
