
A web-scraping framework written in JavaScript, using PhantomJS and jQuery


alfetopito/pjscrape

This branch is 1 commit ahead of, 27 commits behind nrabinowitz/pjscrape:master.

Latest commit: b7e72d9 · Dec 6, 2011 (48 commits)

Homepage: http://nrabinowitz.github.com/pjscrape/

Overview

pjscrape is a framework for anyone who's ever wanted a command-line tool for web scraping using JavaScript and jQuery. Built to run with PhantomJS, it lets you scrape pages in a fully rendered, JavaScript-enabled context from the command line, with no browser required.

Dependencies

  • PhantomJS

Features

  • Client-side, JavaScript-based scraping environment with full access to jQuery functions
  • Easy, flexible syntax for setting up one or more scrapers
  • Recursive/crawl scraping
  • Delay scrape until a "ready" condition occurs
  • Load your own scripts on the page before scraping
  • Modular architecture for logging and writing/formatting scraped items
  • Client-side utilities for common tasks
  • Growing set of unit tests
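The recursive/crawl feature can be pictured as a breadth-first walk over links with a depth limit. The following is a minimal sketch in plain JavaScript (runnable under Node.js, no PhantomJS needed) over a hypothetical in-memory site. `fakeSite`, `scrapePage`, `followLinks`, and `crawl` are illustrative names only, not pjscrape's API; a real scrape would load each page in PhantomJS and run jQuery-based scrapers instead of reading from a lookup table.

```javascript
// Illustrative sketch only: a depth-limited, breadth-first crawl over an
// in-memory link graph. All names here are hypothetical, not pjscrape's API.
const fakeSite = {
  '/': { title: 'Home', links: ['/a', '/b'] },
  '/a': { title: 'Page A', links: ['/a/1', '/'] },
  '/b': { title: 'Page B', links: ['/a'] },
  '/a/1': { title: 'Page A1', links: ['/a/1/x'] },
  '/a/1/x': { title: 'Deep page', links: [] },
};

// Stand-in for "load the page and run a scraper on it".
function scrapePage(url) {
  const page = fakeSite[url];
  return page ? { url, title: page.title } : null;
}

// Stand-in for "collect further URLs to visit from the page".
function followLinks(url) {
  const page = fakeSite[url];
  return page ? page.links : [];
}

// Visit pages breadth-first, never following links deeper than maxDepth
// and never scraping the same URL twice.
function crawl(startUrl, maxDepth) {
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 0 }];
  const items = [];
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    const item = scrapePage(url);
    if (item) items.push(item);
    if (depth >= maxDepth) continue; // depth limit reached: scrape, but don't follow links
    for (const next of followLinks(url)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return items;
}

console.log(crawl('/', 2).map((item) => item.title));
```

Tracking visited URLs in a set keeps pages that link back to each other (here `/a` and `/`) from being scraped twice, and the depth limit bounds how far the crawl spreads: with a limit of 2, the page at `/a/1/x` is never reached.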

Please see http://nrabinowitz.github.com/pjscrape/ for usage, examples, and documentation.

Comments and questions welcomed at nick (at) nickrabinowitz (dot) com.

Languages

  • JavaScript 88.4%
  • Python 11.3%
  • Shell 0.3%