Node.js - scrap

A simple screen scraper module that uses jQuery-style semantics.

Why?

In every screen scraper program that I wrote, I had to include request and cheerio. I would then have to check the response error object and the response code. It became a bit annoying. Hence this package.

Installation

npm install scrap

Quick and Dirty

var scrap = require('scrap');

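scrap('http://google.com', function(err, $) {
  // err is set on request failure or a non-200 response; bail out before using $
  if (err) return console.error(err);
  console.log($('title').text().trim()); //Google
});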

API

scrap(options, callback)

options: Either a URL string or an object containing options as key/value pairs.

Options include:

  • url: The URL to fetch and parse.
  • timeout: The number of milliseconds to wait before aborting the request.
  • proxy: The proxy string e.g. 245.12.19.145:8080.
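The two accepted forms of options can be illustrated with a small sketch. The helper name `toOptions` and the 5000 ms default are assumptions made for this example; they are not part of scrap. Only the `url`, `timeout`, and `proxy` keys come from the list above.

```javascript
// Hypothetical helper (not part of scrap): accept either form of `options`
// described above and return the object form, with defaults filled in.
function toOptions(options, defaults) {
  var base = typeof options === 'string' ? { url: options } : options;
  // Keys given by the caller override the defaults.
  return Object.assign({}, defaults, base);
}

// String form: only the URL is given, the assumed default timeout is added.
var fromString = toOptions('http://google.com', { timeout: 5000 });

// Object form: every documented key can be set explicitly.
var fromObject = toOptions(
  { url: 'http://google.com', timeout: 2000, proxy: '245.12.19.145:8080' },
  { timeout: 5000 }
);

console.log(fromString);
console.log(fromObject);
```

With the module installed, either result would be passed as the first argument, e.g. `scrap(fromObject, callback)`.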

callback: The function called when the response arrives. It can accept the following parameters:

  • err: The error object, if any. It is also set when the response code is not 200. This may be a poor design choice; time will tell.
  • $: jQuery object to use on the page.
  • code: HTTP response status code.
  • html: HTML or response body text.
  • resp: The actual response object.
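A callback that uses several of these parameters might look like the sketch below. The function name `summarize` is hypothetical, and the sketch assumes that `code` may or may not be supplied when `err` is set, since the list above does not say; it handles both cases.

```javascript
// Hypothetical helper (not part of scrap): turn the callback arguments
// into a one-line summary. Parameter order follows the list above.
function summarize(err, $, code, html) {
  if (err) {
    // err is also set for non-200 responses, so report the status code
    // when one was provided (an assumption; it may be undefined).
    return 'failed' + (code ? ' (HTTP ' + code + ')' : '') + ': ' + err.message;
  }
  // On success, $ can be queried with jQuery-style selectors.
  return code + ' ' + $('title').text().trim() + ' [' + html.length + ' chars]';
}

// With the module installed, the helper would be used like this:
//   var scrap = require('scrap');
//   scrap('http://google.com', function(err, $, code, html, resp) {
//     console.log(summarize(err, $, code, html));
//   });
```

Checking `err` first matters here because `$` cannot be relied on when the request failed or returned a non-200 status.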

Credits

This would not be possible without the great Node.js modules request and cheerio.

Author

This module was written by JP Richardson. You should follow him on Twitter @jprichardson. Also read his coding blog Procbits. If you write software with others, you should check out Gitpilot to make collaboration with Git simple.

License

(MIT License)

Copyright 2012, JP Richardson [email protected]