Using the web inspector for complex scrapes

This repo contains example scripts of scrapes in both Ruby and Python using concepts taught in the NICAR 2015 advanced web scraping course. The class focuses on using the web inspector to find the information needed to conduct more sophisticated scrapes. The slide deck for the presentation can be found here.

Requirements

### Python

The Python scrapes require only two modules that are not part of the Python standard library. BeautifulSoup4 is a module for parsing markup languages such as HTML and XML. Requests is used to make both GET and POST web requests. Both can be installed individually using pip, or together using pip install -r requirements.txt.
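As a rough illustration of how the two modules fit together, here is a minimal sketch rather than one of the repo's scripts. The function names, the sample HTML, and the 30-second timeout are assumptions made for the example. Requests fetches the page with either a GET or a POST, and BeautifulSoup parses the returned HTML.

```python
import requests
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every anchor tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


def fetch_links(url, form_data=None):
    """GET the page, or POST form_data to it if given, then parse its links.

    Hypothetical helper: the URL and form fields depend on the site being
    scraped and are usually found with the web inspector's network tab.
    """
    if form_data is None:
        response = requests.get(url, timeout=30)
    else:
        response = requests.post(url, data=form_data, timeout=30)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return extract_links(response.text)


# Made-up HTML so the parsing step can be tried without a network call.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/1">First report</a></li>
  <li><a href="/reports/2">Second report</a></li>
  <li><a>No link here</a></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

Keeping the parsing in its own function means it can be checked against saved HTML before pointing fetch_links at a live site.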

### Ruby

The Ruby scripts require three libraries. The first is Nokogiri, Ruby's parser for HTML and XML. The ASP.NET scrape requires Mechanize to emulate a browser. Rest-Client is needed to make web requests in the mapscrape.rb example. If you have Bundler installed, you can simply navigate to the Ruby directory and run bundle install to install the required libraries. Otherwise, use gem install <package name> for each one.