This repository contains materials that I use for teaching basic web scraping
and data acquisition topics to non-coding audiences. The core of this workshop
is the Getting Data with Python.ipynb notebook, which uses the HTML files stored
in wikisource to create the eventual output, all_letters.csv. There is also a
bunch of less organized material in the messy folder. Not for
the faint of heart, this folder might nonetheless be interesting to someone who
wants to mess around with more advanced techniques, like applying Google's
natural language cloud processing to this dataset. Basically, this is a lot of
unpolished material and some dead ends that may be useful to someone else, but
it is also here so that I don't forget how I did things.
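
As a rough illustration of the HTML-to-CSV step mentioned above, here is a minimal sketch that uses only Python's standard library. It is not the notebook's actual code: the function names and the two CSV columns (filename, text) are assumptions made for this example, and the real all_letters.csv may be structured differently.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def folder_to_csv(source_dir, output_csv):
    """Write one CSV row per .html file in source_dir; return the row count.

    The column names (filename, text) are hypothetical, chosen for this sketch.
    """
    rows = []
    for path in sorted(Path(source_dir).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        rows.append({"filename": path.name, "text": html_to_text(html)})
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Under these assumptions, calling folder_to_csv("wikisource", "all_letters.csv") from the repository root would produce one row per HTML file. The notebook itself remains the reference for how the real output is built.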