Getting data with Python

A web scraping tutorial

This repository contains materials that I use to teach basic web scraping and data acquisition to non-coding audiences. The core of the workshop is the Getting Data with Python.ipynb notebook, which uses the HTML files stored in wikisource to create the final output, all_letters.csv.
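
For readers who want a feel for that HTML-to-CSV step before opening the notebook, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the helper names (`TextExtractor`, `parse_letter`, `build_csv`), the output columns (`file`, `title`, `text`), and the assumption that each letter's text sits in `<p>` tags are all hypothetical and chosen for illustration.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the <title> and the paragraph text of one HTML page.

    Assumption: the letter text lives in <p> elements. The real
    wikisource pages may be structured differently.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <i>) still arrives here.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def parse_letter(html_text):
    """Return a dict with the page title and its joined paragraph text."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"title": parser.title.strip(), "text": body}


def build_csv(source_dir, output_path):
    """Parse every .html file in source_dir and write one CSV row per file."""
    rows = []
    for path in sorted(Path(source_dir).glob("*.html")):
        row = parse_letter(path.read_text(encoding="utf-8"))
        row["file"] = path.name
        rows.append(row)
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "title", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `build_csv("wikisource", "all_letters_sketch.csv")` from the repository root would write one row per HTML file. The output name here is deliberately different from all_letters.csv so the sketch does not overwrite the notebook's real output.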

There is also a collection of less organized material in messy_folder. It is not for the faint of heart, but it may interest someone who wants to experiment with more advanced techniques, such as applying Google's cloud natural language processing to this dataset. It is mostly unpolished work, including some dead ends, that may be useful to someone else; it is also here so that I don't forget how I did things.


jaguillette/stevenson_letters
