Scrape the MIT course catalog using Python.
You can start by running quickstart.py, which downloads a single catalog page:
import urllib.request
# first page of MIT course catalog
url = 'http://student.mit.edu/catalog/m1a.html'
# Download the page at `url` and save it locally as m1a.html:
urllib.request.urlretrieve(url, 'm1a.html')
You can download all the pages of the catalog with courseCatalog.py.
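If you want to see the idea behind downloading several pages, here is a minimal sketch. It is not the contents of courseCatalog.py; the function name download_pages, the dest_dir and fetch parameters, and the list of page names you pass in are all assumptions made for illustration. Only the base URL comes from the quickstart example.

```python
import os
import urllib.request

# Same host and path as the quickstart URL.
BASE_URL = 'http://student.mit.edu/catalog/'


def download_pages(page_names, dest_dir='.', fetch=urllib.request.urlretrieve):
    """Download each named catalog page into dest_dir and return the saved paths.

    `fetch` is a hypothetical hook that defaults to urllib's urlretrieve;
    it only exists so the function can be exercised without network access.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in page_names:
        path = os.path.join(dest_dir, name)
        fetch(BASE_URL + name, path)  # one request per page
        saved.append(path)
    return saved
```

Calling download_pages(['m1a.html'], 'pages') would fetch the same page as quickstart.py into a pages directory. You have to supply the full list of page names yourself.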
Merge the catalog pages with merge.py.
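Merging can be as simple as concatenating the downloaded files. The sketch below shows one way to do that and is not the contents of merge.py; the function name merge_pages, the 'm*.html' filename pattern, and the output path are assumptions.

```python
import glob
import os


def merge_pages(src_dir, out_path, pattern='m*.html'):
    """Concatenate matching files from src_dir, in sorted order, into out_path.

    Returns the number of files merged. The default pattern is an assumption
    based on the quickstart filename m1a.html.
    """
    out_abs = os.path.abspath(out_path)
    # Collect the inputs first and skip the output file in case it matches the pattern.
    paths = [p for p in sorted(glob.glob(os.path.join(src_dir, pattern)))
             if os.path.abspath(p) != out_abs]
    with open(out_path, 'w', encoding='utf-8') as out:
        for path in paths:
            # errors='replace' keeps going if a page is not valid UTF-8.
            with open(path, encoding='utf-8', errors='replace') as page:
                out.write(page.read())
                out.write('\n')
    return len(paths)
```

For example, merge_pages('pages', 'catalog.html') would write every matching page from the pages directory into catalog.html.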
Before running sensemaking.py, you need to download all the catalog pages and merge them into one file. You can do that by running courseCatalog.py first and then merge.py.
Note: the merged file has over 45,000 lines of HTML, so parsing will be slow.
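One way to keep memory use flat on a file this size is to feed it to the standard library's HTMLParser in chunks instead of reading it all at once. This is a sketch under assumptions, not what sensemaking.py does: the class TagTextCollector, the function collect_tag_text, and the chunk size are hypothetical, and you need to check the catalog pages to see which tag holds the text you want.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags are currently open
        self.parts = []     # text pieces for the tag being read
        self.results = []   # finished text, one entry per outermost tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(''.join(self.parts).strip())
                self.parts = []

    def handle_data(self, data):
        # Text can arrive in several pieces, so accumulate it.
        if self.depth > 0:
            self.parts.append(data)


def collect_tag_text(path, tag, chunk_size=64 * 1024):
    """Parse the file at `path` chunk by chunk and return the text inside `tag`."""
    parser = TagTextCollector(tag)
    with open(path, encoding='utf-8', errors='replace') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)
    parser.close()
    return parser.results
```

Chunked reading mainly helps with memory; it may not make the parse itself faster.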