codeforboston/cornerwise-scrapers

Cornerwise Scrapers

Introduction

Implementation of the “standard” Cornerwise scrapers using AWS Lambda infrastructure.

Setup

  • Install Node.js, preferably using NVM.
  • Run npm install serverless. This installs Serverless, a command-line utility that simplifies deploying services that run on AWS Lambda, Azure Functions, and other providers.
  • Run npm install --save serverless-python-requirements. This installs a Serverless plugin that downloads the pip requirements specified in requirements.txt before deploying to AWS. (Note: Docker must be installed for this to run correctly.)

Deploying

  • To deploy to AWS, you’ll need an AWS account. You should also configure a cornerwise profile in your AWS credentials. See here for details about setting up a profile and the privileges the AWS user requires.
  • Copy credentials.example.json to credentials.json and modify the variables to use your Socrata credentials.
  • If everything is correctly configured, you should be able to cd to this directory and type serverless deploy -v to fully deploy the lambda function and corresponding API Gateway interface to AWS.

Scrapers

Somerville, MA Reports and Decisions

URL
https://scraper.cornerwise.org/somervillema
Types
Cases
Source
somervillema.py
Description
Scrapes the OSPCD’s Reports and Decisions page.

Somerville, MA PB/ZBA Event Scraper

URL
https://scraper.cornerwise.org/somervillema_events
Types
Events
Source
somervillema_events.py
Description
Scrapes the city’s events page, finds events for the Planning Board and Zoning Board of Appeals, and scrapes the attached Agenda for related case numbers.

Cambridge, MA

URL
https://scraper.cornerwise.org/cambridgema
Types
Cases
Source
cambridgema.py

Somerville, MA Capital Projects

URL
https://scraper.cornerwise.org/somerville_projects
Types
Projects
Source
somervillema_projects.py
Description
Published annually by the Somerville Capital Projects Committee, the dataset includes “infrastructure projects, building improvements, park redesigns, and equipment purchases.”

Green Line Extension

URL
https://scraper.cornerwise.org/greenline
Types
Events
Description
Scrapes the “Upcoming Meetings” section of the Green Line Extension home page.

Using the Scrapers

The interface to the scrapers is intentionally simple. Send a GET request to the scraper’s URL. You may optionally supply a since query parameter formatted as yyyymmdd (for example, ?since=20180115). The scraper responds with JSON conforming to the Cornerwise scraper schema.
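
As a rough illustration of the request described above, here is a minimal Python sketch that uses only the standard library. The helper names scraper_url and fetch_scraper are hypothetical and not part of this repository. The sketch assumes the scraper name is the last path segment of the URLs listed under Scrapers, and it makes no assumptions about the fields in the returned JSON.

```python
import json
from datetime import date
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the scraper URLs listed in this README.
BASE_URL = "https://scraper.cornerwise.org"


def scraper_url(name: str, since: Optional[date] = None) -> str:
    """Build the URL for a scraper (hypothetical helper).

    `name` is the final path segment, e.g. "somervillema".
    `since`, if given, is sent as the `since` query parameter in yyyymmdd form.
    """
    url = f"{BASE_URL}/{name}"
    if since is not None:
        url += "?" + urlencode({"since": since.strftime("%Y%m%d")})
    return url


def fetch_scraper(name: str, since: Optional[date] = None, timeout: float = 30.0) -> Any:
    """Send a GET request to the scraper and decode the JSON response.

    The shape of the decoded value is defined by the Cornerwise scraper
    schema; this sketch returns it unmodified.
    """
    with urlopen(scraper_url(name, since), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_scraper(...) to make a real request.
    print(scraper_url("somervillema", date(2018, 1, 15)))
```

Keeping URL construction separate from the network call makes the since formatting easy to check without contacting the service.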
