Implementation of the “standard” Cornerwise scrapers using AWS Lambda infrastructure.
- Install Node.js, preferably using NVM.
- Run npm install serverless to install Serverless, a command line utility that simplifies the deployment of services that run on AWS Lambda, Azure Functions, or others.
- Run npm install --save serverless-python-requirements to install a Serverless plugin that will download the pip requirements specified in requirements.txt before deploying to AWS. (Note: you need Docker installed for this to run correctly.)
- To deploy to AWS, you’ll need to set up an AWS account, if you haven’t already. You should also configure a cornerwise profile in your AWS credentials. See here for details about setting up a profile and the privileges the AWS user requires.
- Copy credentials.example.json to credentials.json and modify the variables to use your Socrata credentials.
- If everything is correctly configured, you should be able to cd to this directory and type serverless deploy -v to fully deploy the Lambda function and corresponding API Gateway interface to AWS.
- URL: https://scraper.cornerwise.org/somervillema
- Types: Cases
- Source: somervillema.py
- Description: Scrapes the OSPCD’s Reports and Decisions page.

- URL: https://scraper.cornerwise.org/somervillema_events
- Types: Events
- Source: somervillema_events.py
- Description: Scrapes the city’s events page, finds events for the Planning Board and Zoning Board of Appeals, and scrapes the attached Agenda for related case numbers.

- URL: https://scraper.cornerwise.org/cambridgema
- Types: Cases
- Source: cambridgema.py

- URL: https://scraper.cornerwise.org/somerville_projects
- Types: Projects
- Source: somervillema_projects.py
- Description: Published annually by the Somerville Capital Projects Committee, the dataset includes “infrastructure projects, building improvements, park redesigns, and equipment purchases.”

- URL: https://scraper.cornerwise.org/greenline
- Types: Events
- Description: Scrapes the “Upcoming Meetings” section of the Green Line Extension home page.

The interface to the scrapers is intentionally simple. Send a GET request to the scraper’s URL. You may optionally supply a since query parameter formatted as yyyymmdd. The scraper will respond with JSON conforming to the Cornerwise scraper schema.
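
As a rough illustration of the request described above, the following Python sketch builds a scraper URL with an optional since parameter and fetches the JSON response. The helper names build_scraper_url and fetch_scraper are hypothetical and not part of this repository, the 60-second timeout and the example date are arbitrary choices, and nothing is assumed here about the structure of the returned JSON beyond it being valid JSON.

```python
import json
from datetime import date
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the scraper list above.
BASE_URL = "https://scraper.cornerwise.org"


def build_scraper_url(name: str, since: Optional[date] = None) -> str:
    """Return the URL for a scraper (hypothetical helper).

    When `since` is given, it is added as a query parameter in
    yyyymmdd format.
    """
    url = f"{BASE_URL}/{name}"
    if since is not None:
        url += "?" + urlencode({"since": since.strftime("%Y%m%d")})
    return url


def fetch_scraper(name: str, since: Optional[date] = None,
                  timeout: float = 60.0) -> Any:
    """Send a GET request to the scraper and decode the JSON body.

    Hypothetical helper; requires network access. The timeout value
    is an assumption, not something the service documents.
    """
    with urlopen(build_scraper_url(name, since), timeout=timeout) as response:
        return json.load(response)


# Building a URL does not touch the network:
print(build_scraper_url("somervillema", date(2018, 1, 1)))
# prints: https://scraper.cornerwise.org/somervillema?since=20180101
```

Calling fetch_scraper("somervillema", since=date(2018, 1, 1)) would then perform the actual request, provided the endpoint is deployed and reachable.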