Cornerwise Scrapers

Introduction

Implementation of the “standard” Cornerwise scrapers using AWS Lambda infrastructure.
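For orientation, here is a minimal sketch of how a single scraper might be exposed as a Python Lambda function. It assumes API Gateway's Lambda proxy integration, where query parameters arrive in event["queryStringParameters"] (None when the request has none). The scrape function is a hypothetical placeholder, the empty result is not the Cornerwise scraper schema, and the 400 response for a malformed since value is an assumption rather than documented behavior.

```python
import json
from datetime import date, datetime
from typing import Any, Optional


def parse_since(value: Optional[str]) -> Optional[date]:
    """Parse an optional yyyymmdd string into a date; return None if absent."""
    if not value:
        return None
    return datetime.strptime(value, "%Y%m%d").date()


def scrape(since: Optional[date]) -> dict:
    # Hypothetical placeholder: a real scraper would fetch and parse its
    # source page here and return data in the Cornerwise scraper schema.
    return {}


def handler(event: dict, context: Any) -> dict:
    # Lambda proxy integration: queryStringParameters is None when the
    # request carries no query string, so fall back to an empty dict.
    params = event.get("queryStringParameters") or {}
    try:
        since = parse_since(params.get("since"))
    except ValueError:
        # Assumed behavior for a since value that is not yyyymmdd.
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "since must be formatted as yyyymmdd"}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(scrape(since)),
    }
```

Keeping date parsing in its own function makes it easy to test without invoking Lambda.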

Setup

  • Install Node.js, preferably using NVM.
  • npm install serverless installs Serverless, a command-line utility that simplifies deploying services that run on AWS Lambda, Azure Functions, and other platforms.
  • npm install --save serverless-python-requirements installs a Serverless plugin that downloads the pip requirements listed in requirements.txt before deploying to AWS. (Note: Docker must be installed for this to work correctly.)

Deploying

  • To deploy to AWS, you’ll need an AWS account if you don’t already have one. You should also configure a cornerwise profile in your AWS credentials. See here for details about setting up a profile and the privileges the AWS user requires.
  • Copy credentials.example.json to credentials.json and edit the values to use your Socrata credentials.
  • If everything is configured correctly, you should be able to cd to this directory and run serverless deploy -v to deploy the Lambda function and the corresponding API Gateway interface to AWS.
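Before running serverless deploy, a quick check can catch a credentials.json that is missing keys or still holds the example's placeholder values. This is a hypothetical helper, not part of the repository. It assumes both files contain a flat JSON object and makes no assumption about the specific key names.

```python
import json
from pathlib import Path
from typing import Set, Tuple


def compare_credentials(example: dict, creds: dict) -> Tuple[Set[str], Set[str]]:
    """Return (keys missing from creds, keys still equal to the example's value)."""
    missing = set(example) - set(creds)
    unchanged = {k for k in example if k in creds and creds[k] == example[k]}
    return missing, unchanged


def check_files(example_path: str = "credentials.example.json",
                creds_path: str = "credentials.json") -> Tuple[Set[str], Set[str]]:
    """Load both JSON files from disk and compare their top-level keys."""
    example = json.loads(Path(example_path).read_text())
    creds = json.loads(Path(creds_path).read_text())
    return compare_credentials(example, creds)
```

Calling check_files() from this directory returns two empty sets when every example key has been filled in with a new value.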

Scrapers

Somerville, MA Reports and Decisions

URL
https://scraper.cornerwise.org/somervillema
Types
Cases
Source
somervillema.py
Description
Scrapes the OSPCD’s Reports and Decisions page.

Somerville, MA PB/ZBA Event Scraper

URL
https://scraper.cornerwise.org/somervillema_events
Types
Events
Source
somervillema_events.py
Description
Scrapes the city’s events page, finds events for the Planning Board and the Zoning Board of Appeals, and scrapes each event’s attached agenda for related case numbers.
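The case-number step could look roughly like the sketch below, which runs a regular expression over agenda text that has already been extracted. The pattern is a hypothetical one (a PB or ZBA prefix, a four-digit year, a hyphen, and a sequence number, such as "ZBA 2016-107"); the format actually used in Somerville agendas, and how somervillema_events.py handles it, may differ.

```python
import re
from typing import List

# Hypothetical case-number format: board prefix (PB or ZBA), optional "#",
# four-digit year, hyphen, sequence number, e.g. "ZBA 2016-107".
CASE_NUMBER = re.compile(r"\b(PB|ZBA)\s*#?\s*(\d{4})-(\d+)\b")


def find_case_numbers(agenda_text: str) -> List[str]:
    """Return unique case numbers in order of first appearance, normalized."""
    found: List[str] = []
    for board, year, seq in CASE_NUMBER.findall(agenda_text):
        case = f"{board} {year}-{seq}"  # normalize spacing and drop "#"
        if case not in found:
            found.append(case)
    return found
```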

Cambridge, MA

URL
https://scraper.cornerwise.org/cambridgema
Types
Cases
Source
cambridgema.py

Somerville, MA Capital Projects

URL
https://scraper.cornerwise.org/somerville_projects
Types
Projects
Source
somervillema_projects.py
Description
This dataset, published annually by the Somerville Capital Projects Committee, includes “infrastructure projects, building improvements, park redesigns, and equipment purchases.”

Green Line Extension

URL
https://scraper.cornerwise.org/greenline
Types
Events
Description
Scrapes the “Upcoming Meetings” section of the Green Line Extension home page.

Using the Scrapers

The interface to the scrapers is intentionally simple: send a GET request to the scraper’s URL. You may optionally supply a since query parameter formatted as yyyymmdd. The scraper responds with JSON conforming to the Cornerwise scraper schema.
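A minimal Python client for this interface might look like the following, using only the standard library. It assumes since is meant to limit results to items from that date onward; the exact filtering semantics are up to each scraper. The response is returned as decoded JSON without assuming anything about the schema's fields.

```python
import json
from datetime import date
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def scraper_url(base_url: str, since: Optional[date] = None) -> str:
    """Build the GET URL, adding since=yyyymmdd when a date is given."""
    if since is None:
        return base_url
    return f"{base_url}?{urlencode({'since': since.strftime('%Y%m%d')})}"


def fetch(base_url: str, since: Optional[date] = None) -> Any:
    """GET the scraper and decode its JSON response (requires network access)."""
    with urlopen(scraper_url(base_url, since)) as response:
        return json.load(response)
```

For example, fetch("https://scraper.cornerwise.org/somervillema", date(2017, 1, 5)) requests https://scraper.cornerwise.org/somervillema?since=20170105.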