Initial build of PrimeGov scraper #24
Comments
@krammy19 I'd like to volunteer to tackle this one.
Sure, that would be a huge help! Thanks @skyheat47295
@krammy19, @shengxio Hello, my first draft is ready for review. I can create a PR, or you can review the code here. The code works and brings down a .csv file with the required data; however,
@skyheat47295, @krammy19 Hello, and yes, definitely. To be honest, I am still learning what you did in the repo. Thanks! Roland Ding
@krammy19 Hey Mark, I wonder whether this task has been completed? Thanks!
We need to design a web scraper for the PrimeGov agenda hosting platform. Examples of cities that use this software include San Mateo and Pleasanton.
Input: a URL endpoint, or a batch .csv file of endpoints to scrape. Each URL should point to the specific page on a city website where agendas are listed for public review.
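As a sketch of how the input stage might handle both forms, here is a minimal helper. The `load_endpoints` name, the URL-vs-path heuristic, and the assumed one-endpoint-per-row .csv layout are all illustrative assumptions, not part of this spec:

```python
import csv

def load_endpoints(source):
    """Return a list of agenda-page URLs to scrape.

    `source` is either a single URL or the path to a batch .csv file
    whose first column holds one endpoint per row (assumed layout).
    """
    if source.startswith(("http://", "https://")):
        return [source]
    with open(source, newline="") as f:
        # Skip blank rows and empty cells so a trailing newline is harmless.
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
```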
Output: a .csv table with the following column data for each agenda listed on this page:
| Column | Description |
| --- | --- |
| Index | Autoincrement index |
| City | City or agency name |
| Meeting Name | Title of the government body |
| Date / Time | Date and time of the meeting |
| Agenda | URL of the agenda PDF |
| Meeting video | URL of the meeting video (if available) |
| Published minutes | URL of the minutes PDF (if available) |
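Once records are scraped, writing the table above is straightforward with the standard library. A minimal sketch, assuming each meeting is a dict keyed by the column names (the function name and dict layout are illustrative, not prescribed by this spec):

```python
import csv
import io

COLUMNS = ["Index", "City", "Meeting Name", "Date / Time",
           "Agenda", "Meeting video", "Published minutes"]

def meetings_to_csv(city, meetings):
    """Serialize scraped meeting dicts into the CSV layout described above."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for index, meeting in enumerate(meetings, start=1):
        row = {"Index": index, "City": city}
        # Missing optional links (video, minutes) become empty cells.
        row.update({col: meeting.get(col, "") for col in COLUMNS[2:]})
        writer.writerow(row)
    return buf.getvalue()
```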
For reference and examples, please see this scraping walkthrough for Legistar.
As you build this scraper, keep in mind that we will need to eventually add additional features, including:
- Accessing past agendas not included on the city's main page, e.g. past years
- Filtering agenda scraping by date range
- Filtering agenda scraping by meeting name
- Downloading all agendas from scraped URLs into a specified directory
- Scraping staff report URLs from scraped agendas
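The date-range filter above could be layered on top of already-scraped records. A sketch, assuming the `Date / Time` column uses an `MM/DD/YYYY HH:MM AM/PM` format (the actual PrimeGov format may differ and would need verifying against a live page):

```python
from datetime import date, datetime

def filter_by_date_range(meetings, start, end):
    """Keep meetings whose date falls within [start, end], inclusive."""
    kept = []
    for meeting in meetings:
        # Assumed timestamp format; adjust once the real PrimeGov output is known.
        when = datetime.strptime(meeting["Date / Time"], "%m/%d/%Y %I:%M %p").date()
        if start <= when <= end:
            kept.append(meeting)
    return kept
```

The same shape would work for the meeting-name filter, substituting a string match on the `Meeting Name` field for the date comparison.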