A scraper application that crawls press releases, hearings, markups, and bills from the US Congress, industry associations, and think tanks for analytical purposes.
Time Range: Content from the past week (for most sources) and all future content.
Export Format: CSV, US Government, Think Tanks
Note: For easier navigation, think tank press content is located on a separate page from the US Government releases.
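The time range and CSV export described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `within_time_range` and `to_csv`, the column names, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from datetime import date, timedelta


def within_time_range(published, today, days=7):
    """Keep items from the past `days` days and anything dated in the future."""
    return published >= today - timedelta(days=days)


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped items; real rows would come from the sources listed below.
today = date(2022, 3, 15)
items = [
    {"date": date(2022, 3, 14), "title": "Recent hearing", "url": "https://example.org/a"},
    {"date": date(2022, 2, 1), "title": "Old release", "url": "https://example.org/b"},
    {"date": date(2022, 3, 20), "title": "Upcoming markup", "url": "https://example.org/c"},
]

kept = [item for item in items if within_time_range(item["date"], today)]
csv_text = to_csv(kept, ["date", "title", "url"])
print(csv_text)
```

A single lower bound on the date covers both halves of the stated range, since any future-dated item is necessarily newer than one week ago.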
- Wilson: Date, URL, and title of insight and analysis for the Wilson Center's Insights & Analysis page;
https://www.wilsoncenter.org/insight-analysis?_page=1&keywords=&_limit=10&programs=109
- Brookings: Date, URL, and title of insight and analysis for all content produced by the Brookings Institution;
https://www.brookings.edu/search/?s=&post_type%5B%5D=&topic%5B%5D=&pcp=&date_range=&start_date=&end_date=
- CSIS: Date, type, title, URL, and description of insight and analysis for all content by the Center for Strategic & International Studies;
https://www.csis.org/analysis
- Asia Society: Title, URL, and description of insight and analysis for all publications by the Asia Society Policy Institute;
https://www.asiasociety.org/policy-institute/publications
- ICAS: Date, type, title, URL, and description of insight and analysis for all content by the Institute for China-America Studies;
https://www.chinaus-icas.org/research-main/
- Atlantic Council: Date, category, title, URL, description, and tags of insight and analysis for all content by the Atlantic Council;
https://www.atlanticcouncil.org/insights-impact/research/, https://www.atlanticcouncil.org/insights-impact/commentary/
- Daily Digests: Date, URL, and text providing details of legislation introduced, reported, passed, and considered by the full House or Senate each legislative day;
https://www.congress.gov/bills-with-chamber-action/browse-by-date
- Daily Bill Texts: Date, PDF file, and text providing detailed information on legislation considered in Daily Digests;
https://www.congress.gov/bill-texts-received-today
- All Bills: Date, URL, and other details (e.g. title, sponsor, committees, latest action) for all bills under the total of "All Bills, Resolutions, and Amendments";
https://www.congress.gov/bills-with-chamber-action/browse-by-date
- Roll Call Votes: Date, name, and vote results of all Senate legislation passing through the 117th Congress;
https://www.senate.gov/legislative/LIS/roll_call_lists/vote_menu_117_1.htm
- Floor Activity: Date, URL, and text providing details of Senate floor proceedings;
https://floor.senate.gov/proceedings
- Commerce: Date, URL, title, and summary of press releases, hearings, and markups from the US Senate Committee on Commerce, Science, and Transportation;
https://www.commerce.senate.gov/pressreleases, https://www.commerce.senate.gov/hearings, https://www.commerce.senate.gov/markups
- Foreign: Type of content (nominations, treaties, legislation, hearing transcripts, business meeting transcripts, committee reports, other), date, URL (if given), and text for activities and reports from the US Senate Committee on Foreign Relations;
https://www.foreign.senate.gov/activities-and-reports
- Banking: Date, URL, and title for press releases, hearings, and markups from the US Senate Committee on Banking, Housing, and Urban Affairs;
https://www.banking.senate.gov/newsroom/majority-press-releases, https://www.banking.senate.gov/hearings, https://www.banking.senate.gov/markups
- Finance: Source of content (majority, minority), date, URL, and title for press releases and hearings from the US Senate Committee on Finance;
https://www.finance.senate.gov/chairmans-news, https://www.finance.senate.gov/hearings
- HSGAC: Source of content (majority, minority), date, URL, and title for press releases and hearings from the US Senate Committee on Homeland Security & Governmental Affairs;
https://www.hsgac.senate.gov/media/majority-media, https://www.hsgac.senate.gov/hearings
- Judiciary: Source of content (majority, minority), date, URL, and title for press releases and hearings from the US Senate Committee on the Judiciary;
https://www.judiciary.senate.gov/press/majority, https://www.judiciary.senate.gov/hearings
- Intelligence: Date, URL, title, and summary for news from the US Senate Select Committee on Intelligence;
https://www.intelligence.senate.gov/press, https://www.intelligence.senate.gov/hearings
- Energy: Date, URL, title, and summary of press releases, hearings, and markups from the US House Committee on Energy and Commerce;
https://energycommerce.house.gov/newsroom/press-releases, https://energycommerce.house.gov/committee-activity/hearings, https://energycommerce.house.gov/committee-activity/markups
- Financial Services: Date, URL, title, and summary of press releases, hearings, and markups from the US House Committee on Financial Services;
https://financialservices.house.gov/news/, https://financialservices.house.gov/calendar/?EventTypeID=577&Congress=117, https://financialservices.house.gov/calendar/?EventTypeID=575&Congress=117
- Foreign: Date, time (if applicable), title, and URL for press releases, hearings, and markups from the US House Committee on Foreign Affairs;
https://foreignaffairs.house.gov/press-releases, https://foreignaffairs.house.gov/hearings, https://foreignaffairs.house.gov/markups
- Homeland: Date, title, and URL for news, hearings, and markups from the US House Committee on Homeland Security;
https://homeland.house.gov/activities/hearings, https://homeland.house.gov/activities/markups, https://homeland.house.gov/news
- Science, Space, and Tech: Date, URL, and title of press releases, hearings, and markups from the US House Committee on Science, Space, and Technology;
https://science.house.gov/news/press-releases, https://science.house.gov/hearings, https://science.house.gov/markups
- Transportation: Date, URL, and title of press releases, hearings, and markups from the US House Committee on Transportation and Infrastructure (both majority and minority sites);
https://republicans-transportation.house.gov/news/documentquery.aspx?DocumentTypeID=2545, https://republicans-transportation.house.gov/calendar/?EventTypeID=542, https://republicans-transportation.house.gov/calendar/?EventTypeID=541, https://transportation.house.gov/news/press-releases, https://transportation.house.gov/committee-activity/hearings, https://transportation.house.gov/committee-activity/markups
- Intelligence: Date, URL, title, and summary for news from the US House Permanent Select Committee on Intelligence;
https://intelligence.house.gov/
- Energy: Date, URL, title, and summary of press releases, hearings, and markups from the Republican members of the US House Committee on Energy and Commerce;
https://republicans-energycommerce.house.gov/news/, https://republicans-energycommerce.house.gov/hearings/, https://republicans-energycommerce.house.gov/markups/
- Foreign: Date, URL, title, and summary of updates, hearings, and markups from the Republican members of the US House Committee on Foreign Affairs;
https://gop-foreignaffairs.house.gov/updates/, https://gop-foreignaffairs.house.gov/hearing/, https://gop-foreignaffairs.house.gov/markup/
- Homeland: Date, title, URL, and description for press releases from the Republican members of the US House Committee on Homeland Security;
https://republicans-homeland.house.gov/committee-activity/press-releases/
- Science: Date, title, and URL for news, hearings, and markups from the Republican members of the US House Committee on Science, Space, and Technology;
https://republicans-science.house.gov/news, https://republicans-science.house.gov/legislation/hearings, https://republicans-science.house.gov/legislation/markups
- SIA: Date, URL, and title of all headlines for the Semiconductor Industry Association;
https://www.semiconductors.org/news-events/latest-news/
- FCC: Date, URL, and title of all headlines for the Federal Communications Commission;
https://www.fcc.gov/news-events/headlines
- Clone the repository.
- Run ./script.bash in the terminal.
- Using crontab (Mac/Linux) or Task Scheduler (Windows), set up an execution schedule to run the scraping job automatically.
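
For the crontab route, the schedule comes down to a single line in the user's crontab. The sketch below only builds that line as a string; the helper `cron_entry`, the daily 06:00 run time, the repository path, and the log file name are assumptions for illustration, not settings taken from this project.

```python
import shlex


def cron_entry(repo_dir, minute=0, hour=6, log_file="scrape.log"):
    """Build a crontab line that runs ./script.bash from repo_dir once a day."""
    # Change into the repository first so relative paths used by the script resolve.
    command = (
        f"cd {shlex.quote(repo_dir)} && ./script.bash "
        f">> {shlex.quote(log_file)} 2>&1"
    )
    # Five schedule fields: minute, hour, day of month, month, day of week.
    return f"{minute} {hour} * * * {command}"


# Hypothetical clone location; substitute the real path to the repository.
print(cron_entry("/home/user/scraper"))
```

The printed line can be pasted into the editor opened by `crontab -e`. On Windows, Task Scheduler takes the equivalent settings (program, start directory, trigger time) through its own interface.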