calspy is a tool for investigative journalists and OSINT researchers to scrape events from a public Google Calendar. It works by mimicking human action in a web browser, automating the process of clicking through each month to find events. To avoid scraping until the dawn of time, calspy stops scraping when it has found 18 consecutive months with no events.
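The stopping rule described above can be sketched as a simple counter that resets whenever a month contains events. This is only an illustration, not calspy's actual code: `fetch_month` is a hypothetical callback standing in for whatever scrapes a single month, and `EMPTY_MONTH_LIMIT` is an assumed name for the 18-month threshold.

```python
from typing import Callable, List

EMPTY_MONTH_LIMIT = 18  # assumed constant name; the value comes from the description above


def scrape_until_quiet(
    fetch_month: Callable[[int], List[dict]],
    limit: int = EMPTY_MONTH_LIMIT,
) -> List[dict]:
    """Walk backward month by month until `limit` consecutive months are empty.

    `fetch_month(offset)` is a hypothetical callback returning the events found
    `offset` months before the current month (0 = current month).
    """
    events: List[dict] = []
    empty_streak = 0
    offset = 0
    while empty_streak < limit:
        month_events = fetch_month(offset)
        if month_events:
            events.extend(month_events)
            empty_streak = 0  # any month with events resets the streak
        else:
            empty_streak += 1
        offset += 1
    return events
```

Resetting the streak means a long gap in the calendar does not end the scrape early, as long as the gap is shorter than the limit.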
- Handles calendar navigation automatically
- Saves calendar data in JSON format
- Uses that data to generate an HTML file for the target
- Allows you to quickly browse and search through all events
- Python 3.x
- Chrome browser installed
- Target's Public Google Calendar URL
- Create and activate a virtual environment (recommended)
pip install -r requirements.txt
Basic usage (will stop after finding 18 consecutive empty months):
python calspy.py
Scrape exactly N months into the past:
# Scrape exactly 24 months
python calspy.py -months 24
# Scrape exactly 36 months with debug logging
python calspy.py -months 36 -debug
When prompted, paste the public Google Calendar URL.
The scraper works in three main phases:
- Initial Setup
  - Uses undetected-chromedriver to avoid detection
  - Creates a headless browser session
  - Extracts calendar ID from the provided URL
  - Creates directory structure for data storage
- Scraping Process
  - Starts at the current month
  - Scrapes backward in time until 18 consecutive months contain no events (or until the -months limit is reached)
  - Uses Selenium WebDriver for navigation
  - Uses BeautifulSoup for HTML parsing
- Data Processing
  - Parses event details including:
    - Date and time
    - Event title
    - Description
    - Location
    - Attendees
  - Saves structured data as JSON
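As an illustration of the "extracts calendar ID" step in the Initial Setup phase, here is a minimal sketch using only the standard library. It assumes the public URL is an embed-style link that carries the ID in a `src` query parameter; other link styles (for example ones using a `cid` parameter) are not handled. The function name `extract_calendar_id` is hypothetical, and calspy may do this differently.

```python
from urllib.parse import parse_qs, urlparse


def extract_calendar_id(url: str) -> str:
    """Return the calendar ID from an embed-style public calendar URL.

    Assumption: the ID is in the `src` query parameter, e.g.
    https://calendar.google.com/calendar/embed?src=<calendar_id>
    """
    query = parse_qs(urlparse(url).query)  # parse_qs also decodes %40 to "@"
    values = query.get("src")
    if not values:
        raise ValueError("no 'src' parameter found in the calendar URL")
    return values[0]
```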
Events are saved in:
calendars/
  [calendar_id]/
    [YYYYMMDD_HHMMSS]/
      calendar_data.json
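The layout above maps naturally onto `pathlib` and `strftime`. The sketch below shows one way such a path could be built; `build_output_path` is a hypothetical helper, not a function taken from calspy.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(
    calendar_id: str,
    when: datetime,
    root: Path = Path("calendars"),
) -> Path:
    """Return calendars/[calendar_id]/[YYYYMMDD_HHMMSS]/calendar_data.json."""
    run_dir = root / calendar_id / when.strftime("%Y%m%d_%H%M%S")
    return run_dir / "calendar_data.json"


# Typical use: create the run directory before writing the JSON file.
# path = build_output_path("some-calendar-id", datetime.now())
# path.parent.mkdir(parents=True, exist_ok=True)
```

Keeping one timestamped directory per run means repeated scrapes of the same calendar never overwrite each other.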
JSON structure:
{
  "calendar_id": "[email protected]",
  "scrape_timestamp": "2024-03-14 12:34:56",
  "events": [
    {
      "datetime": "2024-03-14 10:00 AM",
      "summary": "Event Title",
      "description": "Event Description",
      "location": "Event Location",
      "attendees": []
    }
  ]
}
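Once a scrape has finished, the JSON can be explored with a few lines of Python. The sketch below assumes the field names shown in the structure above; `search_events` and `parse_event_time` are hypothetical helpers, the sample data is made up for illustration, and the `datetime` format string is inferred from the single example value, so real data may need a different format.

```python
import json
from datetime import datetime
from typing import List


def search_events(data: dict, keyword: str) -> List[dict]:
    """Return events whose summary, description or location contains keyword."""
    needle = keyword.lower()
    fields = ("summary", "description", "location")
    return [
        event
        for event in data.get("events", [])
        if any(needle in (event.get(field) or "").lower() for field in fields)
    ]


def parse_event_time(value: str) -> datetime:
    """Parse a value like '2024-03-14 10:00 AM' (format assumed from the example)."""
    return datetime.strptime(value, "%Y-%m-%d %I:%M %p")


# Illustrative data only; in practice this would come from
# json.load(open(".../calendar_data.json", encoding="utf-8")).
sample = json.loads("""
{
  "calendar_id": "example-calendar-id",
  "scrape_timestamp": "2024-03-14 12:34:56",
  "events": [
    {"datetime": "2024-03-14 10:00 AM", "summary": "Event Title",
     "description": "Event Description", "location": "Event Location",
     "attendees": []}
  ]
}
""")

matches = search_events(sample, "title")
```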
-months: Number of months to scrape (overrides the empty-months check)
-debug: Enable debug logging (outputs to scraper.log)
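For readers curious how single-dash options like these can be declared, here is a minimal `argparse` sketch. It mirrors the two flags listed above but is not taken from calspy's source; the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two documented flags using single-dash long option names."""
    parser = argparse.ArgumentParser(prog="calspy.py")
    parser.add_argument(
        "-months",
        type=int,
        default=None,  # None means: fall back to the empty-months check
        metavar="N",
        help="number of months to scrape (overrides the empty-months check)",
    )
    parser.add_argument(
        "-debug",
        action="store_true",
        help="enable debug logging",
    )
    return parser


# Equivalent to: python calspy.py -months 36 -debug
args = build_parser().parse_args(["-months", "36", "-debug"])
```

Leaving `-months` unset (`None`) gives the caller an easy way to tell "scrape exactly N months" apart from the default stop-after-empty-months behaviour.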
- Errors are logged to scraper_errors.log
- The script handles common issues like:
  - Navigation failures
  - Network timeouts
  - Parse errors
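One common way to get this behaviour is to wrap each risky step and log the traceback to a file instead of aborting the run. The sketch below uses the standard `logging` module; `make_error_logger` and `safe_step` are hypothetical names, and calspy's own error handling may be structured differently.

```python
import logging
from pathlib import Path
from typing import Any, Callable, Optional


def make_error_logger(log_path: Path) -> logging.Logger:
    """Return a logger that appends errors (with tracebacks) to log_path."""
    logger = logging.getLogger("calspy.errors")  # assumed logger name
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def safe_step(
    logger: logging.Logger, step: Callable[..., Any], *args: Any
) -> Optional[Any]:
    """Run one step; on failure, log the traceback and return None."""
    try:
        return step(*args)
    except Exception:
        logger.exception("step %s failed", getattr(step, "__name__", step))
        return None


# Typical use (hypothetical step function):
# logger = make_error_logger(Path("scraper_errors.log"))
# html = safe_step(logger, load_month_page, driver)
```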
- Works only with public Google Calendars
- Requires Chrome browser
- May be affected by Google Calendar UI changes
Contributions are welcome! Feel free to submit a Pull Request or DM me on Signal: pearswick.01
This project is licensed under the MIT License - see the LICENSE file for details.