calspy🗓️🕵️

calspy is a tool for investigative journalists and OSINT researchers to scrape events from a public Google Calendar. It works by mimicking human action in a web browser, automating the process of clicking through each month to find events. To avoid scraping until the dawn of time, calspy stops scraping when it has found 18 consecutive months with no events.

Features

Handles calendar navigation automatically
Saves calendar data in JSON format
Uses that data to generate an HTML file for the target
Allows you to quickly browse and search through all events

Requirements

Python 3.x
Chrome browser installed
The target's public Google Calendar URL

Installation

Create and activate a virtual environment (recommended), then install the dependencies:

pip install -r requirements.txt

Usage

Basic usage (will stop after finding 18 consecutive empty months):

python calspy.py

Scrape exactly N months into the past:

# Scrape exactly 24 months
python calspy.py -months 24

# Scrape exactly 36 months with debug logging
python calspy.py -months 36 -debug

When prompted, paste the public Google Calendar URL.
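
The pasted URL is where the calendar ID comes from (see Technical Details). As a rough sketch of that step, and assuming the URL is a public embed link that carries the ID in its `src` query parameter, the ID can be pulled out with the standard library alone. The function name `extract_calendar_id` is hypothetical and is not taken from calspy's source.

```python
from urllib.parse import parse_qs, urlparse


def extract_calendar_id(url: str) -> str:
    """Return the calendar ID from a public Google Calendar embed URL.

    Assumption: the ID is carried in the `src` query parameter, as in
    https://calendar.google.com/calendar/embed?src=<calendar_id>
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("src")
    if not values:
        raise ValueError("no 'src' parameter found in URL")
    # parse_qs has already percent-decoded the value (%40 becomes @).
    return values[0]


if __name__ == "__main__":
    example = "https://calendar.google.com/calendar/embed?src=someone%40example.com&ctz=UTC"
    print(extract_calendar_id(example))
```

Other URL shapes may need different handling, so treat this as a starting point rather than a description of what calspy does internally.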

Technical Details

The scraper works in three main phases:

Initial Setup
- Uses undetected-chromedriver to avoid detection
- Creates a headless browser session
- Extracts calendar ID from the provided URL
- Creates directory structure for data storage
Scraping Process
- Starts at current month
- Scrapes backward in time until 18 consecutive months contain no events (or until the -months limit is reached)
- Uses Selenium WebDriver for navigation
- Uses BeautifulSoup for HTML parsing
Data Processing
- Parses event details including:
  - Date and time
  - Event title
  - Description
  - Location
  - Attendees
- Saves structured data as JSON
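
The stopping rule in the scraping phase can be sketched roughly as follows. This is a minimal illustration, not calspy's actual code: `fetch_month` is a hypothetical stand-in for the Selenium and BeautifulSoup step that returns the events of one month, and `scrape_backward` and `previous_month` are names made up for this example.

```python
from datetime import date
from typing import Callable, List, Optional


def previous_month(d: date) -> date:
    """Return the first day of the month before `d`."""
    if d.month == 1:
        return date(d.year - 1, 12, 1)
    return date(d.year, d.month - 1, 1)


def scrape_backward(
    fetch_month: Callable[[date], List[dict]],
    start: date,
    months: Optional[int] = None,
    max_empty: int = 18,
) -> List[dict]:
    """Walk backward month by month, collecting events.

    If `months` is given, exactly that many months are visited.
    Otherwise the walk stops after `max_empty` consecutive empty months.
    """
    events: List[dict] = []
    current = date(start.year, start.month, 1)
    empty_streak = 0
    visited = 0
    while True:
        if months is not None and visited >= months:
            break
        found = fetch_month(current)  # hypothetical browser step
        visited += 1
        events.extend(found)
        # Any month with events resets the streak of empty months.
        empty_streak = 0 if found else empty_streak + 1
        if months is None and empty_streak >= max_empty:
            break
        current = previous_month(current)
    return events
```

Resetting the counter whenever a month has events is what keeps a long quiet period from ending the scrape early, as long as that period is shorter than 18 months.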

Output Structure

Events are saved in:

calendars/
  [calendar_id]/
    [YYYYMMDD_HHMMSS]/
      calendar_data.json
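
For anyone scripting around this layout, the path can be rebuilt along these lines. It is a sketch that assumes the timestamp folder uses the `YYYYMMDD_HHMMSS` pattern shown above; `build_output_path` is a hypothetical helper, not a function from calspy.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(
    calendar_id: str,
    when: datetime,
    root: Path = Path("calendars"),
) -> Path:
    """Return <root>/<calendar_id>/<YYYYMMDD_HHMMSS>/calendar_data.json."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return root / calendar_id / stamp / "calendar_data.json"


if __name__ == "__main__":
    path = build_output_path("my_calendar", datetime.now())
    # Create the folders before writing the file.
    path.parent.mkdir(parents=True, exist_ok=True)
    print(path)
```

Because each run gets its own timestamped folder, repeated scrapes of the same calendar do not overwrite each other.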

JSON structure:

{
  "calendar_id": "[email protected]",
  "scrape_timestamp": "2024-03-14 12:34:56",
  "events": [
    {
      "datetime": "2024-03-14 10:00 AM",
      "summary": "Event Title",
      "description": "Event Description",
      "location": "Event Location",
      "attendees": []
    }
  ]
}
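
If you want to work with the JSON directly instead of the generated HTML, something like the following should be enough. It is a sketch that relies only on the field names shown above; the helper names are made up for illustration, and `parse_event_time` assumes every event uses the `2024-03-14 10:00 AM` format, which may not hold for all-day events.

```python
import json
from datetime import datetime
from typing import List


def load_events(text: str) -> List[dict]:
    """Parse the JSON text of calendar_data.json and return its events."""
    return json.loads(text)["events"]


def search_events(events: List[dict], keyword: str) -> List[dict]:
    """Return events whose summary, description or location mentions keyword."""
    needle = keyword.lower()
    fields = ("summary", "description", "location")
    return [
        event
        for event in events
        if any(needle in (event.get(field) or "").lower() for field in fields)
    ]


def parse_event_time(event: dict) -> datetime:
    """Parse the 'datetime' field, assuming the format in the sample above."""
    return datetime.strptime(event["datetime"], "%Y-%m-%d %I:%M %p")
```

A typical use is to read the file with `Path(...).read_text()`, pass the text to `load_events`, and then filter with `search_events(events, "airport")`.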

Command Line Arguments

-months: Number of months to scrape (overrides the 18-consecutive-empty-months check)
-debug: Enable debug logging (outputs to scraper.log)
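
For reference, flags of this shape can be declared with `argparse`, which accepts single-dash long options. This is an illustrative sketch only; calspy's real argument handling may differ, and `parse_args` is a name chosen for this example.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse -months and -debug in the style described above."""
    parser = argparse.ArgumentParser(prog="calspy.py")
    parser.add_argument(
        "-months",
        type=int,
        default=None,
        help="number of months to scrape (overrides the empty months check)",
    )
    parser.add_argument(
        "-debug",
        action="store_true",
        help="enable debug logging",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.months, args.debug)
```

Leaving `-months` unset (`None`) is what lets the scraper fall back to the empty-month stopping rule.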

Error Handling

Errors are logged to scraper_errors.log
The script handles common issues like:
- Navigation failures
- Network timeouts
- Parse errors
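
One common way to get this behaviour is to wrap each scraping step so that a failure is written to the error log and the run continues. The sketch below assumes that pattern; whether calspy itself skips, retries or aborts on each kind of failure is not documented here, and `make_error_logger`, `safe_step` and the logger name are hypothetical.

```python
import logging
from typing import Callable, List


def make_error_logger(path: str = "scraper_errors.log") -> logging.Logger:
    """Return a logger that appends errors to the given file."""
    logger = logging.getLogger("calspy.errors")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid adding a second handler on repeat calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def safe_step(
    step: Callable[[], List[dict]],
    logger: logging.Logger,
    label: str,
) -> List[dict]:
    """Run one scraping step; on failure, log the traceback and return no events."""
    try:
        return step()
    except Exception:
        logger.exception("step failed: %s", label)
        return []
```

Returning an empty list on failure means a failed month looks the same as an empty one to the caller, so a stricter version might track failures separately.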

Limitations

Works only with public Google Calendars
Requires Chrome browser
May be affected by Google Calendar UI changes

Contributing

Contributions are welcome! Feel free to submit a Pull Request, or DM me on Signal: pearswick.01

License

This project is licensed under the MIT License - see the LICENSE file for details.
