Big Sister is a collection of scripts and tools that simplify solving OSINT challenges during CTF competitions.
Open-Source Intelligence (OSINT) challenges are a central part of CTF competitions. Participants are given some data (e.g. an image, video, or audio file) and must find out more about it using context clues and any available file metadata.
Many of the steps that are common across OSINT challenges can be automated, which reduces the time we spend on each challenge.
For the first version of our "Big Sister" tool we will implement a metadata scraper and parser, alongside an Image Retrieval and Identification Script (I.R.I.S), which will use existing reverse image search services to find close matches. In later stages we can add AI integration that passes the previously gathered data to LLMs such as ChatGPT o3 to produce more comprehensive results.
Our main goal for this project is to create a program that is easy to use and modify. That means we will use high-level programming and scripting languages such as Python, Bash and Lua. The target operating systems are Debian-based Linux distributions.
The metadata scraper will call third-party tools such as exiftool, zsteg, steghide and binwalk. A second component will then parse their output and keep the lines that contain usable information; lines mentioning the user's name, the file's location, or other custom values are prime targets. Python can invoke these tools through operating system calls, passing the file as an argument, and can handle the parsing with text matching.
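The call-then-filter flow described above could look roughly like the sketch below. The `KEYWORDS` tuple and the function names `run_tool` and `interesting_lines` are illustrative assumptions, not part of the project's code.

```python
import subprocess

# Hypothetical keyword list; tune it to the challenge at hand.
KEYWORDS = ("author", "artist", "owner", "gps", "location", "comment")


def interesting_lines(tool_output, keywords=KEYWORDS):
    """Return the lines of tool_output that mention any keyword (case-insensitive)."""
    hits = []
    for line in tool_output.splitlines():
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line.strip())
    return hits


def run_tool(tool, file_path, timeout=60):
    """Run `tool file_path` and return its stdout and stderr as one string."""
    result = subprocess.run(
        [tool, file_path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit status is not treated as fatal here
    )
    return result.stdout + result.stderr
```

With the tools installed, `interesting_lines(run_tool("exiftool", "challenge.jpg"))` would return only the lines worth a closer look.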
IRIS can also be implemented in Python. We can fork the unofficial Google-Reverse-Image-Search API and use it as the basis for this module.
Ideally, this project will reach a point where it can solve challenges without human intervention. We can test it against OSINT challenges from past competitions; the ctf-archives repository gives us access to challenges from a wide variety of competitions.
- [Alexia-Madalina Cirstea](https://github.com/AlexiaMadalinaCirstea) (University email: [email protected]) (Personal email: [email protected])
- [Vlad-Luca Manolescu](https://github.com/IlikeEndermen) (University email: [email protected]) (Personal email: [email protected])
Big Sister is an OSINT (Open Source Intelligence) automation tool designed for CTF competitions. It combines metadata extraction, steganography analysis, and reverse image search in a single tool with both a GUI and a CLI.
- Primary Target: Debian-based Linux systems (Ubuntu, Kali Linux, etc.)
- Secondary: Other Linux distributions with package manager support
- Minimum: Python 3.8+
- RAM: Minimum 4GB (8GB recommended for large file analysis)
- Storage: 2GB free space for tools and dependencies
- Network: Internet connection required for reverse image search
Install these packages using your system's package manager:
# Update package lists
sudo apt update
# Install core system tools
sudo apt install -y \
exiftool \
steghide \
binwalk \
python3 \
python3-pip \
python3-venv \
git \
curl \
wget \
imagemagick-6.q16 \
ruby \
ruby-dev \
build-essential \
chromium-browser \
chromium-chromedriver
# Install zsteg (Ruby gem for steganography analysis)
sudo gem install zsteg
Create a virtual environment and install Python packages:
# Create virtual environment
python3 -m venv bigsister-env
source bigsister-env/bin/activate
# Install Python packages
pip install --upgrade pip
pip install \
pillow \
selenium \
webdriver-manager \
tkinter-tooltip \
requests \
beautifulsoup4 \
lxml
- Purpose: Extract comprehensive metadata from images and files
- Installation:
sudo apt install exiftool
- Features:
- EXIF data extraction
- JSON output support
- Supports 100+ file formats
- Custom tag support
- Purpose: Fallback image processing library
- Installation:
pip install pillow
- Features:
- Basic EXIF extraction when ExifTool unavailable
- Image format detection
- Image resizing and processing
- Purpose: Detect and extract hidden data in images
- Installation:
sudo apt install steghide
- Supported Formats: JPEG, BMP3, WAV, AU
- Features:
- Password-protected extraction
- Capacity analysis
- Info mode for metadata inspection
- Script:
runsteghide.sh
- Purpose: Ruby-based steganography detection
- Installation:
sudo gem install zsteg
- Features:
- LSB steganography detection
- Multiple analysis modes
- Comprehensive scanning options
- Script:
runzsteg.sh
- Purpose: Firmware and file signature analysis
- Installation:
sudo apt install binwalk
- Features:
- Embedded file detection
- Automatic extraction
- Entropy analysis
- Custom signature support
- Script:
runbinwalk.sh
- Purpose: Automated reverse image searching
- Installation:
pip install selenium webdriver-manager
- Features:
- Google Images reverse search
- Automated browser interaction
- Result extraction and parsing
- Requirements: Chrome/Chromium browser
- Installation:
sudo apt install chromium-browser chromium-chromedriver
- Purpose: Browser automation for image search
- Alternative: Google Chrome can be used instead
- Location: src/metadata/
- Components:
  - exiftool_scraper.py - EXIF data extraction
  - parser.py - Unified metadata parsing
- Location: src/steganography/
- Components:
  - steghide_scraper.py - Steghide automation
  - binwalk_scraper.py - Binwalk automation
  - zsteg_scraper.py - Zsteg automation
  - Shell scripts for each tool
- Location: src/iris/
- Components:
  - image_search.py - Reverse image search automation
- Location: src/utils/
- Components:
  - gui.py - Tkinter-based GUI with dark/light mode
  - terminal.py - Command-line interface
  - file_handler.py - File operation utilities
- Location: src/main.py
- Features:
- Unified metadata processing chain
- CLI argument parsing
- GUI/Terminal mode selection
Edit config.json to specify custom tool paths:
{
"tool_paths": {
"exiftool": "/usr/bin/exiftool",
"zsteg": "/usr/local/bin/zsteg",
"steghide": "/usr/bin/steghide",
"binwalk": "/usr/bin/binwalk"
}
}
- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp) - Note: BMP V5 requires conversion to BMP3 for Steghide
- GIF (.gif)
- TIFF (.tiff, .tif)
- WebP (.webp)
- WAV (.wav)
- AU (.au)
- Firmware images
- Archive files
- Any binary file with embedded data
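One way to act on the format list above is to pick the tools to run from the file extension. The sketch below is an assumption about how such a dispatch might look, not the project's actual logic: the Steghide formats come from this README, zsteg is limited to PNG and BMP (the formats it is built for), and ExifTool and Binwalk are tried on every file. The function name `tools_for` is hypothetical.

```python
from pathlib import Path

# Cover formats Steghide accepts, as listed in this README.
STEGHIDE_EXTS = {".jpg", ".jpeg", ".bmp", ".wav", ".au"}
# zsteg is designed for PNG and BMP images.
ZSTEG_EXTS = {".png", ".bmp"}


def tools_for(file_path):
    """Return the tool names worth running on file_path, in execution order."""
    ext = Path(file_path).suffix.lower()
    tools = ["exiftool"]  # ExifTool is attempted on every file
    if ext in STEGHIDE_EXTS:
        tools.append("steghide")
    if ext in ZSTEG_EXTS:
        tools.append("zsteg")
    tools.append("binwalk")  # Binwalk scans any binary for embedded data
    return tools
```

For example, `tools_for("cover.bmp")` selects all four tools, while `tools_for("firmware.bin")` selects only ExifTool and Binwalk.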
If you encounter BMP format errors with Steghide:
# Convert BMP V5 to BMP3
convert input.bmp bmp3:output_v3.bmp
# Update webdriver
pip install --upgrade webdriver-manager
# Alternative Chrome installation
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install google-chrome-stable
# Make shell scripts executable
chmod +x src/steganography/*.sh
chmod +x scripts/*.sh
Note: This tool is designed for educational and legitimate OSINT research. Users are responsible for compliance with applicable laws and terms of service when using reverse image search features.
This README documents the Big Sister project—a modular toolkit for analyzing and extracting metadata and embedded content from images and other media files.
This documentation summarizes work completed on the Big Sister project, including:
- Steghide modules (shell script and Python scraper)
- Binwalk modules (shell script and Python scraper)
- ExifTool scraper updates
- Unified parser updates
- Main orchestration logic
- Steghide cover format compatibility and usage
scripts/runsteghide.sh
A Bash wrapper for the steghide CLI that supports:
- --info mode to display embedded data metadata
- --passphrase (-p) to supply a passphrase when required
- --output-dir (-o) to specify where extracted files land
runsteghide.sh [--info] [-p PASS] [-o OUTPUT_DIR] <file>
src/steganography/steghide_scraper.py
A Python class SteghideScraper that:
- Invokes steghide info -v or steghide extract via subprocess
- Accepts an optional passphrase argument
- Parses Key: Value lines using a regex helper
- Returns a dict of metadata fields or raw output
- Provides display_metadata() to print results in human-readable form
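As a rough illustration of the subprocess step listed above (not the actual SteghideScraper code), the sketch below assembles the argument list using the command forms shown in this README's Steghide usage section. The names `build_steghide_command` and `run_steghide` are hypothetical, and closing stdin is an assumed safeguard so that a missing passphrase cannot leave the call waiting on a prompt.

```python
import subprocess


def build_steghide_command(stego_file, mode="info", passphrase=None, out_file=None):
    """Build the steghide argument list for 'info' or 'extract' mode."""
    if mode == "info":
        cmd = ["steghide", "info"]
        if passphrase is not None:
            cmd += ["-p", passphrase]
        cmd.append(stego_file)
    elif mode == "extract":
        cmd = ["steghide", "extract", "-sf", stego_file]
        if passphrase is not None:
            cmd += ["-p", passphrase]
        if out_file is not None:
            cmd += ["-xf", out_file]
        cmd.append("-f")  # overwrite an existing output file
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return cmd


def run_steghide(stego_file, **options):
    """Run steghide and return its stdout and stderr as one string."""
    result = subprocess.run(
        build_steghide_command(stego_file, **options),
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,  # never block on an interactive prompt
        check=False,
    )
    return result.stdout + result.stderr
```

Keeping the command builder separate from the subprocess call makes the flag handling testable on machines where Steghide is not installed.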
scripts/runbinwalk.sh
A Bash wrapper for the binwalk CLI that supports:
- Signature scanning (default mode)
- Extraction (-e/--extract) to pull out embedded files
- Custom extraction directory (-d/--output-dir)
runbinwalk.sh [-e] [-d OUTPUT_DIR] <file>
src/steganography/binwalk_scraper.py
A Python class BinwalkScraper that:
- Invokes binwalk or binwalk -e -C via subprocess
- Returns a dict with a Signatures list of {Offset, Description}
- Includes raw output in the returned dict
- Provides display_metadata() to print signature tables and extraction info
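The returned structure described above could be produced from Binwalk's signature table along these lines. This is a sketch that assumes the classic three-column output (DECIMAL, HEXADECIMAL, DESCRIPTION); the function name `parse_binwalk_output` and the `Raw` key are illustrative assumptions rather than the scraper's real names.

```python
import re

# One signature row: decimal offset, hexadecimal offset, free-text description.
_ROW = re.compile(r"^(\d+)\s+(0x[0-9A-Fa-f]+)\s+(.+)$")


def parse_binwalk_output(text):
    """Turn binwalk's signature table into {'Signatures': [...], 'Raw': text}."""
    signatures = []
    for line in text.splitlines():
        match = _ROW.match(line.strip())
        if match:
            signatures.append(
                {"Offset": int(match.group(1)), "Description": match.group(3)}
            )
    return {"Signatures": signatures, "Raw": text}
```

Header and separator lines do not start with a decimal offset, so they are skipped without special handling.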
src/metadata/exiftool_scraper.py
A Python class MetadataScraper that:
- Attempts to call exiftool -j -n via subprocess to get full JSON metadata
- Falls back to PIL.Image._getexif() for basic tags if ExifTool is unavailable
- Merges parsed JSON or Pillow EXIF into a single dict
- Provides display_metadata() to format and print tag names and values
- Detects timestamp inconsistencies between EXIF and filesystem metadata as potential anomalies
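The timestamp check in the last item might be implemented along the following lines. This is only a sketch: it assumes the EXIF value uses the conventional `YYYY:MM:DD HH:MM:SS` layout, and the function name `timestamp_anomaly` and the one-day tolerance are illustrative choices, not the scraper's actual behaviour.

```python
from datetime import datetime, timedelta

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # conventional EXIF date layout (assumed)


def timestamp_anomaly(exif_value, fs_mtime, tolerance=timedelta(days=1)):
    """Return the gap between EXIF time and filesystem mtime if it exceeds
    tolerance, otherwise None. Unparseable EXIF values also yield None."""
    try:
        # Keep only the first 19 characters to drop sub-seconds or time zones.
        exif_time = datetime.strptime(exif_value[:19], EXIF_FORMAT)
    except (TypeError, ValueError):
        return None
    gap = abs(datetime.fromtimestamp(fs_mtime) - exif_time)
    return gap if gap > tolerance else None
```

A typical call would pass the scraped `DateTimeOriginal` value together with `os.path.getmtime(path)`; a large gap is a hint that the file was edited or re-saved after capture, not proof of tampering.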
src/metadata/parser.py
A Python class MetadataParser that:
- Defines a helper _parse_key_value(line) using a single regex for Key: Value extraction
- Implements parse_exif(), parse_zsteg(), parse_steghide(), and parse_binwalk() methods
- Each accepts either raw multiline text or a prebuilt dict
- Returns a normalized dict schema for each module
- Supports parsing Binwalk signatures into a {'Signatures': [...]} structure
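A single-regex `Key: Value` helper of the kind described above could be sketched as follows. The exact pattern used in parser.py is not shown in this README, so the regex and the `parse_text` wrapper here are assumptions for illustration.

```python
import re

# Key is everything before the first colon; value is the rest, trimmed.
_KEY_VALUE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_key_value(line):
    """Return (key, value) for a 'Key: Value' line, or None if it has no key."""
    match = _KEY_VALUE.match(line)
    if not match:
        return None
    return match.group(1), match.group(2)


def parse_text(text):
    """Collect every Key: Value line of a multiline tool output into a dict."""
    parsed = {}
    for line in text.splitlines():
        pair = parse_key_value(line)
        if pair:
            key, value = pair
            parsed[key] = value
    return parsed
```

Splitting on the first colon only matters for values that contain colons themselves, such as EXIF timestamps.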
src/main.py
- Introduced run_metadata_chain(file_path) to:
  - Run EXIF scraper + parse output
  - Run Steghide scraper + parse output
  - Run Binwalk scraper + parse output
  - Print a combined summary of all parsed metadata
- Updated terminal_mode() to use argparse:
  - file positional argument
  - --extract-binwalk flag to enable extraction step
  - --search-image flag to trigger the reverse-image search stub
- Retained a menu in main() to choose between CLI and GUI (startGUI() placeholder)
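The argparse setup listed above can be sketched as below. The argument names come from this README; the function name `build_parser`, the `prog` name, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Create the command-line parser for terminal mode."""
    parser = argparse.ArgumentParser(
        prog="bigsister",
        description="Run the metadata chain on a single file.",
    )
    parser.add_argument("file", help="path of the file to analyse")
    parser.add_argument(
        "--extract-binwalk",
        action="store_true",
        help="also run the Binwalk extraction step",
    )
    parser.add_argument(
        "--search-image",
        action="store_true",
        help="trigger the reverse-image search stub",
    )
    return parser
```

argparse exposes the dashed flags as `args.extract_binwalk` and `args.search_image`, so the orchestration code can branch on plain booleans.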
Steghide can embed data into:
- JPEG (.jpg, .jpeg)
- BMP3 (Windows 3.x BITMAPINFOHEADER-based .bmp)
- WAV and AU audio files
Note: Newer “BMP V5” files (with a 124-byte header) are not supported. Steghide rejects them with:
the bmp file "<file>.bmp" has a format that is not supported (biSize: 124).
If your .bmp has a 124-byte header, convert it to the classic 40-byte BITMAPINFOHEADER format using ImageMagick:
# Install ImageMagick if needed
sudo apt update
sudo apt install imagemagick-6.q16
# Convert V5 BMP → classic BMP3
convert /path/to/input.bmp bmp3:/path/to/output_v3.bmp
Note: The bmp3: prefix forces a V3 header. output_v3.bmp is now compatible with Steghide.
steghide embed -cf <coverfile> -ef <payload> -p <passphrase>
[-z <level>] # compression level 1–9 (default: 1)
[-e none] # disable encryption
[-K] # omit CRC32 checksum
[-f] # overwrite existing files
Example:
echo "SECRET" > secret.txt
steghide embed -cf hoothoot_v3.bmp -ef secret.txt -p testpass -z 9 -f
steghide info -p <passphrase> <stegofile>
Example:
steghide info -p testpass hoothoot_v3.bmp
steghide extract -sf <stegofile> [-p <passphrase>] [-xf <outfile>] [-f] # -f overwrites an existing output file
Examples:
# Default extraction
steghide extract -sf hoothoot_v3.bmp -p testpass
# Custom filename
steghide extract -sf hoothoot_v3.bmp -p testpass -xf recovered.txt
# Force overwrite
steghide extract -sf hoothoot_v3.bmp -p testpass -f
# 1) (If needed) Convert PNG → BMP3
convert input.png bmp3:input_v3.bmp
# 2) Embed
echo "HELLO_CTF" > payload.txt
steghide embed -cf input_v3.bmp -ef payload.txt -p mypass -f
# 3) Inspect
steghide info -p mypass input_v3.bmp
# 4) Extract
steghide extract -sf input_v3.bmp -p mypass -f
Tip: Always verify your cover file’s format before embedding. Use file input.bmp or open it with a hex viewer to check the DIB header size. Steghide requires the classic 40-byte header for BMPs.
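The header check in the tip can also be done programmatically. In a BMP file the 14-byte file header is followed by the DIB header, whose first four bytes hold its own size as a little-endian integer (the biSize value quoted in Steghide's error message). The helper names below are hypothetical.

```python
import struct


def bmp_dib_header_size(data):
    """Return the DIB header size (biSize) from the leading bytes of a BMP,
    or None if the data is too short or lacks the 'BM' signature."""
    if len(data) < 18 or data[:2] != b"BM":
        return None
    (size,) = struct.unpack_from("<I", data, 14)  # 4 bytes at offset 14
    return size


def is_steghide_compatible_bmp(path):
    """True if the BMP at path has the classic 40-byte BITMAPINFOHEADER."""
    with open(path, "rb") as handle:
        return bmp_dib_header_size(handle.read(18)) == 40
```

A result of 124 from `bmp_dib_header_size` identifies a BMP V5 file that needs the bmp3: conversion before Steghide will accept it.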
All core scrapers (EXIF via ExifTool/Pillow, Zsteg, Binwalk) and their Python wrappers are fully implemented, and their output is parsed by the unified regex-based parser. The main entrypoint chains EXIF → Steghide → Binwalk with CLI/GUI options.
Note: The Steghide wrapper needs minor syntax fixes under WSL to align its flag handling.
Make sure you have Docker installed on your system.
- For Windows or macOS, get Docker Desktop.
- For Linux, follow the official Docker Engine installation guide.
Note on sudo: The commands below assume you have completed the official Docker post-installation steps for Linux, which allow you to run Docker without sudo. If you get a permission denied error, you can either prefix the commands with sudo or follow that guide to add your user to the docker group.
Before you can run the application, you need to build the Docker image. This command downloads all the dependencies and sets up the environment defined in the Dockerfile.
From the project's root directory (~/BigSister), run:
docker compose build
To run the tool, use this command:
docker compose run --rm bigsister
There are two versions of the app (terminal and GUI); see the instructions below.
When starting the application, you will be given two choices:
=== Big Sister - Metadata and Image Analysis Tool ===
Choose your interface:
1. GUI (Graphical User Interface)
2. Terminal (Command Line Interface)
Enter 1 or 2:
If you want to use the terminal option (2), you need to specify the file, like this:
docker compose run --rm bigsister /Downloads/example_image.jpeg
- Linux
  Before running the app, make sure to run:
  xhost +local:
  This command adds non-network local connections to your access control list. Without it you will see:
  Authorization required, but no authorization protocol specified.
  Then, you can run as usual:
  docker compose run --rm bigsister
- Windows
  Use x11docker
- macOS
  Use distrobox
By default, Docker containers do not have access to the host's file system.
To pass files for analysis, you need to map Docker volumes in docker-compose.yml. We map the current directory and ~/Downloads by default:
volumes:
# Map X11 file for GUI
- /tmp/.X11-unix:/tmp/.X11-unix
# Make current host directory available within container as /app
- .:/app
# Example for making other directories available
- ~/Downloads:/Downloads
With this setup you can reference files, like this:
# A file in current directory
$ docker compose run --rm bigsister image.png
# Another file in host's ~/Downloads
$ docker compose run --rm bigsister /Downloads/another.png
You can add more directories by modifying docker-compose.yml
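Because the container only sees mounted directories, a host path has to be rewritten to its in-container equivalent before it is passed to the tool. The sketch below shows one way to do that translation; the function name `to_container_path` and the example host directories are assumptions, and the mount list must mirror the volumes in your own docker-compose.yml.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, mounts):
    """Map an absolute host path to its path inside the container.

    mounts is a list of (host_dir, container_dir) pairs; the first pair whose
    host_dir contains host_path wins. Returns None if nothing matches.
    """
    host = PurePosixPath(host_path)
    for host_dir, container_dir in mounts:
        try:
            relative = host.relative_to(host_dir)
        except ValueError:
            continue  # host_path is not inside this mount
        return str(PurePosixPath(container_dir) / relative)
    return None
```

For the default volumes, a mount list such as `[("/home/user/BigSister", "/app"), ("/home/user/Downloads", "/Downloads")]` would map `/home/user/Downloads/another.png` to `/Downloads/another.png`; a `None` result means the file's directory still needs to be added to docker-compose.yml.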