SubMailScout is a high-performance, asynchronous web reconnaissance tool designed for comprehensive domain analysis and email discovery. It combines multiple scanning techniques to efficiently map websites, discover subdomains, and extract contact information from various document types.
- Asynchronous Operation: Utilizes Python's asyncio for high-performance concurrent scanning
- Smart Rate Limiting: Prevents server overload with built-in rate limiting
- Comprehensive Scanning:
  - Recursive webpage crawling
  - Document parsing (PDF, DOC, DOCX, XLS, XLSX)
  - Dynamic page detection
  - Directory enumeration
  - Subdomain discovery via DNS and certificate transparency logs
- Email Extraction:
  - Advanced pattern matching for email addresses
  - Validation and filtering of discovered emails
  - Support for various file formats
- File Processing:
  - Automatic file type detection
  - In-memory file processing
  - Temporary file cleanup
- Robust Error Handling:
  - Comprehensive logging
  - Connection error recovery
  - Invalid URL handling
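The asynchronous scanning and rate limiting listed above can be illustrated with a small sketch. This is not taken from SubMailScout's source: the names `fetch`, `bounded_fetch`, `crawl`, and `max_concurrent` are hypothetical, and `fetch` is a stand-in that sleeps instead of making a real HTTP request. It shows one common way to cap concurrency with `asyncio.Semaphore`; the tool's actual limiter may work differently.

```python
import asyncio


async def fetch(url: str) -> str:
    # Stand-in for a real HTTP request (hypothetical; no network access here).
    await asyncio.sleep(0.01)
    return f"fetched {url}"


async def bounded_fetch(url: str, semaphore: asyncio.Semaphore) -> str:
    # Only `max_concurrent` coroutines may hold the semaphore at once,
    # which caps how many requests are in flight at any moment.
    async with semaphore:
        return await fetch(url)


async def crawl(urls, max_concurrent: int = 5):
    semaphore = asyncio.Semaphore(max_concurrent)
    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u, semaphore) for u in urls))


if __name__ == "__main__":
    pages = asyncio.run(crawl(["http://example.com/a", "http://example.com/b"]))
    print(pages)
```

A semaphore limits concurrent requests rather than requests per second; a time-based limiter would additionally need a delay between acquisitions.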
- Python 3.7+
- pip (Python package installer)
- Clone the repository:
git clone https://github.com/nublex/submailscout.git
cd submailscout
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Run the scanner:
python submailscout.py
When prompted, enter your target domain (e.g., example.com).
The tool will:
- Start scanning the domain recursively
- Check common directories
- Enumerate subdomains
- Process any discovered documents
- Extract and validate email addresses
- Save results to scan_results.json
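The "extract and validate email addresses" step can be sketched roughly as follows. The pattern and the function name `extract_emails` are assumptions made for illustration, not the tool's actual implementation; the regular expression is a simplified one that will miss some valid addresses and is not a full RFC 5322 validator.

```python
import re

# Simplified email pattern (an assumption, not SubMailScout's real regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text, domain=None):
    """Return sorted, de-duplicated, lower-cased addresses found in text.

    If `domain` is given, keep only addresses at that domain or one of
    its subdomains.
    """
    found = {match.lower() for match in EMAIL_RE.findall(text)}
    if domain:
        domain = domain.lower()
        found = {
            e for e in found
            if e.endswith("@" + domain) or e.endswith("." + domain)
        }
    return sorted(found)


if __name__ == "__main__":
    sample = "Write to Info@Example.com or sales@mail.example.com."
    print(extract_emails(sample, domain="example.com"))
```

Lower-casing before de-duplication treats `Info@Example.com` and `info@example.com` as the same address, which is usually what a contact list needs.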
Results are saved in JSON format containing:
- Discovered email addresses
- Found directories
- Enumerated subdomains
- Scan statistics (duration, URLs scanned, files processed)
Example output structure:
{
  "emails": ["[email protected]", "[email protected]"],
  "directories": ["http://example.com/docs", "http://example.com/assets"],
  "subdomains": ["mail.example.com", "www.example.com"],
  "scan_time": "45.23 seconds",
  "total_urls_scanned": 150,
  "total_files_processed": 25
}
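To post-process a finished scan, the results file can be loaded with the standard `json` module. The sketch below assumes the file has the keys shown in the example structure above; `load_results` and `summarize` are hypothetical helper names, not part of the tool.

```python
import json


def load_results(path="scan_results.json"):
    # Read the JSON file written at the end of a scan.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize(results: dict) -> str:
    # Missing keys fall back to empty defaults so a partial file still works.
    return (
        f"{len(results.get('emails', []))} emails, "
        f"{len(results.get('subdomains', []))} subdomains, "
        f"{results.get('total_urls_scanned', 0)} URLs scanned"
    )


if __name__ == "__main__":
    print(summarize(load_results()))
```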
The tool maintains detailed logs in scanner.log, including:
- URLs visited
- Files processed
- Errors encountered
- Scan progress
This tool is provided for educational and ethical testing purposes only. Users are responsible for:
- Obtaining permission before scanning any domains
- Complying with all applicable laws and regulations
- Adhering to website terms of service
- Respecting robots.txt directives
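As an illustration of the last point, Python's standard library includes `urllib.robotparser` for checking whether a URL may be fetched. The sketch below parses robots.txt rules that are already in memory; the helper name `is_allowed` and the sample rules are assumptions for demonstration, and it is not known whether SubMailScout performs this check itself.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if `url` may be fetched under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # accepts an iterable of robots.txt lines
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = ["User-agent: *", "Disallow: /private/"]
    print(is_allowed(rules, "http://example.com/docs"))
    print(is_allowed(rules, "http://example.com/private/report.pdf"))
```

In a live scan the rules would come from the target's `/robots.txt`, for example via `RobotFileParser.set_url()` followed by `read()`.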
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Built with Python's asyncio for high-performance async operations
- Uses multiple open-source libraries for comprehensive file parsing
- Inspired by the need for efficient and thorough web reconnaissance