@namanvashistha commented Oct 2, 2025

Add File_Metadata_Extractor: Comprehensive metadata extraction script

Description

This PR adds a Python script that extracts metadata from images, audio files, videos, and documents. It supports both single-file analysis and batch processing, and it can export results as JSON or CSV.

Key Features:

  • Extract metadata from images (EXIF), audio (ID3 tags), video, and documents
  • Support for JPEG, PNG, MP3, FLAC, PDF, DOCX, and other formats
  • Command-line interface with JSON/CSV export options
  • Batch processing with recursive directory support
  • Comprehensive README with usage examples and setup instructions
  • Added to Python/README.md following repository structure
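
The JSON/CSV export listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the code from this PR: the function name `export_records` and the record layout (one flat dict per file) are assumptions made for illustration.

```python
import csv
import io
import json


def export_records(records, fmt="json"):
    """Serialize a list of flat metadata dicts to a JSON or CSV string.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if fmt == "json":
        # default=str keeps non-JSON values (dates, paths) from raising.
        return json.dumps(records, indent=2, default=str)
    if fmt == "csv":
        # Different file types yield different keys, so the header is the
        # union of all keys, in first-seen order.
        fieldnames = []
        for record in records:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```

Building the CSV header from the union of keys means an image row and an audio row can share one file, with empty cells where a field does not apply.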

Supported File Types:

  • Images: JPEG (EXIF data), PNG, TIFF, BMP, GIF
  • Audio: MP3, FLAC, WAV, M4A, AAC, OGG, WMA
  • Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM
  • Documents: PDF, DOCX
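
One way to route a file to the right extractor is to classify it by extension. The sketch below is an assumption about how that could work, not the logic in this PR: `classify_file` and `SUPPORTED_TYPES` are hypothetical names, and the `.jpg`/`.jpeg` and `.tif`/`.tiff` spellings are assumed extensions for the JPEG and TIFF formats listed above.

```python
from pathlib import Path

# Categories and formats taken from the "Supported File Types" list.
# The exact extension spellings are assumptions.
SUPPORTED_TYPES = {
    "image": {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif"},
    "audio": {".mp3", ".flac", ".wav", ".m4a", ".aac", ".ogg", ".wma"},
    "video": {".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm"},
    "document": {".pdf", ".docx"},
}


def classify_file(path):
    """Return the category for a path, or None if the extension is unknown."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED_TYPES.items():
        if suffix in extensions:
            return category
    return None
```

Extension matching is cheap but can be fooled by renamed files; inspecting file contents would be more robust, at the cost of extra I/O.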

This script complements the existing file management tools in the repository and is intended for digital asset management, forensic analysis, and file organization.

Fixes #1449

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Files Added/Modified

  • New: Python/File_Metadata_Extractor/file_metadata_extractor.py - Main script with comprehensive metadata extraction
  • New: Python/File_Metadata_Extractor/README.md - Detailed documentation with usage examples
  • New: Python/File_Metadata_Extractor/requirements.txt - Required dependencies
  • Modified: Python/README.md - Added script to the main Python scripts list

Usage Examples

# Extract metadata from single file
python file_metadata_extractor.py photo.jpg

# Process directory and save to JSON
python file_metadata_extractor.py /photos -o metadata.json

# Process recursively and save to CSV
python file_metadata_extractor.py /documents -r -o report.csv -f csv
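
The commands above imply a positional path plus `-o`, `-r`, and `-f` options. A minimal `argparse` sketch of such an interface follows. The long option names (`--output`, `--recursive`, `--format`), the JSON default, and the helper names `build_parser` and `collect_files` are assumptions, not taken from the script in this PR.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser matching the short flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="Extract file metadata (sketch).")
    parser.add_argument("path", help="file or directory to analyze")
    parser.add_argument("-o", "--output", help="write results to this file")
    # Assumed default: the JSON example above passes no -f flag.
    parser.add_argument("-f", "--format", choices=["json", "csv"], default="json")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="descend into subdirectories")
    return parser


def collect_files(path, recursive=False):
    """Return the files to process: the path itself, or a directory's files."""
    root = Path(path)
    if root.is_file():
        return [root]
    entries = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in entries if p.is_file())
```

With this sketch, `build_parser().parse_args(["/documents", "-r", "-o", "report.csv", "-f", "csv"])` yields the same settings as the third command above.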

Dependencies

Pillow (PIL) - Image metadata and EXIF extraction
mutagen - Audio and video metadata extraction
PyPDF2 - PDF document metadata
python-docx - Microsoft Word document metadata
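
Because each dependency serves a different file type, a script like this could check which packages are installed before running. The sketch below is one possible approach, not code from this PR; `missing_dependencies` is a hypothetical helper. The import names are the ones these packages are normally imported under (`PIL` for Pillow, `docx` for python-docx).

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> module name (as imported).
DEPENDENCIES = {
    "Pillow": "PIL",
    "mutagen": "mutagen",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
}


def missing_dependencies():
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for package, module in DEPENDENCIES.items()
        if find_spec(module) is None  # None means the module is not installed
    ]
```

A caller could print the returned list and skip only the affected file types rather than failing outright.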

github-actions bot commented Oct 2, 2025

This PR is not linked to any issue. Please make the corresponding changes in the PR body. The issue should look like this. For help follow this link

Successfully merging this pull request may close these issues.

[feature request]: File Metadata Extractor