| 1 | +# File Metadata Extractor |
| 2 | + |
| 3 | +A Python script that extracts metadata from images, audio files, videos, and documents. It is intended for digital asset management, forensic analysis, and file organization. |
| 4 | + |
| 7 | +## Supported File Types |
| 8 | + |
| 9 | +### Images |
| 10 | +- **JPEG/JPG** - EXIF data, dimensions, camera settings |
| 11 | +- **PNG** - Dimensions, transparency info |
| 12 | +- **TIFF** - EXIF data, multi-page support |
| 13 | +- **BMP, GIF** - Basic image properties |
| 14 | + |
| 15 | +### Audio Files |
| 16 | +- **MP3** - ID3 tags, bitrate, duration, artist, album |
| 17 | +- **FLAC** - Lossless audio metadata |
| 18 | +- **WAV** - Audio properties, duration |
| 19 | +- **M4A, AAC, OGG, WMA** - Various audio metadata |
| 20 | + |
| 21 | +### Video Files |
| 22 | +- **MP4, AVI, MKV** - Duration, bitrate, basic properties |
| 23 | +- **MOV, WMV, FLV, WebM** - Video metadata extraction |
| 24 | + |
| 25 | +### Documents |
| 26 | +- **PDF** - Page count, author, creation date, encryption status |
| 27 | +- **DOCX** - Author, word count, creation/modification dates |
| 28 | + |
| 29 | +## Features |
| 30 | + |
| 31 | +- **Single File Analysis** - Extract metadata from individual files |
| 32 | +- **Batch Processing** - Process entire directories recursively |
| 33 | +- **Multiple Output Formats** - JSON and CSV export options |
| 34 | +- **Comprehensive Metadata** - File system info + format-specific data |
| 35 | +- **Error Handling** - Graceful handling of unsupported files |
| 36 | +- **Cross-platform** - Works on Windows, macOS, and Linux |
| 37 | + |
| 38 | +## Setup Instructions |
| 39 | + |
| 40 | +### Prerequisites |
| 41 | +- Python 3.6 or higher |
| 42 | +- pip (Python package manager) |
| 43 | + |
| 44 | +### Installation |
| 45 | + |
| 46 | +1. **Clone or download** the script to your local machine |
| 47 | + |
| 48 | +2. **Install required dependencies:** |
| 49 | + ```bash |
| 50 | + pip install -r requirements.txt |
| 51 | + ``` |
| 52 | + |
| 53 | + Or install manually: |
| 54 | + ```bash |
| 55 | + pip install Pillow mutagen PyPDF2 python-docx |
| 56 | + ``` |
| 57 | + |
| 58 | +3. **Make the script executable** (Linux/macOS): |
| 59 | + ```bash |
| 60 | + chmod +x file_metadata_extractor.py |
| 61 | + ``` |
| 62 | + |
| 63 | +## Usage |
| 64 | + |
| 65 | +### Basic Usage |
| 66 | + |
| 67 | +**Extract metadata from a single file:** |
| 68 | +```bash |
| 69 | +python file_metadata_extractor.py /path/to/your/file.jpg |
| 70 | +``` |
| 71 | + |
| 72 | +**Process all files in a directory:** |
| 73 | +```bash |
| 74 | +python file_metadata_extractor.py /path/to/directory/ |
| 75 | +``` |
| 76 | + |
| 77 | +**Process directory recursively (including subdirectories):** |
| 78 | +```bash |
| 79 | +python file_metadata_extractor.py /path/to/directory/ --recursive |
| 80 | +``` |
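The difference between the plain directory mode and `--recursive` can be sketched with `pathlib`. This is only an illustration under assumptions: `iter_files` is a hypothetical helper name, and the script itself may walk directories differently.

```python
from pathlib import Path
from typing import Iterator


def iter_files(root: str, recursive: bool = False) -> Iterator[Path]:
    """Yield regular files under root lazily (hypothetical helper)."""
    base = Path(root)
    if base.is_file():
        # A single file path is yielded as-is.
        yield base
        return
    # rglob("*") descends into subdirectories; glob("*") stays at the top level.
    candidates = base.rglob("*") if recursive else base.glob("*")
    for path in candidates:
        if path.is_file():
            yield path
```

Because the function is a generator, callers can handle one path at a time instead of building the full file list first.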
| 81 | + |
| 82 | +### Advanced Usage |
| 83 | + |
| 84 | +**Save results to JSON file:** |
| 85 | +```bash |
| 86 | +python file_metadata_extractor.py /path/to/files/ -o results.json -f json |
| 87 | +``` |
| 88 | + |
| 89 | +**Save results to CSV file:** |
| 90 | +```bash |
| 91 | +python file_metadata_extractor.py /path/to/files/ -o results.csv -f csv |
| 92 | +``` |
| 93 | + |
| 94 | +**Process directory recursively and save results:** |
| 95 | +```bash |
| 96 | +python file_metadata_extractor.py /path/to/files/ -r -o metadata_report.json |
| 97 | +``` |
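One way the two output formats could be produced is sketched below. It assumes each result is a dictionary that may contain nested dictionaries (such as an `exif` block); `flatten` and `save_results` are hypothetical names, and the dotted column naming for CSV is an assumption rather than the script's documented behavior.

```python
import csv
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts, e.g. {"exif": {"ISO": "400"}} -> {"exif.ISO": "400"}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def save_results(records: List[Dict[str, Any]], path: str, fmt: str = "json") -> None:
    """Write records to path as JSON or CSV (hypothetical helper)."""
    if fmt == "json":
        with open(path, "w", encoding="utf-8") as fh:
            # default=str keeps non-JSON types such as datetime from raising.
            json.dump(records, fh, indent=2, default=str)
    elif fmt == "csv":
        rows = [flatten(r) for r in records]
        # File types produce different keys, so the header is the union of all keys.
        fieldnames = sorted({key for row in rows for key in row})
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)  # missing keys are written as empty cells
    else:
        raise ValueError(f"unsupported format: {fmt}")
```

JSON preserves the nested structure, while CSV needs one flat row per file, which is why the flattening step only applies to the CSV branch.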
| 98 | + |
| 99 | +### Command Line Options |
| 100 | + |
| 101 | +- `path` - File or directory path to analyze (required) |
| 102 | +- `-o, --output` - Output file path (optional) |
| 103 | +- `-f, --format` - Output format: json or csv (default: json) |
| 104 | +- `-r, --recursive` - Process directories recursively |
| 105 | +- `-h, --help` - Show help message |
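The options listed above map naturally onto `argparse`. The sketch below is an assumption about how such a parser might be declared, not a copy of the script's code; `build_parser` is a hypothetical name. The `-h, --help` option is provided by `argparse` automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Extract metadata from a file or directory."
    )
    parser.add_argument("path", help="File or directory path to analyze")
    parser.add_argument("-o", "--output", help="Output file path")
    parser.add_argument(
        "-f", "--format", choices=["json", "csv"], default="json",
        help="Output format (default: json)",
    )
    parser.add_argument(
        "-r", "--recursive", action="store_true",
        help="Process directories recursively",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```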
| 106 | + |
| 107 | +## Output Examples |
| 108 | + |
| 109 | +### Image Metadata (JPEG) |
| 110 | +```json |
| 111 | +{ |
| 112 | + "filename": "photo.jpg", |
| 113 | + "file_size_mb": 2.34, |
| 114 | + "width": 1920, |
| 115 | + "height": 1080, |
| 116 | + "format": "JPEG", |
| 117 | + "exif": { |
| 118 | + "DateTime": "2023:10:15 14:30:22", |
| 119 | + "Camera": "Canon EOS 5D", |
| 120 | + "FNumber": "f/2.8", |
| 121 | + "ISO": "400" |
| 122 | + } |
| 123 | +} |
| 124 | +``` |
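The `DateTime` value in the example uses the EXIF convention of colons in the date part (`YYYY:MM:DD HH:MM:SS`). If a standard timestamp is needed downstream, a conversion along these lines may help; `parse_exif_datetime` is a hypothetical helper, not part of the script.

```python
from datetime import datetime
from typing import Optional


def parse_exif_datetime(value: str) -> Optional[datetime]:
    """Parse an EXIF-style timestamp; return None if it does not match."""
    try:
        return datetime.strptime(value.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        # Cameras sometimes write blank or malformed timestamps.
        return None
```

Calling `.isoformat()` on the result gives a string in the same style as the `creation_date` field shown for PDFs.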
| 125 | + |
| 126 | +### Audio Metadata (MP3) |
| 127 | +```json |
| 128 | +{ |
| 129 | + "filename": "song.mp3", |
| 130 | + "file_size_mb": 4.56, |
| 131 | + "duration_formatted": "3:42", |
| 132 | + "bitrate": 320, |
| 133 | + "title": "Amazing Song", |
| 134 | + "artist": "Great Artist", |
| 135 | + "album": "Best Album", |
| 136 | + "year": "2023" |
| 137 | +} |
| 138 | +``` |
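Fields such as `duration_formatted` and `file_size_mb` are derived values. A minimal sketch of how they could be computed follows; the function names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption about the script's convention.

```python
def format_duration(seconds: float) -> str:
    """Format a duration as M:SS, or H:MM:SS for an hour or longer."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def bytes_to_mb(size_bytes: int) -> float:
    """Convert bytes to megabytes (assumed 1 MB = 1024 * 1024 bytes), 2 decimals."""
    return round(size_bytes / (1024 * 1024), 2)
```

For example, a 222-second track formats as `3:42`, matching the sample output above.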
| 139 | + |
| 140 | +### PDF Metadata |
| 141 | +```json |
| 142 | +{ |
| 143 | + "filename": "document.pdf", |
| 144 | + "file_size_mb": 1.23, |
| 145 | + "page_count": 15, |
| 146 | + "title": "Important Document", |
| 147 | + "author": "John Doe", |
| 148 | + "creation_date": "2023-10-15T10:30:00" |
| 149 | +} |
| 150 | +``` |
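PDF files store dates as strings of the form `D:YYYYMMDDHHmmSS`, optionally followed by a timezone offset, so producing an ISO-style `creation_date` requires a conversion step. The sketch below shows one possible approach; `pdf_date_to_iso` is a hypothetical helper, and it deliberately ignores the timezone offset for brevity.

```python
import re
from datetime import datetime
from typing import Optional

# Year is mandatory; every later component is optional in the PDF date format.
_PDF_DATE = re.compile(r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?")


def pdf_date_to_iso(raw: str) -> Optional[str]:
    """Convert a PDF date string to ISO 8601 (timezone offset is ignored)."""
    match = _PDF_DATE.match(raw.strip())
    if not match:
        return None
    year, month, day, hour, minute, second = match.groups()
    try:
        parsed = datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
        )
    except ValueError:
        # Out-of-range components, e.g. month 13.
        return None
    return parsed.isoformat()
```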
| 151 | + |
| 152 | +## Detailed Explanation |
| 153 | + |
| 154 | +### Metadata Types Extracted |
| 155 | + |
| 156 | +**File System Information (All Files):** |
| 157 | +- File name and full path |
| 158 | +- File size (bytes and MB) |
| 159 | +- Creation, modification, and access timestamps |
| 160 | +- File extension |
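The file system fields listed above are all available from `os.stat`. The following sketch shows one way to collect them; `file_system_info` and most key names are hypothetical (only `filename` and `file_size_mb` appear in the sample output).

```python
import os
from datetime import datetime
from typing import Any, Dict


def file_system_info(path: str) -> Dict[str, Any]:
    """Collect basic file system metadata for one file (illustrative)."""
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "full_path": os.path.abspath(path),
        "extension": os.path.splitext(path)[1].lower(),
        "file_size_bytes": stat.st_size,
        "file_size_mb": round(stat.st_size / (1024 * 1024), 2),
        # On Unix, st_ctime is the last metadata change, not the creation time.
        "created": datetime.fromtimestamp(stat.st_ctime).isoformat(),
        "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        "accessed": datetime.fromtimestamp(stat.st_atime).isoformat(),
    }
```

The platform difference in `st_ctime` matters for forensic use: on Linux the "created" value from this sketch should not be read as a true creation timestamp.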
| 161 | + |
| 162 | +**Image-Specific Metadata:** |
| 163 | +- Dimensions (width/height) |
| 164 | +- Color mode and format |
| 165 | +- EXIF data (camera settings, GPS, timestamps) |
| 166 | +- Transparency information |
| 167 | + |
| 168 | +**Audio-Specific Metadata:** |
| 169 | +- Duration and bitrate |
| 170 | +- Sample rate and channels |
| 171 | +- ID3 tags (title, artist, album, year, genre) |
| 172 | +- Track numbers and album artist |
| 173 | + |
| 174 | +**Document-Specific Metadata:** |
| 175 | +- Page/word counts |
| 176 | +- Author and title information |
| 177 | +- Creation and modification dates |
| 178 | +- Document properties and keywords |
| 179 | + |
| 180 | +### Error Handling |
| 181 | +The script gracefully handles: |
| 182 | +- Missing or corrupted files |
| 183 | +- Unsupported file formats |
| 184 | +- Missing dependencies (with helpful error messages) |
| 185 | +- Permission errors |
| 186 | +- Large file processing |
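One common pattern for this kind of graceful handling is to wrap each extraction so that a failure becomes an error record rather than stopping a batch run. The sketch below assumes that approach; `safe_extract` and the `error` key are hypothetical, and the script may report errors differently.

```python
from typing import Any, Callable, Dict


def safe_extract(
    path: str, extractor: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    """Run extractor(path); on failure return a record describing the error."""
    try:
        return extractor(path)
    except FileNotFoundError:
        return {"filename": path, "error": "file not found"}
    except PermissionError:
        return {"filename": path, "error": "permission denied"}
    except Exception as exc:
        # Corrupted or unsupported files surface as library-specific exceptions.
        return {"filename": path, "error": f"{type(exc).__name__}: {exc}"}
```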
| 187 | + |
| 188 | +### Performance Notes |
| 189 | +- Large directories are processed file by file to conserve memory |
| 190 | +- EXIF data from images can be extensive, so image records may be much larger than others |
| 191 | +- Video metadata extraction is limited to basic properties |
| 192 | +- PDF processing may be slower for large documents |
| 193 | + |
| 194 | +## Dependencies |
| 195 | + |
| 196 | +- **Pillow (PIL)** - Image metadata and EXIF extraction |
| 197 | +- **mutagen** - Audio and video metadata extraction |
| 198 | +- **PyPDF2** - PDF document metadata |
| 199 | +- **python-docx** - Microsoft Word document metadata |
| 200 | + |
| 201 | +All dependencies are optional - the script will skip unsupported formats if libraries are missing. |
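Optional dependencies can be detected without importing them by asking `importlib` whether each module is installed. The sketch below is one possible approach, not the script's actual mechanism; `available_extractors` is a hypothetical name. The module names are the import names of the listed packages (Pillow imports as `PIL`, python-docx as `docx`).

```python
import importlib.util
from typing import Dict, Optional

# Import names for the packages listed above.
OPTIONAL_MODULES = {
    "images": "PIL",
    "audio/video": "mutagen",
    "pdf": "PyPDF2",
    "docx": "docx",
}


def available_extractors(modules: Optional[Dict[str, str]] = None) -> Dict[str, bool]:
    """Report which optional libraries can be imported (illustrative)."""
    modules = OPTIONAL_MODULES if modules is None else modules
    return {
        kind: importlib.util.find_spec(name) is not None
        for kind, name in modules.items()
    }


if __name__ == "__main__":
    for kind, ok in available_extractors().items():
        print(f"{kind}: {'available' if ok else 'missing, format will be skipped'}")
```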
| 202 | + |
| 203 | +## Author(s) |
| 204 | + |
| 205 | +Created for the Rotten-Scripts repository |
| 206 | + |
| 207 | +## Use Cases |
| 208 | + |
| 209 | +- **Digital Asset Management** - Organize photo/music libraries |
| 210 | +- **Forensic Analysis** - Extract file creation timestamps and metadata |
| 211 | +- **Content Audit** - Analyze document properties in bulk |
| 212 | +- **Data Migration** - Catalog files before/after transfers |
| 213 | +- **Media Organization** - Sort files by metadata properties |
| 214 | + |
| 215 | +## Limitations |
| 216 | + |
| 217 | +- Video metadata extraction is basic (duration, bitrate only) |
| 218 | +- Some proprietary formats may not be fully supported |
| 219 | +- Very large files may take time to process |
| 220 | +- DOCX support limited to basic properties |
| 221 | +- Requires appropriate permissions to read files |
| 222 | + |
| 223 | +## Future Enhancements |
| 224 | + |
| 225 | +- Support for more video codecs and detailed metadata |
| 226 | +- Excel file metadata extraction |
| 227 | +- Database output options |
| 228 | +- Graphical user interface (GUI) |
| 229 | +- Batch file renaming based on metadata |