Commit 368ed95

Add File_Metadata_Extractor: Comprehensive metadata extraction script
- Extract metadata from images (EXIF), audio (ID3 tags), video, and documents
- Support for JPEG, PNG, MP3, FLAC, PDF, DOCX and more formats
- Command-line interface with JSON/CSV export options
- Batch processing with recursive directory support
- Comprehensive README with usage examples and setup instructions
- Added to Python/README.md following repository structure
1 parent 31fd3fb commit 368ed95

4 files changed, +711 -0 lines changed
Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
# File Metadata Extractor

A Python script that extracts metadata from images, audio files, videos, and documents. It is intended for digital asset management, forensic analysis, and file organization.

[![forthebadge](https://forthebadge.com/images/badges/made-with-python.svg)](https://forthebadge.com)

## Supported File Types

### Images
- **JPEG/JPG** - EXIF data, dimensions, camera settings
- **PNG** - Dimensions, transparency info
- **TIFF** - EXIF data, multi-page support
- **BMP, GIF** - Basic image properties

### Audio Files
- **MP3** - ID3 tags, bitrate, duration, artist, album
- **FLAC** - Lossless audio metadata
- **WAV** - Audio properties, duration
- **M4A, AAC, OGG, WMA** - Various audio metadata

### Video Files
- **MP4, AVI, MKV** - Duration, bitrate, basic properties
- **MOV, WMV, FLV, WebM** - Video metadata extraction

### Documents
- **PDF** - Page count, author, creation date, encryption status
- **DOCX** - Author, word count, creation/modification dates

## Features

- **Single File Analysis** - Extract metadata from individual files
- **Batch Processing** - Process entire directories recursively
- **Multiple Output Formats** - JSON and CSV export options
- **Comprehensive Metadata** - File system info + format-specific data
- **Error Handling** - Graceful handling of unsupported files
- **Cross-platform** - Works on Windows, macOS, and Linux

## Setup Instructions

### Prerequisites
- Python 3.6 or higher
- pip (Python package manager)

### Installation

1. **Clone or download** the script to your local machine

2. **Install required dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

   Or install manually:
   ```bash
   pip install Pillow mutagen PyPDF2 python-docx
   ```

3. **Make the script executable** (Linux/macOS):
   ```bash
   chmod +x file_metadata_extractor.py
   ```

## Usage

### Basic Usage

**Extract metadata from a single file:**
```bash
python file_metadata_extractor.py /path/to/your/file.jpg
```

**Process all files in a directory:**
```bash
python file_metadata_extractor.py /path/to/directory/
```

**Process directory recursively (including subdirectories):**
```bash
python file_metadata_extractor.py /path/to/directory/ --recursive
```

### Advanced Usage

**Save results to JSON file:**
```bash
python file_metadata_extractor.py /path/to/files/ -o results.json -f json
```

**Save results to CSV file:**
```bash
python file_metadata_extractor.py /path/to/files/ -o results.csv -f csv
```

**Process directory recursively and save results:**
```bash
python file_metadata_extractor.py /path/to/files/ -r -o metadata_report.json
```

### Command Line Options

- `path` - File or directory path to analyze (required)
- `-o, --output` - Output file path (optional)
- `-f, --format` - Output format: json or csv (default: json)
- `-r, --recursive` - Process directories recursively
- `-h, --help` - Show help message

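
The options listed above map naturally onto `argparse`. The following is a minimal sketch of such a parser, not the script's actual code; the function name `build_parser` is an assumption made for illustration, and the `-h, --help` option is the one `argparse` adds by default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Extract metadata from files (illustrative sketch)."
    )
    # Required positional argument: a file or a directory
    parser.add_argument("path", help="File or directory path to analyze")
    parser.add_argument("-o", "--output", help="Output file path (optional)")
    parser.add_argument(
        "-f", "--format",
        choices=["json", "csv"],
        default="json",
        help="Output format (default: json)",
    )
    parser.add_argument(
        "-r", "--recursive",
        action="store_true",
        help="Process directories recursively",
    )
    return parser
```

Calling `build_parser().parse_args(["/path/to/files/", "-r", "-f", "csv"])` yields a namespace with `recursive=True`, `format="csv"`, and `output=None`.
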
## Output Examples

### Image Metadata (JPEG)
```json
{
  "filename": "photo.jpg",
  "file_size_mb": 2.34,
  "width": 1920,
  "height": 1080,
  "format": "JPEG",
  "exif": {
    "DateTime": "2023:10:15 14:30:22",
    "Camera": "Canon EOS 5D",
    "FNumber": "f/2.8",
    "ISO": "400"
  }
}
```

### Audio Metadata (MP3)
```json
{
  "filename": "song.mp3",
  "file_size_mb": 4.56,
  "duration_formatted": "3:42",
  "bitrate": 320,
  "title": "Amazing Song",
  "artist": "Great Artist",
  "album": "Best Album",
  "year": "2023"
}
```

### PDF Metadata
```json
{
  "filename": "document.pdf",
  "file_size_mb": 1.23,
  "page_count": 15,
  "title": "Important Document",
  "author": "John Doe",
  "creation_date": "2023-10-15T10:30:00"
}
```

152+
## Detailed Explanation
153+
154+
### Metadata Types Extracted
155+
156+
**File System Information (All Files):**
157+
- File name and full path
158+
- File size (bytes and MB)
159+
- Creation, modification, and access timestamps
160+
- File extension
161+
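
The file system fields listed above can all be read with `os.stat`. The sketch below shows one way to collect them. The function name `file_system_info` and every key other than `filename` and `file_size_mb` (which appear in the output examples) are assumptions. Note that `st_ctime` is the creation time on Windows but the time of the last metadata change on most Unix systems.

```python
import os
from datetime import datetime


def file_system_info(path):
    """Collect format-independent metadata for one file (hypothetical helper)."""
    stat = os.stat(path)

    def iso(timestamp):
        # Convert a POSIX timestamp to a local-time ISO 8601 string
        return datetime.fromtimestamp(timestamp).isoformat()

    return {
        "filename": os.path.basename(path),
        "full_path": os.path.abspath(path),
        "extension": os.path.splitext(path)[1].lower(),
        "file_size_bytes": stat.st_size,
        "file_size_mb": round(stat.st_size / (1024 * 1024), 2),
        # Creation time on Windows; metadata-change time on most Unix systems
        "created": iso(stat.st_ctime),
        "modified": iso(stat.st_mtime),
        "accessed": iso(stat.st_atime),
    }
```
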
**Image-Specific Metadata:**
- Dimensions (width/height)
- Color mode and format
- EXIF data (camera settings, GPS, timestamps)
- Transparency information

**Audio-Specific Metadata:**
- Duration and bitrate
- Sample rate and channels
- ID3 tags (title, artist, album, year, genre)
- Track numbers and album artist

**Document-Specific Metadata:**
- Page/word counts
- Author and title information
- Creation and modification dates
- Document properties and keywords

### Error Handling
The script gracefully handles:
- Missing or corrupted files
- Unsupported file formats
- Missing dependencies (with helpful error messages)
- Permission errors
- Large file processing

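
One common way to get this kind of graceful handling in a batch run is to wrap each per-file extraction so that a failure becomes an error record instead of aborting the whole run. The sketch below illustrates that pattern. The names `safe_extract` and `extractor`, and the `error` key, are assumptions and may not match the script.

```python
import os


def safe_extract(path, extractor):
    """Run extractor(path) and turn common failures into an error record.

    `extractor` is any callable that takes a path and returns a dict.
    """
    result = {"filename": os.path.basename(path)}
    try:
        result.update(extractor(path))
    except FileNotFoundError:
        result["error"] = "File not found"
    except PermissionError:
        result["error"] = "Permission denied"
    except Exception as exc:
        # Corrupted or unsupported content usually surfaces here
        result["error"] = f"Could not read metadata: {exc}"
    return result
```

Because every file yields a record, the output still lists files that could not be read, alongside the reason.
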
### Performance Notes
- Large directories are processed file by file to conserve memory
- EXIF data from images can be extensive
- Video metadata extraction is limited to basic properties
- PDF processing may be slower for large documents

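
Processing a directory file by file is typically done with a generator, so that paths are produced one at a time rather than collected into a list first. The following is a minimal sketch of that idea, assuming a hypothetical helper named `iter_files`; the script's own traversal may differ.

```python
import os


def iter_files(root, recursive=False):
    """Yield file paths lazily, one at a time (hypothetical helper)."""
    if os.path.isfile(root):
        # A single file was given instead of a directory
        yield root
        return
    if recursive:
        # os.walk descends into subdirectories, one directory at a time
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)
    else:
        # os.scandir lists only the top-level entries
        with os.scandir(root) as entries:
            for entry in entries:
                if entry.is_file():
                    yield entry.path
```

A caller can then write `for path in iter_files(directory, recursive=True): ...` and handle each file before the next path is produced.
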
## Dependencies

- **Pillow (PIL)** - Image metadata and EXIF extraction
- **mutagen** - Audio and video metadata extraction
- **PyPDF2** - PDF document metadata
- **python-docx** - Microsoft Word document metadata

All dependencies are optional: if a library is missing, the script skips the formats that depend on it.

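
To see which of these optional libraries are present in the current environment, one option is to probe for them without importing them. The sketch below does that with `importlib.util.find_spec`. The helper name `available_libraries` is an assumption, and the script itself may detect missing libraries differently (for example with `try`/`except ImportError`).

```python
import importlib.util

# pip package name -> module name used in `import` statements
OPTIONAL_MODULES = {
    "Pillow": "PIL",
    "mutagen": "mutagen",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
}


def available_libraries():
    """Return {package name: True/False} without importing the packages."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in OPTIONAL_MODULES.items()
    }
```

Printing `available_libraries()` before a batch run shows which format families will be skipped.
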
## Author(s)

Created for the Rotten-Scripts repository

## Use Cases

- **Digital Asset Management** - Organize photo/music libraries
- **Forensic Analysis** - Extract file creation timestamps and metadata
- **Content Audit** - Analyze document properties in bulk
- **Data Migration** - Catalog files before/after transfers
- **Media Organization** - Sort files by metadata properties

## Limitations

- Video metadata extraction is basic (duration, bitrate only)
- Some proprietary formats may not be fully supported
- Very large files may take time to process
- DOCX support limited to basic properties
- Requires appropriate permissions to read files

## Future Enhancements

- Support for more video codecs and detailed metadata
- Excel file metadata extraction
- Database output options
- Graphical user interface (GUI)
- Batch file renaming based on metadata
