rzane/file_type

FileType

This package detects a file's MIME type and canonical extension by looking for magic numbers. It reads a small amount of data from the file (~256 bytes) and uses binary pattern matching against its contents.
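
To make the idea of magic-number matching concrete, here is a minimal sketch of the technique in plain Elixir. It is not FileType's actual source: the module name `MagicSketch`, the three signatures chosen, and the `{:error, :unrecognized}` return value are assumptions made for illustration.

```elixir
defmodule MagicSketch do
  # Hypothetical module for illustration only; not part of FileType.

  # PNG files begin with the 8 bytes 89 50 4E 47 0D 0A 1A 0A.
  def detect(<<0x89, "PNG\r\n", 0x1A, "\n", _rest::binary>>),
    do: {:ok, {"png", "image/png"}}

  # JPEG files begin with the bytes FF D8 FF.
  def detect(<<0xFF, 0xD8, 0xFF, _rest::binary>>),
    do: {:ok, {"jpg", "image/jpeg"}}

  # PDF files begin with the ASCII text "%PDF-".
  def detect(<<"%PDF-", _rest::binary>>),
    do: {:ok, {"pdf", "application/pdf"}}

  # Anything else is reported as unknown (assumed error shape).
  def detect(data) when is_binary(data),
    do: {:error, :unrecognized}
end
```

Each clause matches on the leading bytes of the binary and ignores the rest, so only the start of the file needs to be in memory.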

API Documentation

Usage

Detecting a file's type:

iex> FileType.from_path("profile.png")
{:ok, {"png", "image/png"}}

iex> FileType.from_path("contract.docx")
{:ok, {"docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"}}

Detecting a file's type from an IO device:

iex> {:ok, file} = File.open("profile.png", [:read, :binary])
{:ok, file}

iex> FileType.from_io(file)
{:ok, {"png", "image/png"}}
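
For readers curious how a header can be pulled from an IO device before matching, the following is a rough sketch using only the standard library. The module `HeaderSketch` is hypothetical, and the 256-byte size is taken from the "~256 bytes" figure above; FileType itself may read the data differently.

```elixir
defmodule HeaderSketch do
  # Hypothetical helper for illustration only; not part of FileType.

  # Assumed header size, based on the "~256 bytes" mentioned in this README.
  @header_size 256

  # Reads up to @header_size bytes from an already opened IO device.
  def read_header(io) do
    case IO.binread(io, @header_size) do
      data when is_binary(data) -> {:ok, data}
      # An empty file yields :eof; treat it as an empty header.
      :eof -> {:ok, <<>>}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

The device should be opened in binary mode, as in the `File.open("profile.png", [:read, :binary])` call above, so the bytes are returned unmodified.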

Installation

The package can be installed by adding file_type to your list of dependencies in mix.exs:

def deps do
  [
    {:file_type, "~> 0.1.0"}
  ]
end

Supported types

Document

  • docx - Microsoft Word Open XML Document
  • pptx - PowerPoint Open XML Presentation
  • xlsx - Microsoft Excel Open XML Spreadsheet
  • doc - Microsoft Word Document
  • ppt - PowerPoint Presentation
  • xls - Excel Spreadsheet
  • pdf - Portable Document Format File
  • epub - Open eBook File
  • mobi - Mobipocket eBook
  • odt - OpenDocument Text Document
  • ods - OpenDocument Spreadsheet
  • odp - OpenDocument Presentation
  • rtf - Rich Text Format File

Image

  • jpg - JPEG Image
  • png - Portable Network Graphic
  • apng - Animated Portable Network Graphic
  • gif - Graphical Interchange Format File
  • webp - WebP Image
  • flif - Free Lossless Image Format File
  • cr2 - Canon Raw Image File
  • cr3 - Canon Raw 3 Image File
  • orf - Olympus RAW File
  • arw - Sony Digital Camera Image
  • dng - Digital Negative Image File
  • nef - Nikon Electronic Format RAW Image
  • rw2 - Panasonic RAW Image
  • raf - Fuji RAW Image File
  • tif - Tagged Image File
  • bmp - Bitmap Image File
  • icns - macOS Icon Resource File
  • jxr - JPEG XR Image
  • psd - Adobe Photoshop Document
  • dmg - Apple Disk Image
  • ico - Icon File
  • bpg - BPG Image
  • jp2 - JPEG 2000 Core Image File
  • jpm - JPEG 2000 Compound Image File Format
  • jpx - JPEG 2000 Image File
  • heic - High Efficiency Image Format
  • cur - Windows Cursor
  • ktx - Khronos Texture
  • avif - AV1 Image
  • dcm - DICOM Image

Video

  • mp4 - MPEG-4 Video File
  • mkv - Matroska Video File
  • webm - WebM Video File
  • mov - Apple QuickTime Movie
  • avi - Audio Video Interleave File
  • mpg - MPEG Video File
  • ogv - Ogg Video File
  • ogm - Ogg Media File
  • flv - Flash Video File
  • mts - AVCHD Video File
  • mj2 - Motion JPEG 2000 Video Clip
  • 3gp - 3GPP Multimedia File
  • 3g2 - 3GPP2 Multimedia File
  • m4v - iTunes Video File
  • m4p - iTunes Music Store Audio File
  • f4v - Flash MP4 Video File
  • f4p - Adobe Flash Protected Media File

Audio

  • mp1 - MPEG-1 Layer 1 Audio File
  • mp2 - MPEG Layer II Compressed Audio File
  • mp3 - MP3 Audio File
  • aac - Advanced Audio Coding File
  • ogg - Ogg Vorbis Audio File
  • oga - Ogg Vorbis Audio File
  • spx - Ogg Vorbis Speex File
  • opus - Opus Audio File
  • flac - Free Lossless Audio Codec File
  • wav - WAVE Audio File
  • mid - MIDI File
  • qcp - PureVoice Audio File
  • amr - Adaptive Multi-Rate Codec File
  • aif - Audio Interchange File Format
  • ape - Monkey's Audio Lossless Audio File
  • wv - WavPack Audio File
  • mpc - Musepack Compressed Audio File
  • dsf - Delusion Digital Sound File
  • voc - Creative Labs Audio File
  • ac3 - Audio Codec 3 File
  • m4a - MPEG-4 Audio File
  • m4b - MPEG-4 Audiobook File
  • f4a - Adobe Flash Protected Audio File
  • f4b - Adobe Flash MP4 Audiobook File
  • it - Impulse Tracker Module
  • s3m - ScreamTracker 3 Module
  • xm - Fasttracker 2 Extended Module

Font

  • ttf - TrueType Font
  • otf - OpenType Font
  • woff - Web Open Font Format File
  • woff2 - Web Open Font Format 2.0 File
  • eot - Embedded OpenType Font

Archive

  • zip - Zipped File
  • tar - Consolidated Unix File Archive
  • rar - WinRAR Compressed Archive
  • gz - Gnu Zipped Archive
  • bz2 - Bzip2 Compressed File
  • 7z - 7-Zip Compressed File
  • xz - XZ Compressed Archive
  • ar - Unix Archive File
  • Z - Unix Compressed File
  • lz - Lzip Compressed File
  • cfb - Compound Binary File
  • cab - Windows Cabinet File
  • lzh - LZH Compressed File

Application

  • indd - Adobe InDesign Document
  • skp - SketchUp Document
  • blend - Blender 3D Data File
  • ics - Calendar File

Executable

  • exe - Windows Executable File
  • rpm - Red Hat Package Manager File
  • xpi - Cross-platform Installer Package
  • msi - Windows Installer Package
  • deb - Debian Software Package

Other

  • ogx - Ogg Vorbis Multiplexed Media File
  • swf - Shockwave Flash Movie
  • sqlite - SQLite Database File
  • nes - Nintendo (NES) ROM File
  • crx - Chrome Extension
  • mxf - Material Exchange Format File
  • wasm - WebAssembly Binary File
  • xml - XML File
  • glb - Binary glTF File
  • pcap - Packet Capture Data
  • lnk - Windows Shortcut
  • alias - macOS Alias
  • mie - Meta Information Encapsulation
  • shp - Shapes File
  • arrow - Arrow Columnar Format
  • ps - PostScript File
  • eps - Encapsulated PostScript File
  • pgp - PGP Security Key
  • stl - Stereolithography File

Contributing

Benchmark

$ mix benchmark

Adding a New File Type

Most files can be detected with a single binary pattern match. To contribute support for a new file type:

  1. Find an example file. Please make sure you have the rights to use this file.
  2. Register the fixture in test/file_type/integration_test.exs.
  3. Write some code to detect the file's type in lib/file_type/magic.ex.
  4. Update the README to include a mention of your new file format.
  5. Send a pull request!

Please note that this library is not intended to detect text-based file formats like CSV, JSON, etc.
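
As a rough idea of what the detection code in step 3 can look like, here is a sketch of a single-pattern match for RIFF-based formats, where the identifying bytes follow a 4-byte size field. The module `RiffSketch`, the MIME strings, and the `{:error, :unrecognized}` return value are assumptions for illustration; the real clauses in lib/file_type/magic.ex may be organized differently.

```elixir
defmodule RiffSketch do
  # Hypothetical module for illustration only; not the contents of magic.ex.

  # RIFF containers start with "RIFF", a 4-byte chunk size, then a format tag.
  # The size bytes are skipped because they vary from file to file.
  def detect(<<"RIFF", _size::binary-size(4), "WEBP", _rest::binary>>),
    do: {:ok, {"webp", "image/webp"}}

  def detect(<<"RIFF", _size::binary-size(4), "WAVE", _rest::binary>>),
    do: {:ok, {"wav", "audio/wav"}}

  # Fallback for anything that matches no known pattern (assumed error shape).
  def detect(data) when is_binary(data),
    do: {:error, :unrecognized}
end
```

A fixture-based test, as in step 2, would then check that a real sample file produces the expected extension and MIME type.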

Prior Art