forked from openpreserve/fido
-
Notifications
You must be signed in to change notification settings - Fork 0
Format Identification for Digital Objects
License
derekhiggins/fido
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
usage: fido.py [-h] [-v] [-q] [-recurse] [-zip] [-input INPUT] [-useformats INCLUDEPUIDS] [-nouseformats EXCLUDEPUIDS] [-matchprintf FORMATSTRING] [-nomatchprintf FORMATSTRING] [-bufsize BUFSIZE] [-show SHOW] [-loadformats XML1,...,XMLn] [-confdir CONFDIR] [-checkformats] [-convert] [-source SOURCE] [-target TARGET] [FILE [FILE ...]] Format Identification for Digital Objects (fido). FIDO is a command-line tool to identify the file formats of digital objects. It is designed for simple integration into automated work-flows. positional arguments: FILE files to check. If the file is -, then read content from stdin. In this case, python must be invoked with -u or it may convert the line terminators. optional arguments: -h, --help show this help message and exit -v show version information -q run (more) quietly -recurse recurse into subdirectories -zip recurse into zip and tar files -input INPUT file containing a list of files to check, one per line. - means stdin -useformats INCLUDEPUIDS comma separated string of formats to use in identification -nouseformats EXCLUDEPUIDS comma separated string of formats not to use in identification -matchprintf FORMATSTRING format string (Python style) to use on match. See nomatchprintf, README.txt. -nomatchprintf FORMATSTRING format string (Python style) to use if no match. See README.txt -bufsize BUFSIZE size of the buffer to match against -show SHOW show "format" or "defaults" -loadformats XML1,...,XMLn comma separated string of XML format files to add. -confdir CONFDIR configuration directory to load_fido_xml, for example, the format specifications from. -checkformats Check the supplied format XML files for quality. Open Planets Foundation (http://www.openplanetsfoundation.org) See License.txt for license information. Download from: http://github.com/openplanets/fido/downloads Author: Adam Farquhar, 2010 Maintainer: Maurice de Rooij, 2011 FIDO uses the UK National Archives (TNA) PRONOM File Format descriptions. PRONOM is available from www.tna.gov.uk/pronom. Installation ------------ Any platform 1. Download the latest zip release from http://github.com/openplanets/fido/downloads (or use the big Downloads button on http://github.com/openplanets/fido) 2. Unzip into some directory 3. Open a command shell, cd to the directory that you placed the zip contents into and cd into folder 'fido' 4. You should now be able to see the help text: python fido.py -h Dependencies ------------ Fido 0.9.6 and later will run on Python 2.6 or Python 2.7 with no other dependencies. Format Definitions ------------------ By default, Fido loads format information from two files conf/formats.xml and conf/format_extensions.xml. Addition format files can be specified using the -loadformats command line argument. They should use the same syntax as conf/format_extensions.xml. If more than one format file needs to be specified, then they should be comma separated as with the -formats argument. Output ------ Output is controlled with the two parameters matchprintf and nomatchprintf. Each is a string that may contain formating information. They have access to an object called info with the following fields: printmatch: info.version (file format version X), info.alias (format also called X), info.apple_uti (Apple Uniform Type Identifier), info.group_size and info.group_index (if a file has multiple (tentative) hits), info.count (file N) printnomatch: info.count (file N) The defaults for Fido 0.9.6 are: printmatch: "OK,%(info.time)s,%(info.puid)s,%(info.formatname)s,%(info.signaturename)s,%(info.filesize)s,\"%(info.filename)s\",\"%(info.mimetype)s\",\"%(info.matchtype)s\"\n" printnomatch: "KO,%(info.time)s,,,,%(info.filesize)s,\"%(info.filename)s\",,\"%(info.matchtype)s\"\n" It can be useful to provide an empty string for either, for example to ignore all failed matches, or all successful ones (see examples below). Note that a newline needs to be added to the end of the string using \n. Examples -------- Identify all files in the current directory and below, sending output into file-info.csv python fido.py -recurse . > file-info.csv Do the same as above, but also look inside of zip or tar files: python fido.py -recurse -zip . > file-info.csv Take input from a list of files: Linux: ls > files.txt python fido.py -input files.txt Windows: dir /b > files.txt python fido.py -input files.txt Take input from a pipe: Linux: find . -type f | python fido.py -input - Windows: dir /b | python fido.py -input - Only show files that could not be identified. python fido.py -matchprintf "" . Only show files that could be identified. python fido.py -nomatchprintf "" . License information ------------------- See the file "LICENSE.txt" for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES...
About
Format Identification for Digital Objects
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- Python 99.4%
- Other 0.6%