A recent libmagic regression (detected on Gentoo and Arch) appears to be causing webm files to be identified incorrectly. If you have any in your mono-collection, this may be a good time to request a patrolling read against your `by-id` index.
We have received some complaints that the *nix binaries are built against a glibc that is WAY too new, so they will now be built on the latest release of Debian instead of bleeding-edge Gentoo.
Breaking Changes
- Risk: moderate. Deprecated `source_*` parameters have been dropped
  - This affects qualifier expressions in all stages of the pipeline
  - This also affects transform argument generation
- Risk: moderate. Store qualifiers and path generation no longer bind `file_*` attributes (except for `file_extension`)
  - Offering files to stores is a self-contained process. Hoppers can be configured to invoke this process automatically after certain files are ingested, but should not change the process itself. Conveying extra information when it is auto-invoked by hoppers runs contrary to this design
  - If we need per-file attributes, let's design them properly as opposed to hacking pieces of them onto two colocated features
New Features
- Added inline named capture groups support for regex
  - Realized through the PCRE2 library
  - Yes, these are still applied at a lower precedence than named constants
  - Yes, this means we now support match-specific group attributes
- Regex qualifiers now support minimum match length thresholds
  - The new value for the include config directive is `PROPERTY /EXPRESSION/FLAGS THRESHOLD`
    - eg: require the expression match at least 50% of the value: `include = x /\d+/ 50%`
    - eg: require the expression match at least 12 characters: `include = x /\d+/ 12`
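As a rough illustration of the two additions above, the following Python sketch models how a minimum match length threshold and named capture groups could behave. The function names (`parse_threshold`, `meets_threshold`, `bind_attributes`) are hypothetical, and only the first match is measured. Python's `re` module stands in for PCRE2 here: Python spells named groups `(?P<name>...)`, while PCRE2 also accepts `(?<name>...)`. The tool's actual parsing and matching rules may differ.

```python
import re


def parse_threshold(token):
    """Parse a THRESHOLD token: "50%" is a percentage, "12" a character count."""
    if token.endswith("%"):
        return ("percent", float(token[:-1]))
    return ("chars", int(token))


def meets_threshold(pattern, value, token):
    """Return True if the first match of pattern covers enough of value.

    Assumption: only the first match is measured; the real tool may differ.
    """
    match = re.search(pattern, value)
    if match is None:
        return False
    matched = match.end() - match.start()
    kind, amount = parse_threshold(token)
    if kind == "percent":
        # Compare matched / len(value) against amount / 100 without dividing.
        return matched * 100 >= amount * len(value)
    return matched >= amount


def bind_attributes(pattern, value, named_constants):
    """Collect named capture groups, then let named constants override them."""
    match = re.search(pattern, value)
    if match is None:
        return None
    attributes = dict(match.groupdict())
    # Capture groups have lower precedence, so constants are applied last.
    attributes.update(named_constants)
    return attributes
```

For example, `meets_threshold(r"\d+", "abc12345", "50%")` passes because 5 of 8 characters match, while `meets_threshold(r"\d+", "id-12345", "12")` fails because only 5 characters match.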
Behavior Changes
- Workflows resumed through WIP files now bypass hopper evaluation
- WIP files now contain group attributes as well as workflow parameters, allowing manual touch ups
- Store qualifiers and path generation now bind `file_extension` from the file identification process instead of copying it verbatim from the imported file's path
- Order assignment now sorts all files by length, then by character codes
  - This ensures a semantically correct order for variable-length numbers in file names: 0, 1, 2, 3, 10, 11 rather than 0, 1, 10, 11, 2, 3 (the order without length factoring)
  - Another happy coincidence is that this tends to cluster similarly named files together
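The length-then-character-code ordering can be sketched in a few lines of Python. The helper name `order_key` is hypothetical, and the sketch assumes the comparison is made on the whole file name; the tool may well apply it to a different part of the path.

```python
def order_key(name):
    """Sort key: shorter names first, ties broken by character codes."""
    return (len(name), name)


names = ["10", "2", "0", "11", "3", "1"]

# Plain character-code ordering puts "10" before "2".
plain = sorted(names)                   # ['0', '1', '10', '11', '2', '3']

# Factoring in length first restores numeric order for variable-length numbers.
ordered = sorted(names, key=order_key)  # ['0', '1', '2', '3', '10', '11']
```

The same key works for names that share a prefix and extension, such as `page2.png` and `page10.png`, because the shorter name always sorts first.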
Performance
- Removed extraneous memory allocations from INI parsing
- Removed unnecessary memory allocations for attribute matching, at the cost of a bit of short-lived heap fragmentation
- Time complexity of matching files has been improved from `m log(n)` to `m + n`
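These notes do not say how the complexity improvement was achieved. One common way to move from `m log(n)` to `m + n` is to replace a per-file binary search with a hash index that is built once. The Python sketch below contrasts the two approaches under that assumption; both function names are hypothetical, and the `m + n` bound is the average case for hash lookups.

```python
from bisect import bisect_left


def match_by_binary_search(queries, sorted_candidates):
    """O(m log n): one binary search per query over a sorted list."""
    results = []
    for query in queries:
        i = bisect_left(sorted_candidates, query)
        found = i < len(sorted_candidates) and sorted_candidates[i] == query
        results.append(i if found else None)
    return results


def match_by_hash_index(queries, candidates):
    """O(m + n) on average: build the index once, then do O(1) lookups."""
    index = {}
    for position, key in enumerate(candidates):
        index.setdefault(key, position)  # keep the first position of each key
    return [index.get(query) for query in queries]
```

With unique, sorted candidates both functions return the same positions, so the hash index is a drop-in replacement in this simplified setting.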
Bug Fixes
- Reduced FFMPEG warning spam when dealing with JPEG files
  - A side effect of this change is that phash has started producing slightly different results
  - So do not be alarmed if you see a lot of phash corrections while patrolling `by-id`