
stoffelx/concordanceprocessing


Perl script that applies concordance lists to process bibliographic metadata

Background and Introduction

Sometimes all you have at hand is a delimiter-separated value list describing significant changes to machine-readable bibliographic metadata.

Specifically, this script was created to address a provider platform change that altered the URL syntax for retrieving the respective full texts, and with it the IDs contained in those URLs. Based on a concordance list of the corresponding old and new IDs, the existing records had to be processed to match the new URL scheme. The old and new URL schemes themselves are encoded in the script. The program was written to be run during batch processing for an ALEPH import.
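
Reading such a concordance list into a Perl hash could look roughly like the following minimal sketch. It assumes two columns per line (old ID, new ID); the helper name `load_concordance` and the sample IDs are hypothetical and not taken from the actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: read a delimiter-separated concordance list
# (old ID, new ID per line) from an open filehandle into a hash
# reference mapping old ID => new ID.
sub load_concordance {
    my ($fh, $delimiter) = @_;
    my %map;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                     # strip LF or CRLF line endings
        next if $line =~ /^\s*$/;                 # skip blank lines
        # \Q...\E treats the delimiter literally, so '|' or '.' work too
        my ($old, $new) = split /\Q$delimiter\E/, $line, 2;
        next unless defined $new && length $new;  # skip incomplete lines
        $map{$old} = $new;
    }
    return \%map;
}

# Demo with an in-memory list; the real script would open $inputfile.
my $list = "1001;A-17\n1002;A-18\n";
open my $fh, '<', \$list or die "cannot open list: $!";
my $concordance = load_concordance($fh, ';');
print "$_ => $concordance->{$_}\n" for sort keys %$concordance;
```

A hash gives a constant-time lookup per old ID, which matters once the list grows to many thousands of entries.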

Usage

All parameters are hard-coded in the source: *$inputfile* for the file path of the concordance list, *$delimiter* for the value delimiter used in this list, *$processfile* for the path of the file the changes are applied to, and *$linematch* as the identifier of the respective MARC/MAB data field. Furthermore, the specific ID syntax (here, within URLs) is implemented via a hash that maps old values to new ones. The current version is therefore highly specific and must be adapted to each use case.
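
The interplay of *$linematch*, the old and new URL schemes and the ID hash might be sketched as follows. The field tag `856`, both URL prefixes, the sample IDs and the helper `rewrite_line` are illustrative assumptions; the demo line only loosely imitates the ALEPH sequential format.

```perl
use strict;
use warnings;

# All values below are hypothetical placeholders; the actual script
# hard-codes its own field identifier and URL schemes.
my $linematch  = '856';                                  # assumed field identifier
my $old_prefix = 'http://old.example.org/fulltext?id=';  # assumed old URL scheme
my $new_prefix = 'https://new.example.org/document/';    # assumed new URL scheme

# Old ID => new ID, as it would be filled from the concordance list.
my %concordance = ('1001' => 'A-17', '1002' => 'A-18');

# Returns the rewritten line, or undef if the line carries an old URL
# whose ID is missing from the concordance. Lines without the field
# identifier (simplified substring test) are returned unchanged.
sub rewrite_line {
    my ($line) = @_;
    return $line if index($line, $linematch) < 0;
    return $line unless $line =~ /\Q$old_prefix\E([A-Za-z0-9_-]+)/;
    my $old_id = $1;
    return undef unless exists $concordance{$old_id};
    my $new_url = $new_prefix . $concordance{$old_id};
    $line =~ s/\Q$old_prefix$old_id\E/$new_url/;
    return $line;
}

print rewrite_line("000000001 85640 L \$\$uhttp://old.example.org/fulltext?id=1001\n");
```

Adapting the script to another use case would then mostly mean changing the three scheme variables and the pattern for the ID.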

As a result, the script produces two output files: a .sed file containing the corrected records and a .rej file containing those that do not match the given identifiers.
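
One way to produce the two output files is to route every line through a rewrite function and write it either to the .sed or to the .rej handle. This sketch handles single lines rather than whole records; the function name `split_output` and the demo data are hypothetical, and in-memory filehandles stand in for the real output files.

```perl
use strict;
use warnings;

# Route each input line either to the .sed handle (rewritten) or to the
# .rej handle (rejected). $rewrite is a code ref that returns the new
# line, or undef when the line cannot be mapped.
sub split_output {
    my ($in, $sed, $rej, $rewrite) = @_;
    my ($ok, $rejected) = (0, 0);
    while (my $line = <$in>) {
        my $fixed = $rewrite->($line);
        if (defined $fixed) { print {$sed} $fixed; $ok++ }
        else                { print {$rej} $line;  $rejected++ }
    }
    return ($ok, $rejected);
}

# Demo with in-memory handles; a real run would open files such as
# "$processfile.sed" and "$processfile.rej" (assumed naming).
my %map   = (a => 'x');
my $input = "id=a\nid=b\n";
my ($sed_out, $rej_out) = ('', '');
open my $in,  '<', \$input   or die $!;
open my $sed, '>', \$sed_out or die $!;
open my $rej, '>', \$rej_out or die $!;
my @counts = split_output($in, $sed, $rej, sub {
    my ($l) = @_;
    return $l =~ /^id=(\w+)$/ && exists $map{$1} ? "id=$map{$1}\n" : undef;
});
close $sed;
close $rej;
print "written: $counts[0], rejected: $counts[1]\n";
```

Passing the rewrite step as a code ref keeps the file handling independent of the use-case-specific ID syntax.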

Known issues

- in the current version the script parameters are hard-coded in the source, i.e. the file to be processed, the concordance list, its value delimiter (and several other variables) cannot be passed as command-line arguments when the script is started from the shell

- script performance is modest, so be patient with larger metadata sets
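
Regarding the first issue, the hard-coded variables could be turned into command-line options with the core module Getopt::Long. The option names below simply mirror the variable names and, like the default delimiter `;` and the helper `parse_args`, are assumptions rather than part of the existing script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option parser mirroring the hard-coded variables.
sub parse_args {
    my @argv = @_;
    my %opt  = (delimiter => ';');    # assumed default delimiter
    GetOptionsFromArray(
        \@argv, \%opt,
        'inputfile=s', 'processfile=s', 'delimiter=s', 'linematch=s',
    ) or die "usage: $0 --inputfile FILE --processfile FILE --linematch TAG [--delimiter CHAR]\n";
    for my $required (qw(inputfile processfile linematch)) {
        die "missing --$required\n" unless defined $opt{$required};
    }
    return \%opt;
}

# Demo with a fixed argument list; a real script would call parse_args(@ARGV).
my $opt = parse_args('--inputfile', 'concordance.csv',
                     '--processfile', 'records.seq', '--linematch', '856');
print "$_=$opt->{$_}\n" for sort keys %$opt;
```

Invoked from the shell, a call might then look like `perl concordance.pl --inputfile concordance.csv --processfile records.seq --linematch 856`, where the script and file names are hypothetical.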

About

Fix bibliographic metadata based on a concordance list
