Executable jar to turn alma's stupid .tar.gz files into jsonl.gz

mlibrary/alma.tar.gz-to-marcinjson

alma.tar.gz-to-marcinjson

Turn Alma marc-xml export files into nicer marc-in-json jsonl files

Alma exports marc-xml files as a bunch of <whatever>.tar.gz files, each of which has the single file <whatever>.xml in it.

This repository builds a fat .jar (i.e., all dependencies included) that takes any number of <whatever>.tar.gz files and writes a matching <whatever>.jsonl.gz file for each one into the directory you invoked the program from.
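The naming rule above can be sketched in a few lines of plain Java. This is only an illustration of the input-to-output mapping, not the project's actual code: the class `OutputNames` and the helper `outputFor` are hypothetical names, and the sketch assumes every input name ends in `.tar.gz`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputNames {

    // Hypothetical helper (not from the project): maps <whatever>.tar.gz
    // to <whatever>.jsonl.gz inside the given working directory.
    static Path outputFor(Path input, Path workingDir) {
        String name = input.getFileName().toString();
        String suffix = ".tar.gz";
        if (!name.endsWith(suffix)) {
            throw new IllegalArgumentException("Expected a .tar.gz file: " + name);
        }
        // Drop the input's own directory and suffix; keep only the base name.
        String base = name.substring(0, name.length() - suffix.length());
        return workingDir.resolve(base + ".jsonl.gz");
    }

    public static void main(String[] args) {
        // The output lands in the directory the program was invoked from,
        // regardless of where the input file lives.
        Path cwd = Paths.get("").toAbsolutePath();
        for (String arg : args) {
            System.out.println(arg + " -> " + outputFor(Paths.get(arg), cwd));
        }
    }
}
```

Note that, under this rule, two inputs with the same base name in different directories would map to the same output file.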

Usage

The executable .jar takes only one kind of argument: the names of the files to convert.

java -jar /path/to/alma.tar.gz-to-marcinxml /path/to/alma/*.tar.gz
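To sanity-check a converted file, something like the following might be enough. It is a minimal sketch using only the Java standard library, and it assumes the output is gzip-compressed JSON Lines with one record per non-blank line; the class `CountRecords` and its method are hypothetical names, not part of this project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class CountRecords {

    // Hypothetical helper: counts the non-blank lines in a .jsonl.gz file,
    // assuming each line holds exactly one record.
    static long countRecords(Path jsonlGz) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(jsonlGz)),
                StandardCharsets.UTF_8))) {
            return reader.lines()
                         .filter(line -> !line.trim().isEmpty())
                         .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Print one "file: count" line per .jsonl.gz file given on the command line.
        for (String arg : args) {
            System.out.println(arg + ": " + countRecords(Paths.get(arg)) + " records");
        }
    }
}
```

Comparing the count against the number of records Alma reports for the export is a cheap way to spot a truncated conversion.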

Building the .jar file

mvn package assembly:single

Performance

It's not ridiculously fast (e.g., it doesn't use Jackson custom serializers like it should, and it isn't even multi-threaded), but on my laptop it converts the University of Michigan's full export of some 14.5M records in about 15 minutes (roughly 16,000 records per second).
