Merge pull request #10 from sul-dlss-labs/marcxml
Add support for converting from MARCXML
jacobthill authored Jan 17, 2024
2 parents e10a0af + 59e250a commit 814f534
Showing 5 changed files with 786,431 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@
 
 [![Build Status](https://github.com/edsu/marctable/actions/workflows/test.yml/badge.svg)](https://github.com/edsu/marctable/actions/workflows/test.yml)
 
-*marctable* is a Python command line utility that converts MARC bibliographic data into tabular formats like [CSV] and [Parquet]. It uses the Library of Congress [MARC Bibliographic documentation] expressed as an [Avram] [JSON file] to determine what MARC fields and subfields to include and whether they can repeat or not.
+*marctable* is a Python command line utility that converts MARC bibliographic data (in transmission format or MARCXML) into tabular formats like [CSV] and [Parquet]. It uses the Library of Congress [MARC Bibliographic documentation] expressed as an [Avram] [JSON file] to determine what MARC fields and subfields to include and whether they can repeat or not.
 
 ## Install
 
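The updated description advertises two input serializations. In this commit the reader is chosen from the input's file name (`marc_input.name.endswith(".xml")`), so a MARCXML file with any other extension would be handed to `pymarc.MARCReader`. A minimal sketch of a content-based check, assuming the first few bytes of the input are available: `sniff_marc_format` is a hypothetical helper, not part of marctable or pymarc, and it only looks for a leading `<` (XML) or a five-digit record length (MARC 21 transmission format).

```python
import codecs


def sniff_marc_format(head: bytes) -> str:
    """Guess the MARC serialization from the leading bytes (hypothetical helper).

    Returns "marcxml", "marc21" (ISO 2709 transmission format) or "unknown".
    """
    # Ignore a UTF-8 byte order mark if one is present.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    # XML starts with '<' (a declaration or the root element), possibly after whitespace.
    if head.lstrip().startswith(b"<"):
        return "marcxml"
    # A MARC 21 transmission record starts with its five-digit record length
    # (leader positions 00-04).
    if len(head) >= 5 and head[:5].isdigit():
        return "marc21"
    return "unknown"
```

For a file opened with `open(path, "rb")`, the resulting `io.BufferedReader` offers `peek()`, which returns leading bytes without consuming them, so the same handle could afterwards be passed to whichever reader the check selects.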
8 changes: 7 additions & 1 deletion marctable/utils.py
@@ -84,8 +84,14 @@ def records_iter(
     mapping = _mapping(rules)
     marc = MARC.from_avram()
 
+    # TODO: MARCXML parsing brings all the records into memory
+    if marc_input.name.endswith(".xml"):
+        reader = pymarc.marcxml.parse_xml_to_array(marc_input)
+    else:
+        reader = pymarc.MARCReader(marc_input)
+
     rows = []
-    for record in pymarc.MARCReader(marc_input):
+    for record in reader:
         # if pymarc can't make sense of a record it returns None
         if record is None:
             # TODO: log this?
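The `TODO` in this hunk notes that `pymarc.marcxml.parse_xml_to_array` builds a list of every record before iteration starts. One way to keep memory roughly flat is to stream the XML with the standard library's `xml.etree.ElementTree.iterparse` and discard parsed elements after each `<record>`. This is a sketch under stated assumptions: `iter_marcxml` is hypothetical and yields plain dicts rather than `pymarc.Record` objects, so wiring it into `records_iter` would need an adapter. pymarc's own `map_xml`, which calls a function once per record, may be another route.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def _local(tag: str) -> str:
    """Strip an XML namespace: '{http://www.loc.gov/MARC21/slim}record' -> 'record'."""
    return tag.rsplit("}", 1)[-1]


def iter_marcxml(source) -> Iterator[dict]:
    """Stream simplified records from a MARCXML path or binary file object.

    Hypothetical helper: each record is {"leader": str | None, "fields": [...]},
    where control fields are (tag, value) and data fields are (tag, [(code, value)]).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or _local(elem.tag) != "record":
            continue
        record = {"leader": None, "fields": []}
        for child in elem:
            name = _local(child.tag)
            if name == "leader":
                record["leader"] = child.text
            elif name == "controlfield":
                record["fields"].append((child.get("tag"), child.text))
            elif name == "datafield":
                subfields = [
                    (sf.get("code"), sf.text)
                    for sf in child
                    if _local(sf.tag) == "subfield"
                ]
                record["fields"].append((child.get("tag"), subfields))
        yield record
        # Detach records parsed so far so they can be garbage collected.
        root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
        b"<record><leader>00000nam a2200000 a 4500</leader>"
        b'<controlfield tag="001">abc</controlfield>'
        b'<datafield tag="245" ind1="1" ind2="0"><subfield code="a">A title</subfield></datafield>'
        b"</record></collection>"
    )
    for rec in iter_marcxml(sample):
        print(rec)
```

Clearing the root after each record follows the usual `iterparse` pattern: the values are copied into the dict before `yield`, so discarding the elements afterwards does not affect records already handed to the caller.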
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "marctable"
-version = "0.3.2"
+version = "0.4.0"
 description = "Convert MARC to CSV and Parquet"
 authors = ["Ed Summers <[email protected]>"]
 license = "Apache"