
Extract and transform ETD records for OCLC

carakey edited this page Mar 8, 2019 · 1 revision

Background

In the past, catalogers created descriptive MARC records for Electronic Theses and Dissertations (ETDs) by manually copying and pasting field values, then loaded those records into OCLC. Working with cataloging staff, we can streamline this task with a defined ETL (Extract, Transform, Load) process.

This effort is currently in a requirements-gathering phase.

Extraction

Descriptive metadata for ETDs will be exported by ETS using Solr.
The exported data must include the web URL for each item.
Every URI in the export must be accompanied by its label.
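
The two export requirements above (a web URL per item, and a label for every URI) could be checked automatically once an export format is chosen. The following is a minimal sketch only: it assumes each exported record is a flat dictionary, and the field name `url` and the `_label` suffix convention are hypothetical, since the real export schema is not yet defined.

```python
from urllib.parse import urlparse

# Hypothetical names; the real export schema is not yet defined.
URL_FIELD = "url"
LABEL_SUFFIX = "_label"


def is_uri(value):
    """Return True if value looks like an absolute http(s) URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_record(record):
    """Return a list of problems found in one exported record (a dict)."""
    problems = []
    # Requirement: the export must include the web URL for the item.
    if not is_uri(record.get(URL_FIELD)):
        problems.append("missing web URL")
    # Requirement: every URI-valued field needs a companion label field.
    for field, value in record.items():
        if field == URL_FIELD or field.endswith(LABEL_SUFFIX):
            continue
        if is_uri(value) and not record.get(field + LABEL_SUFFIX):
            problems.append("no label for URI in " + field)
    return problems
```

A record that passes returns an empty list, so a batch export could be screened by collecting the records for which `check_record` returns anything.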

Outstanding:

  • Data selection: will a Solr query that filters on type and date retrieve every ETD?
  • Data validation: is there a secondary source to check against?
  • Output format: what is the catalogers' preference?
  • Handling special cases:
    • Embargoed works: are they included in the export before the embargo ends?
    • Child works: are they included in the export, or only the parent? If only the parent is exported, is there valuable information about the children that needs to be included?
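
To make the data selection question concrete, here is a rough sketch of a Solr request that filters on type and date. The `q`, `fq`, `fl`, `rows`, and `wt` parameters are standard Solr query parameters, but the field names (`type`, `date_created`, `title`, `url`), the type value `Thesis`, and the core URL are assumptions for illustration and would need to match the actual index.

```python
from urllib.parse import urlencode


def build_etd_query_url(base_url, start, end, rows=100):
    """Build a Solr select URL filtering on type and a date range.

    start and end are ISO 8601 timestamps, e.g. "2019-01-01T00:00:00Z".
    All field names below are hypothetical placeholders.
    """
    params = [
        ("q", "*:*"),
        ("fq", 'type:"Thesis"'),                              # assumed type field and value
        ("fq", "date_created:[{} TO {}]".format(start, end)),  # assumed date field
        ("fl", "id,title,url"),                               # assumed fields to return
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

For example, `build_etd_query_url("http://localhost:8983/solr/etd", "2019-01-01T00:00:00Z", "2019-12-31T23:59:59Z")` yields a URL that can be opened in a browser to inspect what the filters return. Whether embargoed or child works appear in the results depends on how they are indexed, which remains an open question above.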

Transformation

Outstanding:

  • Responsibility: ETS or cataloging?
  • Method: XSLT if ETS, MarcEdit if cataloging
  • Output format: MARCXML? What is the catalogers' preference?
  • Crosswalking: do templates exist?
  • Scripts and stylesheets: TBD
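
As a starting point for the crosswalking discussion, the sketch below maps a few exported fields to MARCXML datafields. It is illustrative only and not an agreed crosswalk: the export field names are hypothetical, the indicator values are placeholders that cataloging staff would need to confirm, and the leader and control fields required for a complete record are omitted. Python is used here for brevity; the same mapping could be expressed in XSLT or as a MarcEdit task.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

# Hypothetical crosswalk: export field -> (MARC tag, ind1, ind2, subfield code).
# Indicator values are placeholders pending review by catalogers.
CROSSWALK = {
    "author": ("100", "1", " ", "a"),       # main entry, personal name
    "title": ("245", "1", "0", "a"),        # title statement
    "degree_note": ("502", " ", " ", "a"),  # dissertation note
    "url": ("856", "4", "0", "u"),          # electronic location and access
}


def to_marcxml(record):
    """Convert one exported record (a dict) to a MARCXML record string."""
    rec = ET.Element("{%s}record" % MARC_NS)
    for field, (tag, ind1, ind2, code) in CROSSWALK.items():
        value = record.get(field)
        if not value:
            continue  # skip fields missing from the export
        datafield = ET.SubElement(
            rec,
            "{%s}datafield" % MARC_NS,
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        subfield = ET.SubElement(datafield, "{%s}subfield" % MARC_NS, {"code": code})
        subfield.text = value
    return ET.tostring(rec, encoding="unicode")
```

Keeping the mapping in a single table like `CROSSWALK` makes it easy for catalogers to review and amend without reading the conversion logic.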

Load

MARC records will be loaded into OCLC by cataloging staff.

Outstanding:

  • How is cleanup work, such as subject analysis, integrated into the process?