Heidi Frank edited this page Feb 5, 2014 · 6 revisions

KARMS Workflow for processing ACO MARCXML records:

  • Retrieve the individual MARCXML files from the "work" folder on GitHub and copy them to a folder named [003]_[batchDate]/[003]_[batchDate]_marc_files/ (e.g., COO_20131122/COO_20131122_marc_files/)
  • Convert each MARCXML file to its own .mrc file using MarcEdit, saving the output to the folder [003]_[batchDate]/[003]_[batchDate]_marc_files/processed_files
  • Join individual .mrc files into a single .mrc file using MarcEdit - save to folder/filename = [003]_[batchDate]/[003]_[batchDate]_all.mrc
  • Convert the single .mrc file to a human-readable .mrk file using MarcEdit
  • Split the single .mrk file into two files: one containing records with OCLC numbers, the other containing records without OCLC numbers
  • Extract the OCLC numbers from the first .mrk file (the one whose records have OCLC numbers) using MarcEdit and save them to a .txt file with filename = [003]_[batchDate]/[003]_[batchDate]_all_OCLCnums.txt
  • Batch export all of the matching OCLC records from OCLC Connexion, saving to filename = [003]_[batchDate]/[003]_[batchDate]_OCLC_all.mrc
  • Batch export only the linked matching OCLC records from OCLC Connexion, saving to filename = [003]_[batchDate]/[003]_[batchDate]_OCLC_linked1.mrc
  • Deduplicate the "all" records against the "linked1" records using a Python script to get the list of "unlinked" records, generating filenames = [003]_[batchDate]/[003]_[batchDate]_OCLC_unlinked.mrc and [003]_[batchDate]/[003]_[batchDate]_OCLCnums_unlinked.txt
  • Look up each OCLC number in the [003]_[batchDate]_OCLCnums_unlinked.txt file and link/fix the corresponding record in OCLC Connexion
  • Re-export the fixed set of "unlinked" OCLC records from OCLC Connexion, saving to filename = [003]_[batchDate]/[003]_[batchDate]_OCLC_linked2.mrc
  • Combine the linked1 and linked2 sets of records using the MarcEdit Join function, saving to filename = [003]_[batchDate]/[003]_[batchDate]_OCLC_linked_all.mrc
  • Compare the quality of the original records from the partner (filename: [003]_[batchDate]_all.mrc) to the matched OCLC records (filename: [003]_[batchDate]_OCLC_linked_all.mrc)
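
The splitting, extraction, and deduplication steps above can be sketched in Python. This is only an illustration and not the script referred to in the workflow: it works on .mrk text and on plain lists of OCLC numbers rather than on .mrc files, and it assumes that OCLC numbers appear in 035 $a with an "(OCoLC)" prefix. All function names are hypothetical.

```python
import re

# Assumption: OCLC numbers sit in 035 $a with an "(OCoLC)" prefix, e.g.
# "=035  \\$a(OCoLC)12345678". Adjust the pattern if the partner records
# carry them in a different field.
OCLC_PATTERN = re.compile(r"^=035  ..\$a\(OCoLC\)[A-Za-z]*(\d+)", re.MULTILINE)


def split_mrk_records(mrk_text):
    """Split MarcEdit mnemonic (.mrk) text into one string per record."""
    # Records in a .mrk file are separated by blank lines.
    blocks = re.split(r"\n\s*\n", mrk_text.strip())
    return [block for block in blocks if block.strip()]


def oclc_number(record):
    """Return the first OCLC number found in a record, or None."""
    match = OCLC_PATTERN.search(record)
    # Normalize by dropping leading zeros so "00012345" equals "12345".
    return str(int(match.group(1))) if match else None


def split_by_oclc(records):
    """Partition records into (with_oclc, without_oclc) lists."""
    with_oclc, without_oclc = [], []
    for record in records:
        target = with_oclc if oclc_number(record) else without_oclc
        target.append(record)
    return with_oclc, without_oclc


def unlinked_numbers(all_numbers, linked_numbers):
    """Numbers present in the "all" export but absent from "linked1".

    Input order is preserved and repeated numbers are reported once.
    """
    linked = set(linked_numbers)
    seen = set()
    result = []
    for number in all_numbers:
        if number not in linked and number not in seen:
            seen.add(number)
            result.append(number)
    return result
```

For example, `unlinked_numbers(["1", "2", "3"], ["2"])` keeps the numbers that still need to be looked up and linked in OCLC Connexion. Comparing by OCLC number rather than by whole record is a simplification chosen for this sketch.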

Sample File Structure:

  • COO_20131203
    • COO_20131203_marc_files
      • processed_files
        • columbia_12345_marcxml.mrc
        • columbia_12346_marcxml.mrc
        • columbia_12347_marcxml.mrc
      • columbia_12345_marcxml.xml
      • columbia_12346_marcxml.xml
      • columbia_12347_marcxml.xml
    • COO_20131203_all.mrc
    • COO_20131203_all.mrk
    • COO_20131203_all_analysis.xlsx
    • COO_20131203_all_fieldcount.txt
    • COO_20131203_all_OCLCnums.txt
    • COO_20131203_OCLC_all.dat
    • COO_20131203_OCLC_all.mrc
    • COO_20131203_OCLC_all.mrk
    • COO_20131203_OCLC_linked1.dat
    • COO_20131203_OCLC_linked1.mrc
    • COO_20131203_OCLC_linked1.mrk
    • COO_20131203_OCLC_linked2.mrc
    • COO_20131203_OCLC_unlinked.mrc
    • COO_20131203_OCLCnums_unlinked.txt
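
The naming convention behind this layout can be captured in a small helper. The sketch below is hypothetical (the function name and dictionary keys are not part of the workflow); it only assembles the [003]_[batchDate] paths named in the steps above, so that scripts do not have to retype them.

```python
from pathlib import PurePosixPath


def batch_paths(org_code, batch_date):
    """Build the workflow's paths for one batch.

    org_code is the [003] value (e.g. "COO") and batch_date is the
    [batchDate] string (e.g. "20131203").
    """
    prefix = f"{org_code}_{batch_date}"
    root = PurePosixPath(prefix)
    marc_dir = root / f"{prefix}_marc_files"
    return {
        "marc_files": marc_dir,
        "processed_files": marc_dir / "processed_files",
        "all_mrc": root / f"{prefix}_all.mrc",
        "all_oclc_nums": root / f"{prefix}_all_OCLCnums.txt",
        "oclc_all": root / f"{prefix}_OCLC_all.mrc",
        "oclc_linked1": root / f"{prefix}_OCLC_linked1.mrc",
        "oclc_unlinked": root / f"{prefix}_OCLC_unlinked.mrc",
        "oclc_nums_unlinked": root / f"{prefix}_OCLCnums_unlinked.txt",
        "oclc_linked2": root / f"{prefix}_OCLC_linked2.mrc",
        "oclc_linked_all": root / f"{prefix}_OCLC_linked_all.mrc",
    }


if __name__ == "__main__":
    for name, path in batch_paths("COO", "20131203").items():
        print(f"{name}: {path}")
```

`PurePosixPath` is used so the paths print with forward slashes on any platform, matching the way they are written in the workflow.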