This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

MARC #7

Open
ClareP opened this issue Nov 30, 2015 · 6 comments

Comments


ClareP commented Nov 30, 2015

Any chance of working with .mrc files? Or any compatibility with MARCedit?

@ostephens
Contributor

OpenRefine doesn't work directly with native MARC files (i.e. .mrc files), but I've used it to work with files in the mnemonic/MarcEdit format:

=245  14 $aThe Lord of the Rings

etc.

See posts at http://www.meanboyfriend.com/overdue_ideas/tag/fixmarc/?orderby=date&order=ASC for more on this

@chibiaeris

Thanks, Owen!

There are a couple of things that I'd like to be able to do quickly in MarcEdit with OpenRefine or similar. At SOAS Library we regularly delete 653 and similar fields. It would be great to just be able to filter out all those pesky fields in a .mrc file and delete them wholesale.

Another thing: I work with records that use Asian (CJK) characters. This means using the 880 field to link transliterated fields to ones with indigenous script. It'd be useful to be able to generate those 880 fields in bulk when I'm writing bib records from scratch.
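For context on the 880 linking mentioned above, here is a minimal Python sketch that emits a linked pair of mnemonic-format (.mrk) lines. The `$6` occurrence-number linkage follows the MARC 21 convention; the helper name is hypothetical, and the script-identifier suffix on the 880's `$6` (e.g. `/$1` for CJK) is omitted for brevity.

```python
# Sketch: build a linked pair of .mrk lines - a transliterated field that
# points at its 880 counterpart, and the 880 field carrying the original
# script. make_880_pair is a hypothetical helper; subfields are passed
# already in mnemonic form (e.g. "$aChugoku no rekishi.").

def make_880_pair(tag, indicators, romanized_subfields, vernacular_subfields,
                  occurrence=1):
    occ = f"{occurrence:02d}"  # occurrence number shared by both fields
    romanized = f"={tag}  {indicators}$6880-{occ}{romanized_subfields}"
    vernacular = f"=880  {indicators}$6{tag}-{occ}{vernacular_subfields}"
    return romanized, vernacular

r, v = make_880_pair("245", "10", "$aChugoku no rekishi.", "$a中国の歴史.")
# r -> "=245  10$6880-01$aChugoku no rekishi."
# v -> "=880  10$6245-01$a中国の歴史."
```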

Lastly, many CJK records have Asian punctuation (e.g. 。、（）；： instead of . , ( ) ; :), which we usually change to the standard forms because they are a nightmare for indexing and retrieval. However, these symbols can sometimes be difficult to spot in MarcEdit, so it would be nice to filter down to all these non-standard punctuation marks and replace them in one go.
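The punctuation replacement itself is a simple character mapping. A minimal Python sketch, with an illustrative (not exhaustive) table of fullwidth-to-ASCII substitutions:

```python
# Map common CJK / fullwidth punctuation to the standard ASCII forms.
# The table is illustrative - extend it with whatever marks actually
# appear in your records.
CJK_TO_ASCII = {
    "。": ".", "、": ",", "，": ",",
    "（": "(", "）": ")",
    "；": ";", "：": ":",
}

def normalize_punct(text):
    # str.translate applies the whole mapping in a single pass
    return text.translate(str.maketrans(CJK_TO_ASCII))

print(normalize_punct("歴史（上）。"))  # prints "歴史(上)."
```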

Anyway, I don't know if any of this is helpful, but these would be the top things on my personal list for MarcEdit. :)


ClareP commented Dec 1, 2015

And here's my plight... We have spreadsheets created from data from our old LMS (Talis), with bib IDs, author, title, old classification and new classification. They are based on ranges of local classmarks ('old classifications'), so each sheet has 1,000 or so rows with old classmarks between a range such as PK 74904 Cot to PK 76737 Sch (don't ask!). We want to somehow get the new classmark into 050_04 of the relevant MARC record in Alma and delete any existing 050s.
We don't have the MMS IDs in the spreadsheet, and the old classmarks are in the holdings but not in the 050 fields of the records in Alma. We have added and deleted records from the ranges of classmarks since the spreadsheets were created, so any range created and exported from Alma now will not match exactly against the reclassified ranges. We are using LCC as the basis for our new classmarks, but we have classified each book individually to fit in with our existing collections, so anything that automatically pulls in LCC will not work for us. Any thoughts (even if that thought is "not a snowball's chance in hell is there a way to automate this") gratefully received.

@ostephens
Contributor

@chibiaeris I think that I'd probably go with trying to do this all in MarcEdit. I suspect that the MarcEdit Tasklist function would work for you here.

The Tasklist function allows you to create a list of functions and then carry out all of them with a single keystroke/click.

So, for example, removing 653 fields: if you have a list of fields you always remove, create a tasklist that contains several 'delete field' tasks, one for each MARC tag you want to remove. You can then run this task each time you open a new .mrc file in MarcEdit.
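Outside MarcEdit, the same "delete these tags wholesale" step can be sketched in a few lines of Python over a mnemonic-format (.mrk) export, where every field is one `=TAG  ...` line. The tag list here is illustrative:

```python
# Drop every field whose MARC tag is in DROP_TAGS from a .mrk export.
# Works on the mnemonic format, where each field is one "=TAG  ..." line.
DROP_TAGS = {"653"}  # illustrative - add whatever tags you always remove

def strip_fields(mrk_text, tags=DROP_TAGS):
    kept = [line for line in mrk_text.splitlines()
            if not (line.startswith("=") and line[1:4] in tags)]
    return "\n".join(kept)

sample = "=245  14$aThe Lord of the Rings\n=653  \\\\$aFantasy"
print(strip_fields(sample))  # prints only the 245 line
```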

In terms of adding the 880, I think you could also do this in MarcEdit, again as a tasklist. I guess the challenge here would be only adding it if it didn't already exist? Possibly 'Build New Field' might be the way to go, or 'Copy Field' (or you might be able to do it all with a regular expression). I'm pretty sure this is possible, but I would suggest asking on the MarcEdit email list (http://listserv.gmu.edu/cgi-bin/wa?A0=marcedit-l).

For the last one, OpenRefine might be able to help: one of the Custom Facets in OpenRefine creates a facet based on the Unicode values of the characters in a column. This tells you how often each character occurs. If you know the range of Unicode values that the special characters fall into, you can narrow down to the lines that contain those characters.
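What that facet computes is essentially the code point of each character, so the same check can be reproduced in a few lines of Python. The two ranges below (the CJK Symbols and Punctuation block and the Halfwidth and Fullwidth Forms block) cover the marks mentioned above:

```python
# Flag characters whose code points fall in the Unicode blocks that hold
# CJK punctuation: U+3000-303F (CJK Symbols and Punctuation) and
# U+FF00-FFEF (Halfwidth and Fullwidth Forms).
SUSPECT_RANGES = [(0x3000, 0x303F), (0xFF00, 0xFFEF)]

def suspect_chars(text):
    return [c for c in text
            if any(lo <= ord(c) <= hi for lo, hi in SUSPECT_RANGES)]

print(suspect_chars("Rekishi。（1990）"))  # prints ['。', '（', '）']
```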

That said, if you are already working in MarcEdit with the records, switching to OpenRefine for this task could be a bit clunky. An alternative approach would be to build a set of 'find & replace' actions (one per character) into a tasklist in MarcEdit; this would take some work up front, but once done you would be able to run it on all files.

@ostephens
Contributor

@ClareP I probably don't know enough about the way Alma stores information to be able to answer this I'm afraid - but I'll have a stab at some general advice.

I'm not sure I quite see the connection between the Alma records and the information in your spreadsheets - are there shared identifiers, or is it a more general rule like 'if the Alma record has a holding where the class mark is in a range covered by our spreadsheets, then do the update'?

If there is some rule here, then you probably have a chance of automating this; but if you can't write down a rule for the matching, it feels like it would be difficult or impossible.

Other advice:

  • Try to break the problem down into parts, work out how to do each step, then work out how to join the steps together
  • Don't get too distracted by 'edge cases' - that is, the situations that the general rule won't cope with. Rather, work on what will fix the majority of your problems, and deal with the leftovers separately
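The range rule itself ('if the old classmark falls in a spreadsheet's range, apply that sheet's new classmark') is easy to sketch. The caveat: plain string comparison stands in for real classmark ordering here, which is an assumption worth checking. The range bounds echo the example earlier in the thread; the new classmark is hypothetical:

```python
# Sketch of the range-matching rule: find which (low, high, new_classmark)
# row an old classmark falls into. String comparison is used for ordering,
# which is an assumption - real classmarks may need a proper sort key.
def find_new_classmark(old_classmark, ranges):
    for low, high, new in ranges:
        if low <= old_classmark <= high:
            return new
    return None  # no range matched

ranges = [("PK 74904 Cot", "PK 76737 Sch", "NEW-CLASSMARK")]
print(find_new_classmark("PK 75000 Abc", ranges))  # prints NEW-CLASSMARK
```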


ClareP commented Dec 1, 2015

Hi Owen - this is great advice, thanks.
The general rule you mention is exactly what we are trying to do: 'If the Alma record has a holding where the class mark is in a range covered by our spreadsheets, then do the update'.
Unfortunately, I think pinning down the shared identifier is the issue, because I don't think there is one - at least not a reliable and unique one in the current data. Alma has MMS IDs; our spreadsheets have bib IDs, and although the bib IDs are mostly present in the Alma records, they are not in all of them and are unreliable as an ID. (Although, thinking about it, this might be an example of being distracted by the edge cases.)
Yesterday's session and the advice on here have given me enough weapons to at least know what is possible once we have properly defined the problem domain and broken it down into steps, so I am hopeful that we should be able to come up with something. Will keep you posted - thanks again for everything.
