Testing - Authority record samples #1431
Comments
Documenting here the differences between Ebsco-loaded authority records and data import. Jeremy posted a record using the Ebsco tools to both /authority-storage/authorities and SRS.
I loaded the same record to data import with the default authorities profile and did a
Linking worked for the data import record and the ebsco loaded record. |
The $t is being ignored in the Ebsco handling of 4xx fields. There is an open ticket from 2022 in folio-migration-tools to support requiredSubfield. I haven't figured out exactly what is going on with the ID differences, but one is that the migration tools are concatenating the 001 and 003. Something is missing regarding the 008 handling as well. |
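To make the requiredSubfield point concrete, here is a minimal sketch of how a mapper could honor or ignore that option. It assumes requiredSubfield means "only apply this rule when the field contains all of these subfields". The rule list, the target names (`seeFromName`, `seeFromNameTitle`), and the helper functions are hypothetical illustrations, not the actual FOLIO mapping rules or the Ebsco code.

```python
def rule_applies(subfield_codes, rule):
    """True when every code listed in the rule's requiredSubfield is present."""
    required = rule.get("requiredSubfield", [])
    return all(code in subfield_codes for code in required)


def select_targets(subfield_codes, rules, honor_required=True):
    """Return the targets of the rules that apply to one MARC field."""
    if not honor_required:
        # A mapper without requiredSubfield support applies every rule.
        return [rule["target"] for rule in rules]
    return [rule["target"] for rule in rules if rule_applies(subfield_codes, rule)]


# Hypothetical rules for a 4xx field; the names are illustrative only.
rules_4xx = [
    {"target": "seeFromName", "subfield": ["a", "d"]},
    {"target": "seeFromNameTitle", "subfield": ["a", "d", "t"],
     "requiredSubfield": ["t"]},
]

# A 4xx field that has $a but no $t.
print(select_targets({"a"}, rules_4xx))
print(select_targets({"a"}, rules_4xx, honor_required=False))
```

With `honor_required=False` the name/title target is produced even though the field has no $t, which is the kind of difference that would show up when comparing against data import.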
Findings outlined here: https://docs.google.com/document/d/1W1oqZqhWcw7JZEeSCg6rKsJys6BWId5ftKJ2e6D2Inw/edit?tab=t.0#bookmark=kix.337n8d4ieos7 I think our options are
@jermnelson what are your thoughts on the above? |
Hi Alissa, I believe a combination of 1 and 3 could be an option. In an Airflow DAG we would do the following tasks:
|
Thanks for thinking this through @jermnelson. Couple questions off the top of my head for steps 2-3
|
Since Ebsco's AuthorityTransformer uses FOLIO Authority Mappings to generate the Authority JSON record (although, as we've discovered, the Ebsco mapping code doesn't support all of the different options in the FOLIO mapping, like requiredSubfield), I think the mapped MARC data should be present in the Authority record. I'm working on a report that takes the sample Authority MARC records, creates Authority JSON records using the AuthorityTransformer, and then compares them against the FOLIO Mapping to see if the expected fields are present in the records.
I'm not sure how to approach this question without having the corresponding Authority records created by data import to compare with the records created by the AuthorityTransformer. I'll do some more analysis and maybe see if I can extend the reporting from the previous question to this question. |
Great, thanks Jeremy! Let me know if you want a number of records loaded through Data Import to compare. |
Here is a spreadsheet that has the Authority Mapping MARC tags* as columns. Each row is a single authority record: if the MARC field is present and has a value, the cell is set to True; if the value is missing, it is set to False. If the tag isn't in the MARC record, a blank (null) value is recorded. From this analysis, the Ebsco AuthorityMapper always generates a value if a tag is present in the MARC record. *The Ebsco AuthorityMapper class has special handling for the |
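For anyone who wants to reproduce the True/False/blank logic of the spreadsheet, a rough sketch follows. It assumes each record is available as the set of MARC tags it contains plus the generated Authority JSON as a dict. The `tag_to_field` mapping and the property names in it are placeholders for illustration, not the real FOLIO mapping.

```python
def presence_row(marc_tags, authority, tag_to_field):
    """Build one spreadsheet row: True, False, or None (blank) per mapped tag."""
    row = {}
    for tag, field in tag_to_field.items():
        if tag not in marc_tags:
            row[tag] = None  # tag is not in the MARC record: blank cell
        else:
            # Tag is in the MARC record: True only if the JSON has a value.
            row[tag] = bool(authority.get(field))
    return row


# Placeholder mapping and record, for illustration only.
tag_to_field = {"100": "personalName", "400": "sftPersonalName", "150": "topicalTerm"}
marc_tags = {"100", "400"}
authority = {"personalName": "Doe, Jane", "sftPersonalName": []}

row = presence_row(marc_tags, authority, tag_to_field)
print(row)
```

A False in any cell would flag a tag that is in the MARC record but did not produce a value in the JSON.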
@jermnelson thanks for doing this! I've reviewed the spreadsheet and agree all looks good - confirms all data would be present in the json with exceptions for 008 and 001 that will need to be addressed. My second question was about how we can best understand the differences in transformation happening in data import vs. the ebsco Authority Transformer. You said
I've loaded the same file authkey.sample3.mrk to -stage. Any thoughts on how we could compare the json from each process? Or do you think there is a better approach to take to try and figure out if there are additional differences not discovered yet? Once we identify them do you think they should be ticketed in the FSE project? |
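One possible way to compare the JSON from each process is a key-by-key diff of the two Authority records for the same MARC record. This is only a sketch: it assumes both records have already been fetched as dicts, and the sample keys and values below are made up. Keys that will always differ between loads (for example `id`) can be passed in `ignore`.

```python
def diff_records(left, right, ignore=()):
    """Report keys found on only one side and keys whose values differ."""
    keys = (set(left) | set(right)) - set(ignore)
    report = {"only_left": [], "only_right": [], "different": []}
    for key in sorted(keys):
        if key not in right:
            report["only_left"].append(key)
        elif key not in left:
            report["only_right"].append(key)
        elif left[key] != right[key]:
            report["different"].append(key)
    return report


# Made-up sample records, for illustration only.
ebsco_record = {"id": "1", "personalName": "Doe, Jane",
                "identifiers": [{"value": "n 123"}]}
di_record = {"id": "2", "personalName": "Doe, Jane",
             "identifiers": [{"value": "n 123"}, {"value": "x"}],
             "naturalId": "n123"}

report = diff_records(ebsco_record, di_record, ignore=("id",))
print(report)
```

Running this over every record in the sample file and tallying the keys in each bucket would give a quick list of candidate differences to ticket.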
For a first pass at analyzing the differences between the Authority records produced by the Ebsco tools and Data Import, I generated the following table comparing the
|
In the For example, the MARC record with
The Authority Record generated by the Ebsco Tools only creates the following identifier that combines the values of subfield
The Data Import generated Authority Record creates separate identifier fields for each of the subfield
|
Fields in the Data Import record that are not in the Ebsco Tools records
|
Thanks for all of this analysis Jeremy! I'm glad to see there weren't any additional gotchas (except for the 010 subfield concatenation). Are you feeling confident that we could
If so let's chat next week to at least get some of this ticketed before winter break. |
The FSE community has recommended a different path; see the Slack responses. Jeremy has test-loaded a few authority records to folio-test using https://github.com/FOLIO-FSE/folio_data_import. I looked at one record and all looks good. The default authority DI profile was used. We would like to test-load a larger file to get a sense of timings. @jermnelson you can find those here authkey.de/dd I'll find some bibs to test with and write that up in #1432. |
@jermnelson reports that 50k authority records took 6 hours. |
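As a quick back-of-the-envelope check on that timing, the reported numbers work out to roughly 8,333 records per hour, or about 2.3 records per second. The sketch below just does that arithmetic. The projection helper assumes load time scales linearly with record count, which may not hold in practice, and the 100,000 figure is only an example size.

```python
records_loaded = 50_000
hours_taken = 6

records_per_hour = records_loaded / hours_taken
records_per_second = records_loaded / (hours_taken * 3600)


def estimated_hours(record_count, rate_per_hour):
    """Project load time, assuming the observed rate stays constant."""
    return record_count / rate_per_hour


print(round(records_per_hour), "records/hour")
print(round(records_per_second, 2), "records/second")
print(round(estimated_hours(100_000, records_per_hour), 1), "hours for 100k records")
```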
@jermnelson wrong PR is linked here |
Thanks for the correction! |
@jermnelson what are your thoughts on next steps here? I assume we should wait till Q is up on test/stage and then try to improve the loading times? Currently what I have recorded is 50k authority records took 6 hours. |
Some records have been loaded to stage already
Sample files here