The files here provide samples of the different types of data found in MARC, in various formats, character encodings, and normalizations. The data sets come from a range of sources:
- Harvard's Open Data: https://library.harvard.edu/services-tools/harvard-library-apis-datasets
- Library of Congress id.loc.gov: http://id.loc.gov
- OCLC: http://oclc.org
- BnF (Bibliothèque nationale de France): http://bnf.fr/
- My personal collection of test records (pulled from work with libraries around the world)
- MarcEdit users
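
Because the samples differ in character encoding, it can help to check what a file declares before loading it into a tool. The following is a minimal sketch, not part of this repository: it assumes a binary MARC 21 (ISO 2709) file, where the 24-byte leader stores the record length in positions 00-04, the encoding flag in position 09 (`a` for Unicode, blank for MARC-8), and the base address of data in positions 12-16. The function name `leader_info` and the sample leaders are made up for illustration, and the sketch does not apply to MARCXML or other serializations.

```python
def leader_info(record: bytes) -> dict:
    """Read basic facts from the 24-byte leader of one binary MARC 21 record."""
    if len(record) < 24:
        raise ValueError("a MARC 21 leader is 24 bytes long")
    # The leader is defined as ASCII characters, so decoding it is safe
    # even when the rest of the record uses MARC-8.
    leader = record[:24].decode("ascii")
    return {
        "record_length": int(leader[0:5]),      # positions 00-04
        "encoding": "Unicode" if leader[9] == "a" else "MARC-8",  # position 09
        "base_address": int(leader[12:17]),     # positions 12-16
    }


# Hypothetical leaders, identical except for position 09.
unicode_leader = b"00714cam a2200205 a 4500"
marc8_leader = b"00714cam  2200205 a 4500"

print(leader_info(unicode_leader))
print(leader_info(marc8_leader))
```

To inspect a real file, open it in binary mode and pass the first 24 bytes to `leader_info`. The flag only reports what the record claims; some older test records may declare one encoding and contain another.
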
Additional data sets created by others can be found at:
- https://github.com/DLFMetadataAssessment/2018MetadataAnalysisWorkshop
- https://github.com/kateefly/ToolsDataSet
- https://librarycarpentry.org/lc-open-refine/
These data sets are created for educational purposes: they give those teaching metadata or cataloging workflows or classes ready-made data sets to use as examples. They should not be used for production cataloging because many of these records:
- Are older (because pre-RDA data is useful for teaching)
- Have a range of different licenses when used outside of an educational context
If you have an interesting data set that you'd like to see added -- feel free to pass it on or submit a pull request. If you have questions about the sets made available, or would like to request a specific type of data, let me know.