Specify PICA serialization XML format #9
I'm not even sure the pica2xml.pl script is up to date. I created it for a one-off PoC test at Leipzig University.

Right. My former colleague, Heikki, who originally started the work on this project, was creating his own flavor of XML from the PICA text files. When I took over this project, I built upon his foundation. I did experiment with PICA::Data at that time, but it turned out to be a little easier to transform and harvest with the current format. NOTE: Our harvester needs certain delete signals and identifiers in the header, so this format leans toward OAI-PMH.

I think the files without a header node are defunct now. There should only be one format. I apologize for not cleaning up old stuff.

A record can have multiple level 1 data? We haven't come across such an example when testing. Perhaps this is because this project only deals with single record updates.

I'm not sure why we are not using x-occurrences. This is probably because item data is in its own repeatable node.

An appropriate name for this format may be PICA FOLIO Import XML (PFIXML), or PIXML.
Thanks for the quick answer!
So the current format is the second one with element
Yes, I think there is one FOLIO instance per level 1 identifier (ILN). The format could be extended to support more, but if its use case does not need that, it's better to keep it as it is.
x-occurrence would make sense if
Well, you created yet another PICA+ serialization format, so I would like to add its documentation to http://format.gbv.de/pica and support it in PICA::Data (see gbv/PICA-Data#83).
As far as I understand the script, PICA+ records are first transformed to XML with `scripts/pica2xml.pl`. There are examples of this XML format in `scripts/test` and in `test`.
As far as I could analyze it, the format includes:

- `collection` with (optional?) attribute `count`
  - `record`
    - `header` with mandatory attribute `status`, having one of the values `deleted` or `upsert`
      - `identifier` with the PPN
    - `metadata`
      - `datafield` with attributes `tag`, `fulltag` (mandatory) and `occurrence` (optional)
        - `subfield` with mandatory attribute `code`
      - `item` with mandatory attribute `epn`
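To make the structure listed above concrete, here is a minimal sketch of reading it with Python's standard `xml.etree.ElementTree`. It is not the project's harvester code: the sample record, the PPN/EPN values, the helper names `parse_fields` and `parse_record`, and the nesting (`identifier` inside `header`, `item` next to `datafield` inside `metadata`, level 2 `datafield`s inside `item`) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. Element and attribute names follow the list above;
# the nesting and all values are assumptions made for illustration.
SAMPLE = """<collection count="1">
  <record>
    <header status="upsert">
      <identifier>123456789</identifier>
    </header>
    <metadata>
      <datafield tag="021A" fulltag="021A">
        <subfield code="a">Example title</subfield>
      </datafield>
      <item epn="987654321">
        <datafield tag="209A" fulltag="209A/01" occurrence="01">
          <subfield code="a">Shelfmark</subfield>
          <subfield code="x">00</subfield>
        </datafield>
      </item>
    </metadata>
  </record>
</collection>"""


def parse_fields(parent):
    """Return the datafields directly below `parent` as (fulltag, [(code, value), ...])."""
    return [
        (df.get("fulltag"), [(sf.get("code"), sf.text or "") for sf in df.findall("subfield")])
        for df in parent.findall("datafield")
    ]


def parse_record(record):
    """Turn one <record> into a dict with status, PPN, record-level fields and items by EPN."""
    header = record.find("header")
    status = header.get("status")
    if status not in ("deleted", "upsert"):
        raise ValueError(f"unexpected status: {status!r}")
    ppn = header.findtext("identifier")
    metadata = record.find("metadata")
    # Assumption: a deleted record might come without <metadata>.
    if metadata is None:
        return {"status": status, "ppn": ppn, "fields": [], "items": {}}
    return {
        "status": status,
        "ppn": ppn,
        "fields": parse_fields(metadata),
        "items": {item.get("epn"): parse_fields(item) for item in metadata.findall("item")},
    }


root = ET.fromstring(SAMPLE)
records = [parse_record(r) for r in root.findall("record")]

# The count attribute may be optional, so only check it when present.
count = root.get("count")
if count is not None and int(count) != len(records):
    raise ValueError(f"count={count} but found {len(records)} records")

print(records[0]["items"])
# prints {'987654321': [('209A/01', [('a', 'Shelfmark'), ('x', '00')])]}
```

Keying items by `epn` mirrors the one-`item`-node-per-copy layout; treating a record without `metadata` as a bare delete signal is only a guess.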
Some files use a slightly different form:

- `collection` with (optional?) attribute `count`
  - `record`
    - `status` having one of the values `deleted` or `upsert`
    - `hrid` with the PPN
    - `metadata`
      - `datafield` with attributes `tag`, `fulltag` (mandatory) and `occurrence` (optional)
        - `subfield` with mandatory attribute `code`
      - `item` with mandatory attribute `epn`
    - `rawrecord` with the full record (the syntax of this is another issue)

Questions:

- Can `datafield` and `item` be mixed, or is the format limited to one ILN?
- Why are x-occurrences not included in `fulltag` (e.g. "209Ax00/01" for field 209Ax/01 with $x=00)? For some fields on level 2, subfield $x is crucial to distinguish the meaning of the field; see the formal specification at https://format.gbv.de/schema/avram/specification#field-identifier
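For the x-occurrence question, a hypothetical helper could derive the extended notation from a field's tag, occurrence and subfields. This is a sketch only: it reproduces the notation of the example in the question ("209Ax00/01"), and whether that exactly matches the Avram field identifier syntax should be checked against the linked specification. The names `x_value` and `fulltag_with_x` are made up, and limiting the x-part to tags starting with "2" assumes the PICA+ convention that the first digit of a tag gives its level.

```python
from typing import List, Optional, Tuple


def x_value(subfields: List[Tuple[str, str]]) -> Optional[str]:
    """Return the value of the first $x subfield, or None if there is none."""
    for code, value in subfields:
        if code == "x":
            return value
    return None


def fulltag_with_x(tag: str, occurrence: Optional[str], subfields: List[Tuple[str, str]]) -> str:
    """Build a field identifier like "209Ax00/01" (hypothetical helper).

    The x-part is only added for level 2 fields, assumed to be tags starting with "2".
    """
    result = tag
    x = x_value(subfields)
    if tag.startswith("2") and x is not None:
        result += "x" + x
    if occurrence:
        result += "/" + occurrence
    return result


print(fulltag_with_x("209A", "01", [("a", "Shelfmark"), ("x", "00")]))  # 209Ax00/01
print(fulltag_with_x("021A", None, [("a", "Example title")]))  # 021A
```

Deriving the identifier from the subfields, rather than storing it separately, would keep `fulltag` consistent with the actual $x value of each field.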