forked from ld4l-labs/marc2rdf-validator
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathworks-algorithm-notes.txt
411 lines (265 loc) · 13.2 KB
/
works-algorithm-notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
Algorithm to Generate BibFrame *Works* from MaRC
reversed engineered from LoC xquery implementation of Marc -> BibFrame
https://github.com/lcnetdev/marc2bibframe
bf:Works are created for bib records by:
modules/module.MBIB-2-BIBFRAME-Shared.xqy
Each Bib Record creates a Work
PLUS
a Work is created when the Bib record has:
600t # subject
533 # related reproduction
555u # finding aid
# related works
730|740|780|785
787t
700t|710t|711t
490a
400t|410t|411t|430t|440t|534t|774t|775t|800t|810t|811t|830t
760|762|765|767|770|772|773|774
505 ind2==0, has |t # contents notes
130|240 # uniform title
880 for 130|240 vernacular uniform title
================================================================================
Naomi's notes below this line
Looking for code that outputs bf:Work
# Each Bib Record
function mbshared:generate-work
2607: let $mainType := "Work"
2780: element {fn:concat("bf:" , $mainType)} {
# 600t
function mbshared:get-subject
# It takes a specific 6xx as input.
# It generates a bf:subject as output, which can have a bf:subject/bf:Work if 600t
3018: let $subjectType:= if ($d[@tag="600"][marcxml:subfield[@code="t"]]) then "Work" else $subjectType
3148: element {fn:concat("bf:",$subjectType)} {
Called by:
2744: return mbshared:get-subject($d)
from function mbshared:generate-work
# 533 related repro
function mbshared:generate-related-reproduction
# 533 to reproduction
2205: element bf:Work{
Called by:
2504: return mbshared:generate-related-reproduction($d,$type)
from function mbshared:related-works
Called by:
# for instances - doesn't produce works
745: let $instance-relateds := mbshared:related-works($d/ancestor::marcxml:record,$workID,"instance")
2751: let $work-relateds := mbshared:related-works($marcxml,$workID,"work")
from function mbshared:generate-work
# 555u finding aid
function mbshared:generate-finding-aid-work
# 555 finding aids note may be related work link or a simple property
2294: element bf:Work{
Called by:
2748: mbshared:generate-finding-aid-work($d)
from function mbshared:generate-work
# related work
# 730|740|780|785
# 787t
# 700t|710t|711t
# 490a
# 400t|410t|411t|430t|440t|534t|774t|775t|800t|810t|811t|830t
# 760|762|765|767|770|772|773|774
function mbshared:generate-related-work
# For RDA: 040$e = rda
# For AACR2: Leader/18 = a
# Under AACR2, when two works were published together the first work in the compilation was given the 1XX/240, and the second work was given a 700 analytic (name/title). This essentially resulted in identifying the aggregate work by only the first work in the compilation.
# Under RDA, we identify the aggregate work in the 240 (not just one of the works), and provide analytical added entries (name/title) for the works in the compilation.
# (245 would be the instance title, 240 the UT)
2388: element bf:Work {
Called by:
2500: return mbshared:generate-related-work($d,$type, $workID) ) # 730|740|780|785
2508: return mbshared:generate-related-work($d,$type, $workID) # 787t
2512: return mbshared:generate-related-work($d,$type, $workID) # 700t|710t|711t
2516: return mbshared:generate-related-work($d,$type, $workID) # 490a
2520: return mbshared:generate-related-work($d,$type, $workID) # 400t|410t|411t|430t|440t|534t|774t|775t|800t|810t|811t|830t
2526: mbshared:generate-related-work($d,$type, $workID) # 760|762|765|767|770|772|773|774
all from mbshared:related-works
Called by
# for instances - doesn't produce works
745: let $instance-relateds := mbshared:related-works($d/ancestor::marcxml:record,$workID,"instance")
2751: let $work-relateds := mbshared:related-works($marcxml,$workID,"work")
from function mbshared:generate-work
# 505 ind2==0, has |t contents notes
function mbshared:generate-complex-notes
# This function generates constituent (hasPart) works from 505
# LC often uses only $a, but stanford has $g$t, and dave reser says he's seen [$g]$t[$r][$u] and $u is always last.
# It generates a bf:hasPart/bf:Work
2901: element bf:Work {
Called by
2755: return (mbshared:generate-complex-notes($d),
from function mbshared:generate-work
# 880 for 130|240 vern uniform title
function mbshared:generate-translationOf
# This function generates a related work (rda expression?), as translation of from the 100, 240.
# It takes a 130 or 240 element.
# It generates a bf:translationOf/bf:Work
3680: element bf:Work {
Called by
3871: mbshared:generate-translationOf($d)
from function mbshared:get-uniformTitle
Called by
2611: return mbshared:get-uniformTitle($d)
from function mbshared:generate-work
# 130|240 uniform title
function mbshared:get-uniformTitle
# This function generates a Work based on the uniformTitle.
# It takes a specific datafield (130 or 240) as input.
# It generates a bf:Work as output.
3888: element bf:Work {
Called by
2611: return mbshared:get-uniformTitle($d)
from function mbshared:generate-work
NOTE: switches:
-----------------
to generate instances from secondary ISBNs
declare variable $mbshared:transform-rules :=(
<rules>
<rule status="off" id="1" label="isbn" category="instance-splitting">New instances on secondary unique ISBNs</rule>
<rule status="on" id="2" label="issn" category="instance-splitting">New instances on secondary unique ISSNs</rule>
<rule status="on" id="3" label="260" category="instance-splitting">New instances on multiple 260s (not serials)</rule>
<rule status="on" id="4" label="250" category="instance-splitting">New instances on multiple 250s</rule>
<rule status="on" id="4" label="300" category="instance-splitting">New instances on multiple 300s</rule>
<rule status="on" id="5" label="246" category="instance-node">246 domain is instance</rule>
<rule status="on" id="6" label="247" category="instance-node">247 domain is instance</rule>
<!--<rule status="on" id="7" label="856" category="instance-splitting">New instances on secondary 856s that are resources (ind2=0)</rule>??? -->
</rules>
);
Works:
-----------------
mbshared:generate-work
from https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L2559-l2829
Every bib record generates a work.
If 555u, generate a finding aid work by calling mbshared:generate-finding-aid-work
mbshared:generate-finding-aid-work
mbshared:generate-related-work
mbshared:get-resourceTypes
https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L3493-L3538
leader|06
<type leader6="a">Text</type>
<type leader6="c">NotatedMusic</type>
<type leader6="d">NotatedMusic</type>
<type leader6="e">Cartography</type>
<type leader6="f">Cartography</type>
<type leader6="g">MovingImage</type>
<type leader6="i">Audio</type>
<type leader6="j">Audio</type>
<type leader6="k">StillImage</type>
<type leader6="m">Multimedia</type>
<type leader6="m">Dataset</type>
<type leader6="o">MixedMaterial</type>
<type leader6="p">MixedMaterial</type>
<type leader6="r">ThreeDimensionalObject</type>
<type leader6="t">Text</type>
007|00
t: Text
q: NotatedMusic
adr: return Cartography
m: return MovingImage
v: return MovingImage
s: Audio
o: MixedMaterial
f: Tactile
336a based on specific strings matched
<type sf336a="(text|tactile text)">Text</type>
<type sf336a="(notated music|tactile notated music)">NotatedMusic</type>
<type sf336a="(notated movement|tactile notated movement)">NotatedMovement</type>
<type sf336a="(cartographic dataset|cartographic image|cartographic moving image|cartographic tactile image|cartographic tactile three-dimensional form|cartographic three-dimensional form)">Cartography</type>
<type sf336a="(three-dimensional moving image|two-dimensional moving image|cartographic moving image)">MovingImage</type>
<type sf336a="(performed music|sounds|spoken word)">Audio</type>
<type sf336a="(still image|tactile image|cartographic image)">StillImage</type>
<type sf336a="computer program">Multimedia</type>
<type sf336a="(cartographic dataset|computer dataset)">Dataset</type>
<type sf336a="(three-dimensional form|tactile three-dimensional form|three-dimensional moving image| cartographic three dimensional form|cartographic tactile three dimensional form)">ThreeDimensionalObject</type>
<type sf336a="(cartographic tactile image|cartographic tactile three-dimensional form|tactile image|tactile notated music|tactile notated movement|tactile text|tactile three-dimensional form)">Tactile</type>
336b based on specific strings matched
<type sf336b="(txt|tct)">Text</type>
<type sf336b="(ntm|ccm)">NotatedMusic</type>`
<type sf336b="(ntv|tcn)">NotatedMovement</type>
<type sf336b="(tcrd|cri|crm|crt|crn|crf)">Cartography</type>
<type sf336b="(tdm|tdi)">MovingImage</type>
<type sf336b="(prm|snd|spw)">Audio</type>
<type sf336b="(sti|tci|cri)">StillImage</type>
<type sf336b="cop">Multimedia</type>
<type sf336b="(crd|cod)">Ddataset</type>
<type sf336b="(tdf|tcf|tcm|crf|crn )">ThreeDimensionalObject</type>
<type sf336b="(crt|crn|tci|tcm|tcn|tct|tcf)">Tactile</type>
337a based on specific strings matched
<type sf337a="audio">Audio</type>
337b based on specific strings matched
<type sf337b="s">Audio</type>
For Marc bib record:
if leader06 is
'a'
&& leader07 is 'a', 'c', 'd' or 'm': 008type = BK
&& leader7 is 'b', 'i' or 's': 008type = SE
't': 008type = BK
'p': 008type = MM
'm': 008type = CF
'e', 'f', 's': 008type = MP
'g', 'k', 'o', 'r': 008type = VM
'c', 'd', 'i', 'j': 008type = MU
generate instances based on 008type
generate instances from 856|859 if ind2 is '0' and sub 3 doesn't contain 'contributor'
types = mbshared:get-resourceTypes
uniformTitle if 130 or 240
names if 100, 110 or 111 OR ( 700,710,711,720 without |t)
titles = 243, 245, 247
label: first of:
uniformTitle[madsrdf:authoritativeLabel]
uniformTitle[bf:workTitle]
regular titles[bf:workTitle]
remove trailing period
bf:authorizedAccessPoint from label
events from 033
dissertations from 502
audience from 008|23 and 008type (BK|CF|MU|V)
audience from 521
genre from 008|24
genre from
Instances
-----------------
mbshared:generate-instances
from https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L2039-L2082
If we have an ISBN from 020a, then
isbn-sets = if 020a exists, we get sets of 020a (if it matches one of our ISBNs) and bf:isbn 10 and 13 chars
If we have an 020a with an ISBN
then look in 260, 261, 262, 264 or 300 to generate-instance-from260
else if we asked to generate instances from secondary unique ISBNS, call generate-instance-fromISBN
mbshared:generate-instance-from260
https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L523-L784
title from (245|246|247|222|242|210)
resp_statement from 245c
edition from 250
mbshared:generate-instance-fromISBN
L1480-L1654
for each ISBN pair, make an instance:
instanceOf work
title from 245, and info after 020a ISBN (volume or physical form or raw)
ISBN:
-----------------
mbshared:process-isbns
https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L2422-L2461
# This is the function that finds and dedups isbns, delivering a complete set for generate-instance from isbn
# If the 020$a has a 10 or 13, they are matched in a set, if the opposite of a pair doesn't exist, it is calculated
for 020a,
get string after parens (?), 'normalize' into pairs of equiv 10 and 13 digit ISBNs via call to mbshared:get-isbn
get unique 13 digit isbns
return bf:set* as element() containing both marcxml:subfield [code=a] (if it matches one of our ISBNs) and bf:isbn calculated nodes
mbshared:get-isbn
https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L3901-L3972
note: refers to http://www.isbn.org/converterpub.asp as comment
remove '-' and ' '
remove everything that isn't a digit at beginning and end
if length of string is 9, add X at end
if isbn is 10 chars, create the 13 char version (prepend 978, replace last digit with checksum)
if isbn is 13 chars, create the 10 char version (lop first 3 chars, replace last digit with checksum)
then If the 020$a has a 10 or 13, they are matched in a set, if the opposite of a pair doesn't exist, it is calculated
isbn-s
: If the 020$a has a 10 or 13, they are matched in a set, if the opposite of a pair doesn't exist, it is calculated
: This is the function that finds and dedups isbns, delivering a complete set for generate-instancefrom isbn
: If the 020$a has a 10 or 13, they are matched in a set, if the opposite of a pair doesn't exist, it is calculated
: @param $marcxml element is the MARCXML record
: @return wrap as as wrapper for bf:set* as element() containing both marcxml:subfield [code=a] and bf:isbn calculated nodes
https://github.com/lcnetdev/marc2bibframe/blob/master/modules/module.MBIB-2-BIBFRAME-Shared.xqy#L2422-L2461