Structured Data API for Harvesting Crowdsourced Contributions
The following API supports harvesting crowdsourced contributions of structured data from FromThePage for integration with library, archive, museum and publishing systems.
Although crowdsourcing efforts in cultural heritage have proven successful, integrating crowd contributions back into institutional systems--archival finding aids, library catalog systems, museum collection management databases, or digital edition publishing platforms--remains a challenge. With the support of the IIIF Consortium, a IIIF-based API was developed in 2016/2017 to support harvesting free-form transcription and translation data from FromThePage. However, this API is of limited use for institutions running field-based, spreadsheet-based, or metadata creation projects, since transcripts must be scraped for the structured data rendered within them.
In our experience, institutions using FromThePage's existing spreadsheet-based exports need the following:
- What the user-entered data is
- What kind of object the data represents (an individual page vs. metadata for an entire work)
- Which fields in the user-entered data correspond with which fields in their institutional systems
- Who created the data
- Any contextual information about the data's reliability, like user-created notes or items flagged for review
Exposing this data helps institutions evaluate quality and make the data usable, even after it has been exported.
Usage: {protocol}://{domain}/iiif/{work id}/structured/{page id}
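As a rough sketch of how a harvester might call this endpoint, the Python below fills in the usage template and decodes the JSON with the standard library. The function names and the example host are illustrative assumptions, not part of FromThePage.

```python
import json
from urllib.request import urlopen


def structured_data_url(base_url: str, work_id: int, page_id: int) -> str:
    """Fill in {protocol}://{domain}/iiif/{work id}/structured/{page id}."""
    return f"{base_url.rstrip('/')}/iiif/{work_id}/structured/{page_id}"


def fetch_structured_data(base_url: str, work_id: int, page_id: int) -> dict:
    """Hypothetical helper: fetch and decode one response (performs a network call)."""
    with urlopen(structured_data_url(base_url, work_id, page_id)) as response:
        return json.load(response)
```

For instance, `structured_data_url("http://localhost:3000", 52679, 1742382)` yields the `@id` of the example response shown later on this page.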
The response contains the following elements:
The `contributors` stanza contains an unordered array of users who have made substantial edits to the data (including edits and transcriptions but excluding approvals or notes). Each user element will contain a `user_name` (rendered as `userName` in the example responses on this page) containing the pseudonym displayed on the system. User elements may contain `real_name` and `orcid` elements for contributor credit if those have been provided by the user.
The `config` element of the top-level structured data response contains a URI which will fetch the project configuration used to create this data. This includes the types of data fields, their labels, any controlled vocabularies, and layout information.
The `data` stanza of the response contains the actual data contributed by the users who have edited this page.
Each element of the `data` array contains:
- `label`: The human-readable label presented to the person who transcribed the field
- `value`: The string value of the data
- `config`: A URI representing the configuration for this particular field. (This can be used as an ID to map fields in a target system to fields in the FromThePage structured data response.)
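The mapping role of the `config` URI can be sketched as a simple crosswalk. The target field names (`family_name`, `given_name`) and the `map_fields` helper below are hypothetical; only the URIs come from the example response on this page.

```python
# Hypothetical crosswalk from FromThePage field config URIs to target-system field names.
FIELD_MAP = {
    "http://localhost:3000/iiif/structured/config/field/460": "family_name",
    "http://localhost:3000/iiif/structured/config/field/461": "given_name",
}


def map_fields(data: list[dict], field_map: dict[str, str]) -> dict[str, str]:
    """Return {target field name: value} for every entry whose config URI is mapped."""
    record = {}
    for entry in data:
        target = field_map.get(entry.get("config"))
        # Skip unmapped fields and nested spreadsheet entries, which carry no "value".
        if target is not None and "value" in entry:
            record[target] = entry["value"]
    return record
```

Keying the crosswalk on the URI rather than on `label` means the mapping should survive a project owner rewording a field label.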
The `notes` stanza embeds any comments left by users creating the data. This element should not appear if no notes exist.
The `on` stanza indicates the canvas or manifest corresponding to the page or work the data was created from. Canvas stanzas will contain a `within` element with the `@id` of the manifest containing the canvas.
The `pageStatus` and `workStatus` elements reflect the status of the work and page being fetched.
The `context`, `profile`, `label` and `@id` elements work as normal in IIIF-based APIs.
The above image was transcribed as part of the Indiana WWI Service Cards project, producing the following data:
{
"contributors":[
{
"userName":"geni"
}
],
"data":[
{
"label":"Last Name",
"value":"Gilbert",
"config":"http://localhost:3000/iiif/structured/config/field/460"
},
{
"label":"First Name",
"value":"Clifford",
"config":"http://localhost:3000/iiif/structured/config/field/461"
},
{
"label":"Middle Name",
"value":"O",
"config":"http://localhost:3000/iiif/structured/config/field/462"
},
{
"label":"Serial Number",
"value":"782884",
"config":"http://localhost:3000/iiif/structured/config/field/463"
},
{
"label":"Race",
"value":"Caucasian",
"config":"http://localhost:3000/iiif/structured/config/field/466"
},
{
"label":"Branch",
"value":"Army or Marines",
"config":"http://localhost:3000/iiif/structured/config/field/471"
},
{
"label":"Town or City of Residence",
"value":"Peru, Indiana",
"config":"http://localhost:3000/iiif/structured/config/field/467"
},
{
"label":"County of Residence",
"value":"",
"config":"http://localhost:3000/iiif/structured/config/field/473"
},
{
"label":"Place of Birth",
"value":"Peru, Indiana",
"config":"http://localhost:3000/iiif/structured/config/field/468"
},
{
"label":"Date of Birth",
"value":"",
"config":"http://localhost:3000/iiif/structured/config/field/469"
},
{
"label":"Age",
"value":"23 8/12",
"config":"http://localhost:3000/iiif/structured/config/field/470"
},
{
"label":"Is this card a reverse side? (Indicated by \"-B\")",
"value":"no",
"config":"http://localhost:3000/iiif/structured/config/field/472"
}
],
"config":"http://localhost:3000/iiif/246/structured/config/page",
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-endpoint-response",
"on":{
"@type":"sc:Canvas",
"@id":"http://localhost:3000/iiif/52679/canvas/1742382",
"within":"http://localhost:3000/iiif/52679/manifest"
},
"@id":"http://localhost:3000/iiif/52679/structured/1742382",
"label":"Structured data (field-based or spreadsheet transcriptions) for canvas",
"notes":"http://localhost:3000/iiif/1742382/list/notes",
"pageStatus":{
"@context":"http://www.fromthepage.org/jsonld/1/context.json",
"@id":"http://localhost:3000/iiif/52679/1742382/status",
"label":"Page Status",
"profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service-1",
"pageStatus":[
"hasTranscript"
]
},
"workStatus":{
"@context":"http://www.fromthepage.org/jsonld/1/context.json",
"@id":"http://localhost:3000/iiif/52679/status",
"label":"Work Status",
"profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service",
"pctComplete":100.0,
"pctTranscribed":100.0,
"pctOcrCorrected":0.0,
"pctIndexed":0,
"pctMarkedBlank":0,
"pctNeedsReview":0,
"pctTranslationComplete":0,
"pctTranslated":0,
"pctTranslationNeedsReview":0,
"pctTranslationIndexed":0,
"pctTranslationMarkedBlank":0,
"metadataStatus":"undescribed"
}
}
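One way an ingest script might use the reliability signals in a response like the one above is sketched below. The `summarize_page` helper and the keys of its result are assumptions made for illustration; the `needsReview` status value is taken from the spreadsheet example later on this page.

```python
def summarize_page(response: dict) -> dict:
    """Collect credit and review information from one structured data response."""
    statuses = response.get("pageStatus", {}).get("pageStatus", [])
    return {
        "canvas": response.get("on", {}).get("@id"),
        "contributors": [c.get("userName") for c in response.get("contributors", [])],
        "needs_review": "needsReview" in statuses,
        "has_notes": "notes" in response,
        # Labels of plain fields that contributors left blank.
        "empty_fields": [e["label"] for e in response.get("data", []) if e.get("value") == ""],
    }
```

What to do with the summary (hold pages that need review, route pages with notes to a curator, and so on) is a local policy decision rather than something the API prescribes.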
Usage: {protocol}://{domain}/iiif/{collection id}/structured/config/{level}
The configuration response returned from the `config` URI of the structured data response represents the project configuration as an array of field configurations. Each field configuration element contains:
- `@id`: URI identifying the field. This URI is dereferenceable, and will fetch the configuration for the particular field.
- `label`: The label for the field presented to contributors
- `row`: The row on the data entry form on which this field should appear (the example response below uses a `line` key, which appears to carry this value)
- `position`: The position within the row on which this field should appear
- `page`: (optional) For multi-page forms, the page on which this row/field should appear
- `input_type`: The type of the field input; input types supported (as of 2022-01-17) include "text", "select", "date", "textarea", "description", "instruction", "spreadsheet", "multiselect"
  - Fields of input type `select` or `multiselect` may contain an element `options` containing an array of possible options for user selection. (Note that users may override the option list in some cases.)
  - Fields configured as spreadsheets will contain an additional stanza `spreadsheet_columns`, an array of `label`, `input_type`, `position` and optional `options` elements, defining how each spreadsheet column is configured.
{
"@id":"http://localhost:3000/iiif/246/structured/config/page",
"label":"Transcription field configuration for Indiana World War I Service Record Cards",
"config":[
{
"label":"Last Name",
"input_type":"text",
"position":1,
"line":1,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/460"
},
{
"label":"First Name",
"input_type":"text",
"position":2,
"line":1,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/461"
},
{
"label":"Middle Name",
"input_type":"text",
"position":3,
"line":1,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/462"
},
{
"label":"Serial Number",
"input_type":"text",
"position":4,
"line":2,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/463"
},
{
"label":"Race",
"input_type":"select",
"position":5,
"line":2,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/466",
"options":[
"Caucasian",
"African American",
"Other",
"Not Given"
]
},
{
"label":"Town or City of Residence",
"input_type":"text",
"position":7,
"line":3,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/467"
},
{
"label":"Place of Birth",
"input_type":"text",
"position":9,
"line":4,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/468"
},
{
"label":"Date of Birth",
"input_type":"text",
"position":10,
"line":4,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/469"
},
{
"label":"Age",
"input_type":"text",
"position":11,
"line":4,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/470"
},
{
"label":"Branch",
"input_type":"select",
"position":6,
"line":2,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/471",
"options":[
"Army or Marines",
"Navy",
"Coast Guard",
"Nurse"
]
},
{
"label":"Is this card a reverse side? (Indicated by \"-B\")",
"input_type":"select",
"position":12,
"line":5,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/472",
"options":[
"no",
"yes"
]
},
{
"label":"County of Residence",
"input_type":"text",
"position":8,
"line":3,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/473"
}
],
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-configuration-response"
}
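A configuration response like the one above can drive both column ordering and a light sanity check on harvested values. The sketch below is illustrative only: both helper names are hypothetical, it reads the `line` key used in the example, and it treats each value as a single string, so it makes no claim about how `multiselect` values are encoded.

```python
def ordered_labels(config_response: dict) -> list[str]:
    """Return field labels in form order: by line, then by position."""
    fields = config_response.get("config", [])
    ordered = sorted(fields, key=lambda f: (f.get("line", 0), f.get("position", 0)))
    return [f["label"] for f in ordered]


def invalid_selections(data: list[dict], config_response: dict) -> list[str]:
    """List labels whose non-empty value is not among the configured options."""
    options = {f["@id"]: f["options"] for f in config_response.get("config", []) if "options" in f}
    return [
        e["label"]
        for e in data
        if e.get("value") and e.get("config") in options and e["value"] not in options[e["config"]]
    ]
```

Because users may override the option list in some cases, a label returned by `invalid_selections` is better treated as a prompt for review than as an error.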
Dereferencing an individual field configuration will fetch an object identical to the object appearing in the project-wide configuration response, with the addition of a `within` element containing the project configuration URI.
Usage: {protocol}://{domain}/iiif/structured/config/field/{field id}
{
"label":"Branch",
"input_type":"select",
"position":6,
"line":2,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-field-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/field/471",
"options":[
"Army or Marines",
"Navy",
"Coast Guard",
"Nurse"
],
"within":"http://localhost:3000/iiif/246/structured/config/page"
}
{
"label":"Persons Names (LN,FN)",
"input_type":"text",
"position":1,
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-spreadsheet-column-configuration-response",
"@id":"http://localhost:3000/iiif/structured/config/column/6",
"within":"http://localhost:3000/iiif/structured/config/field/3060"
}
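The second example above is a spreadsheet column configuration whose `within` points at its parent field, and a field's `within` points at the project configuration. A client could walk that chain as sketched below. The `within_chain` name and the injected `fetch` callable are assumptions; any function that dereferences a URI and returns decoded JSON would do.

```python
from typing import Callable


def within_chain(uri: str, fetch: Callable[[str], dict], max_depth: int = 5) -> list[str]:
    """Follow "within" links upward from a column or field configuration URI."""
    chain = [uri]
    for _ in range(max_depth):
        parent = fetch(chain[-1]).get("within")
        # Stop at the top of the hierarchy or if a link loops back on itself.
        if not parent or parent in chain:
            break
        chain.append(parent)
    return chain
```

Passing the fetcher in as a parameter keeps the traversal testable without a running FromThePage instance.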
References to the structured data response are embedded within a manifest in a `seeAlso` block in the canvas (for field-based/spreadsheet transcription projects) or in the manifest itself (for item metadata creation projects).
{
"@context": "http://iiif.io/api/presentation/2/context.json",
"@id": "http://localhost:3000/iiif/52679/manifest",
"@type": "sc:Manifest",
"label": "IN WWI Service Record Cards Army and Marine GIL-GOF",
...
"sequences": [
{
"@id": "http://localhost:3000/iiif/52679/sequence/default",
"@type": "sc:Sequence",
...
"canvases": [
...
{
"@id": "http://localhost:3000/iiif/52679/canvas/1742382",
"@type": "sc:Canvas",
"label": "WWI0000932-A",
...
"seeAlso": [
...
{
"@id": "http://localhost:3000/iiif/52679/structured/1742382",
"label": "Structured data (field-based or spreadsheet transcriptions) for canvas",
"format": "application/ld+json",
"@context": "http://www.fromthepage.org/jsonld/structured/1/context.json",
"profile": "https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#structured-data-service"
}
],
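A harvester starting from a manifest could collect these references with something like the sketch below. Matching on the `#structured-data-service` fragment of `profile` is an assumption based on the excerpt above, and `structured_data_links` is a hypothetical name.

```python
PROFILE_SUFFIX = "#structured-data-service"  # assumed marker, taken from the excerpt above


def structured_data_links(manifest: dict) -> list[str]:
    """Collect structured data URIs from seeAlso blocks on a manifest and its canvases."""

    def links(resource: dict) -> list[str]:
        see_also = resource.get("seeAlso", [])
        if not isinstance(see_also, list):
            see_also = [see_also]  # tolerate a single string or object
        return [
            entry["@id"]
            for entry in see_also
            if isinstance(entry, dict)
            and "@id" in entry
            and str(entry.get("profile", "")).endswith(PROFILE_SUFFIX)
        ]

    found = links(manifest)  # item metadata creation projects
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            found.extend(links(canvas))  # field-based/spreadsheet projects
    return found
```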
Spreadsheet-based transcription projects are a subset of structured data projects. Their data response is slightly more complex than that of field-based projects.
Structured Data Endpoint Response (subset)
{
"contributors":[
{
"userName":"heidimarie"
}
],
"data":[
{
"label":"County",
"value":"Pasquotank County",
"config":"http://localhost:3000/iiif/structured/config/field/3056"
},
{
"label":"Day",
"value":"",
"config":"http://localhost:3000/iiif/structured/config/field/3057"
},
{
"label":"Month",
"value":"",
"config":"http://localhost:3000/iiif/structured/config/field/3058"
},
{
"label":"Year",
"value":"1769",
"config":"http://localhost:3000/iiif/structured/config/field/3059"
},
{
"data":[
[
{
"label":"Persons Names (LN,FN)",
"value":"Brought Forward",
"config":"http://localhost:3000/iiif/structured/config/column/6"
},
{
"label":"Whites",
"value":"940",
"config":"http://localhost:3000/iiif/structured/config/column/7"
},
{
"label":"Black Males",
"value":"506",
"config":"http://localhost:3000/iiif/structured/config/column/8"
},
{
"label":"Black Females",
"value":"249",
"config":"http://localhost:3000/iiif/structured/config/column/9"
},
{
"label":"Total",
"value":"1695",
"config":"http://localhost:3000/iiif/structured/config/column/10"
}
],
[
{
"label":"Persons Names (LN,FN)",
"value":"Williams, Williss",
"config":"http://localhost:3000/iiif/structured/config/column/6"
},
{
"label":"Whites",
"value":"3",
"config":"http://localhost:3000/iiif/structured/config/column/7"
},
{
"label":"Total",
"value":"3",
"config":"http://localhost:3000/iiif/structured/config/column/10"
}
],
[
{
"label":"Persons Names (LN,FN)",
"value":"Williams, [Elhmey?]",
"config":"http://localhost:3000/iiif/structured/config/column/6"
},
{
"label":"Whites",
"value":"1",
"config":"http://localhost:3000/iiif/structured/config/column/7"
},
{
"label":"Total",
"value":"1",
"config":"http://localhost:3000/iiif/structured/config/column/10"
}
],
[
{
"label":"Persons Names (LN,FN)",
"value":"[Wooldudge?], John",
"config":"http://localhost:3000/iiif/structured/config/column/6"
},
{
"label":"Whites",
"value":"1",
"config":"http://localhost:3000/iiif/structured/config/column/7"
},
{
"label":"Total",
"value":"1",
"config":"http://localhost:3000/iiif/structured/config/column/10"
}
]
],
"config":"http://localhost:3000/iiif/structured/config/field/3060"
}
],
"config":"http://localhost:3000/iiif/1195/structured/config/page",
"profile":"https://github.com/benwbrum/fromthepage/wiki/Structured-Data-API-for-Harvesting-Crowdsourced-Contributions#structured-data-endpoint-response",
"on":{
"@type":"sc:Canvas",
"@id":"http://localhost:3000/iiif/57901/canvas/1832940",
"within":"http://localhost:3000/iiif/57901/manifest"
},
"@id":"http://localhost:3000/iiif/57901/structured/1832940",
"label":"Structured data (field-based or spreadsheet transcriptions) for canvas",
"pageStatus":{
"@context":"http://www.fromthepage.org/jsonld/1/context.json",
"@id":"http://localhost:3000/iiif/57901/1832940/status",
"label":"Page Status",
"profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service-1",
"pageStatus":[
"needsReview",
"hasTranscript"
]
},
"workStatus":{
"@context":"http://www.fromthepage.org/jsonld/1/context.json",
"@id":"http://localhost:3000/iiif/57901/status",
"label":"Work Status",
"profile":"https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#service",
"pctComplete":0,
"pctTranscribed":0,
"pctOcrCorrected":0.0,
"pctIndexed":0,
"pctMarkedBlank":0,
"pctNeedsReview":100.0,
"pctTranslationComplete":0,
"pctTranslated":0,
"pctTranslationNeedsReview":0,
"pctTranslationIndexed":0,
"pctTranslationMarkedBlank":0,
"metadataStatus":"undescribed"
}
}
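In a spreadsheet response, one entry of the top-level `data` array carries a nested `data` array of rows, each row being an array of cells. The sketch below flattens that nesting into a table. The helper names are hypothetical, and keying cells by `label` assumes column labels are unique within a spreadsheet; keying by the column `config` URI would be the safer choice otherwise.

```python
def spreadsheet_rows(response: dict) -> list[dict]:
    """Return one {column label: value} dict per spreadsheet row in a response."""
    rows = []
    for entry in response.get("data", []):
        nested = entry.get("data")
        if not isinstance(nested, list):
            continue  # ordinary field carrying "label" and "value"
        for row in nested:
            rows.append({cell["label"]: cell["value"] for cell in row})
    return rows


def to_table(rows: list[dict]) -> tuple[list[str], list[list[str]]]:
    """Build a header in first-seen order and fill cells absent from a row with ""."""
    header: list[str] = []
    for row in rows:
        for label in row:
            if label not in header:
                header.append(label)
    return header, [[row.get(label, "") for label in header] for row in rows]
```

The filling step matters because, as in the subset above, rows may omit cells entirely (several rows have no "Black Males" or "Black Females" cell).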
This work would not have been possible without the collaboration of Nicholas ver Steegh (Ohio University).