Structured Data API for Harvesting Crowdsourced Contributions
The following API supports harvesting crowdsourced contributions of structured data from FromThePage for integration with library, archive, museum and publishing systems.
Although crowdsourcing efforts in cultural heritage have proven successful, integrating crowd contributions back into institutional systems--archival finding aids, library catalog systems, museum collection management databases, or digital edition publishing platforms--remains a challenge. With the support of the IIIF Consortium, a IIIF-based API was developed in 2016/2017 to support harvesting free-form transcription and translation data from FromThePage. However, that API is of limited use for institutions running field-based, spreadsheet-based, or metadata creation projects, since transcripts must be scraped for the structured data rendered within them.
In our experience, institutions using FromThePage's existing spreadsheet-based exports need to know the following:
- What the user-entered data is
- What kind of object the data represents (an individual page vs. metadata for an entire work)
- Which fields in the user-entered data correspond with which fields in their institutional systems
- Who created the data
- Any contextual information about the data's reliability, like user-created notes or items flagged for review
Exposing this data helps institutions evaluate quality and make the data usable, even after it has been exported.
The response contains the following elements:
The `contributors` stanza contains an unordered array of users who have made substantial edits to the data (including edits and transcriptions but excluding approvals or notes). Each user element will contain a `user_name` containing the pseudonym displayed on the system. User elements may contain `real_name` and `orcid` elements for contributor credit if those have been provided by the user.
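As a minimal sketch of how a harvester might turn this stanza into credit lines, the following falls back to `user_name` when no `real_name` is present. The second contributor and its values are made up for illustration, and the format of the `orcid` value (bare identifier or URL) is an assumption.

```python
def credit_line(contributor):
    """Build a display string for one element of the contributors stanza."""
    # user_name is always present; real_name and orcid are optional.
    name = contributor.get("real_name") or contributor["user_name"]
    orcid = contributor.get("orcid")
    return f"{name} (ORCID: {orcid})" if orcid else name


contributors = [
    {"user_name": "geni"},
    # Hypothetical contributor with placeholder values, for illustration only.
    {"user_name": "jdoe", "real_name": "Jane Doe", "orcid": "0000-0000-0000-0000"},
]

# The array is unordered, so sort for a stable credit list.
credits = sorted(credit_line(c) for c in contributors)
print(credits)
```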
The `configuration` stanza of the response represents the project configuration as an array of field configurations. Each field configuration element contains:
- `label`: The label for the field presented to contributors
- `row`: The row on the data entry form on which this field should appear
- `position`: The position within the row at which this field should appear
- `page` (optional): For multi-page forms, the page on which this row/field should appear
- `input_type`: The type of the field input; input types supported (as of 2022-01-17) include "text", "select", "date", "textarea", "description", "instruction", "spreadsheet", "multiselect"
  - Fields of input type `select` or `multiselect` may contain an element `options` containing an array of possible options for user selection. (Note that users may override the option list in some cases.)
  - Fields configured as spreadsheets will contain an additional stanza `spreadsheet_columns`, an array of `label`, `input_type`, `position` and optional `options` elements, defining how each spreadsheet column is configured.
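The field configuration can be used to rebuild the form layout and to look up option lists. The sketch below is illustrative only: the function names are hypothetical, it assumes `page`, row and `position` are integers, and because the sample response in this document uses a `line` key for the form row, it reads `line` first and falls back to `row`.

```python
def form_order(configuration):
    """Return field configurations in the order they appear on the entry form."""
    def sort_key(field):
        # Assumption: page, line/row and position are all integers.
        page = field.get("page", 1)
        line = field.get("line", field.get("row", 1))
        return (page, line, field["position"])
    return sorted(configuration, key=sort_key)


def option_lists(configuration):
    """Map each select/multiselect label to its configured options."""
    return {
        field["label"]: field.get("options", [])
        for field in configuration
        if field["input_type"] in ("select", "multiselect")
    }


# A subset of the sample configuration, deliberately out of form order.
configuration = [
    {"label": "Last Name", "input_type": "text", "position": 1, "line": 1},
    {"label": "Race", "input_type": "select", "position": 5, "line": 2,
     "options": ["Caucasian", "African American", "Other", "Not Given"]},
    {"label": "Branch", "input_type": "select", "position": 6, "line": 2,
     "options": ["Army or Marines", "Navy", "Coast Guard", "Nurse"]},
    {"label": "Serial Number", "input_type": "text", "position": 4, "line": 2},
]

labels = [field["label"] for field in form_order(configuration)]
print(labels)
print(option_lists(configuration)["Branch"])
```

Since users may override the option list in some cases, a value missing from `options` is better treated as something to review than as an error.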
The `data` stanza of the response contains the actual data contributed by the users who have edited this page.
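For loading into a spreadsheet or database, the flat list of cells can be pivoted into one record per row. This is a sketch that assumes every element of `data` carries `row`, `label` and `value` keys, as in the sample response in this document; `rows_from_data` is a hypothetical helper name.

```python
from collections import defaultdict


def rows_from_data(data):
    """Group cells by row number into {label: value} records, in row order."""
    rows = defaultdict(dict)
    for cell in data:
        # A repeated label within one row would overwrite the earlier value.
        rows[cell["row"]][cell["label"]] = cell["value"]
    return [rows[number] for number in sorted(rows)]


# A subset of the sample data stanza.
data = [
    {"row": 1, "label": "Last Name", "value": "Gilbert"},
    {"row": 1, "label": "First Name", "value": "Clifford"},
    {"row": 1, "label": "Age", "value": "23 8/12"},
]

records = rows_from_data(data)
print(records)
```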
The `notes` stanza embeds any notes left by the users who created the data.
The `on` stanza indicates the canvas or manifest corresponding to the page or work the data was created from. Canvas stanzas will contain a `within` element with the `@id` of the manifest containing the canvas.
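A harvester can use this stanza to decide whether a response describes a single page or a whole work. The sketch below assumes that manifest-level targets carry the IIIF Presentation 2 type `sc:Manifest`; the sample response in this document only shows the `sc:Canvas` case, and `describe_target` is a hypothetical helper name.

```python
def describe_target(on):
    """Classify the `on` stanza as a page (canvas) or a work (manifest)."""
    if on["@type"] == "sc:Canvas":
        # Canvas stanzas name their containing manifest in `within`.
        return {"kind": "page", "id": on["@id"], "manifest": on["within"]}
    if on["@type"] == "sc:Manifest":
        # Assumption: for work-level data the target is the manifest itself.
        return {"kind": "work", "id": on["@id"], "manifest": on["@id"]}
    raise ValueError(f"Unexpected target type: {on['@type']}")


on = {
    "@type": "sc:Canvas",
    "@id": "http://localhost:3000/iiif/52679/canvas/1742382",
    "within": "http://localhost:3000/iiif/52679/manifest",
}

target = describe_target(on)
print(target["kind"], target["manifest"])
```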
The `context`, `profile`, `label` and `@id` elements work as normal in IIIF-based APIs.
Example response:
{
  "contributors": [
    {
      "user_name": "geni"
    }
  ],
  "configuration": [
    {
      "label": "Last Name",
      "input_type": "text",
      "position": 1,
      "line": 1
    },
    {
      "label": "First Name",
      "input_type": "text",
      "position": 2,
      "line": 1
    },
    {
      "label": "Middle Name",
      "input_type": "text",
      "position": 3,
      "line": 1
    },
    {
      "label": "Serial Number",
      "input_type": "text",
      "position": 4,
      "line": 2
    },
    {
      "label": "Race",
      "input_type": "select",
      "position": 5,
      "line": 2,
      "options": [
        "Caucasian",
        "African American",
        "Other",
        "Not Given"
      ]
    },
    {
      "label": "Town or City of Residence",
      "input_type": "text",
      "position": 7,
      "line": 3
    },
    {
      "label": "Place of Birth",
      "input_type": "text",
      "position": 9,
      "line": 4
    },
    {
      "label": "Date of Birth",
      "input_type": "text",
      "position": 10,
      "line": 4
    },
    {
      "label": "Age",
      "input_type": "text",
      "position": 11,
      "line": 4
    },
    {
      "label": "Branch",
      "input_type": "select",
      "position": 6,
      "line": 2,
      "options": [
        "Army or Marines",
        "Navy",
        "Coast Guard",
        "Nurse"
      ]
    },
    {
      "label": "Is this card a reverse side? (Indicated by \"-B\")",
      "input_type": "select",
      "position": 12,
      "line": 5,
      "options": [
        "no",
        "yes"
      ]
    },
    {
      "label": "County of Residence",
      "input_type": "text",
      "position": 8,
      "line": 3
    }
  ],
  "data": [
    {
      "row": 1,
      "label": "Last Name",
      "value": "Gilbert"
    },
    {
      "row": 1,
      "label": "First Name",
      "value": "Clifford"
    },
    {
      "row": 1,
      "label": "Middle Name",
      "value": "O"
    },
    {
      "row": 1,
      "label": "Serial Number",
      "value": "782884"
    },
    {
      "row": 1,
      "label": "Race",
      "value": "Caucasian"
    },
    {
      "row": 1,
      "label": "Branch",
      "value": "Army or Marines"
    },
    {
      "row": 1,
      "label": "Town or City of Residence",
      "value": "Peru, Indiana"
    },
    {
      "row": 1,
      "label": "County of Residence",
      "value": ""
    },
    {
      "row": 1,
      "label": "Place of Birth",
      "value": "Peru, Indiana"
    },
    {
      "row": 1,
      "label": "Date of Birth",
      "value": ""
    },
    {
      "row": 1,
      "label": "Age",
      "value": "23 8/12"
    },
    {
      "row": 1,
      "label": "Is this card a reverse side? (Indicated by \"-B\")",
      "value": "no"
    }
  ],
  "notes": null,
  "on": {
    "@type": "sc:Canvas",
    "@id": "http://localhost:3000/iiif/52679/canvas/1742382",
    "within": "http://localhost:3000/iiif/52679/manifest"
  },
  "@id": "http://localhost:3000/iiif/52679/structured/1742382",
  "label": "Structured data (field-based or spreadsheet transcriptions) for canvas"
}
References to the structured data response are embedded within the `seeAlso` element of the corresponding canvas in the work's IIIF manifest:
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "http://localhost:3000/iiif/52679/manifest",
  "@type": "sc:Manifest",
  "label": "IN WWI Service Record Cards Army and Marine GIL-GOF",
  ...
  "sequences": [
    {
      "@id": "http://localhost:3000/iiif/52679/sequence/default",
      "@type": "sc:Sequence",
      ...
      "canvases": [
        ...
        {
          "@id": "http://localhost:3000/iiif/52679/canvas/1742382",
          "@type": "sc:Canvas",
          "label": "WWI0000932-A",
          ...
          "seeAlso": [
            ...
            {
              "@id": "http://localhost:3000/iiif/52679/structured/1742382",
              "label": "Structured data (field-based or spreadsheet transcriptions) for canvas",
              "format": "application/ld+json",
              "@context": "http://www.fromthepage.org/jsonld/structured/1/context.json",
              "profile": "https://github.com/benwbrum/fromthepage/wiki/FromThePage-Support-for-the-IIIF-Presentation-API-and-Web-Annotations#structured-data-service"
            }
          ],