264 dcat us writer required fields #274

Merged
merged 18 commits into from
Sep 11, 2023
67 changes: 67 additions & 0 deletions DCAT-US.md
@@ -0,0 +1,67 @@
# DCAT-US - mdTranslator proposed mappings
## Quick references
- DCAT-US [element definitions](https://resources.data.gov/resources/dcat-us/)
- DCAT-US v1.1 [catalog.json schema](https://resources.data.gov/schemas/dcat-us/v1.1/schema/catalog.json)
- DCAT-US v1.1 [dataset.json schema](https://resources.data.gov/schemas/dcat-us/v1.1/schema/dataset.json)
- DCAT-US v1.1 [JSON-LD catalog.json schema](https://resources.data.gov/schemas/dcat-us/v1.1/schema/catalog.jsonld)
- [Element crosswalks](https://resources.data.gov/resources/podm-field-mapping/#field-mappings) to other standards

## DCAT-US - mdTranslator

### Always (always required)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Title | dcat:title | exists | citation.title |
| Description | dcat:description | exists | citation.abstract |
| Tags | dcat:keyword | exists | [resourceInfo.keyword.keyword[0, n] *flatten*] |
| Last Update | dcat:modified | if resourceInfo.citation.date[any].dateType = "lastUpdated" or "lastRevised" or "revision" | resourceInfo.citation.date[most recent] |
| Publisher | dcat:publisher{name} | if citation.responsibleParty[any].role = "publisher" | contactId -> contact.name where isOrganization IS TRUE |
| | | if exists resourceDistribution.distributor.contact | [first contact] contactId -> contact.name where isOrganization IS TRUE |
| Publisher Parent Organization | dcat:publisher{subOrganizationOf} | if citation.responsibleParty[any].role = "publisher" and exists contactId -> memberOfOrganization[0] and isOrganization is true | contactId -> contact.name |
| | | if exists resourceDistribution.distributor.contact and exists contactId -> memberOfOrganization[0] and isOrganization IS TRUE | contactId -> contact.name |
| Contact Name | dcat:contactPoint{fn} | exists | resourceInfo.pointOfContact.parties[0].contactId -> contact.name |
| Contact Email | dcat:contactPoint{email} | exists | resourceInfo.pointOfContact.parties[0].contactId -> contact.eMailList[0] |
| Unique Identifier | dcat:identifier | if resourceInfo.citation.identifier.namespace = "DOI" | resourceInfo.citation.onlineResource.uri |
| | | if "DOI" within resourceInfo.citation.onlineResource.uri | resourceInfo.citation.onlineResource.uri |
| Public Access Level | dcat:accessLevel | [*extend codelist MD_RestrictionCode to include "public", "restricted public", "non-public"*] <br> if resourceInfo.constraints.legal[any] one of {"public", "restricted public", "non-public"} | resourceInfo.constraints.legal[first]. Also resourceInfo.constraint.security.classification [[MD_ClassificationCode](https://mdtools.adiwg.org/#codes-page?c=iso_classification)] |
| Bureau Code | dcat:bureauCode | | [*extend role codelist to include "bureau", extend namespace codelist to include "bureauCode"*] <br> for each resourceInfo.citation.responsibleParty[any] role = "bureau" <br>contactId -> contact.identifier [*identifier must conform to https://resources.data.gov/schemas/dcat-us/v1.1/omb_bureau_codes.csv*] |
| Program Code | dcat:programCode | | [*add new element of program resourceInfo.programCode, add new codelist of programCode*] <br> resourceInfo.program[0,n] |
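
The "Last Update" rule above can be sketched as follows. This is a minimal illustration, not the writer's implementation: it assumes citation dates are plain hashes with ISO 8601 date strings under `:date` and a `:dateType` key, and `most_recent_modified` is a hypothetical helper name.

```ruby
require 'date'

# Date types the mapping table treats as a "last update" (taken from the table).
MODIFIED_TYPES = %w[lastUpdated lastRevised revision].freeze

# Hypothetical sample data; the :date/:dateType hash shape is an assumption.
SAMPLE_DATES = [
  { date: '2020-01-15', dateType: 'publication' },
  { date: '2021-06-01', dateType: 'revision' },
  { date: '2022-03-10', dateType: 'lastUpdated' }
].freeze

# Return the most recent date among the qualifying date types, or nil if none qualify.
def most_recent_modified(dates)
  candidates = dates.select { |d| MODIFIED_TYPES.include?(d[:dateType]) }
  latest = candidates.max_by { |d| Date.parse(d[:date]) }
  latest && latest[:date]
end

puts most_recent_modified(SAMPLE_DATES)
```

The publication date is ignored because its type is not in the list, so only the two update-style dates compete for "most recent".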

### If-Applicable (required if it exists)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Distribution | dcat:distribution | if exists resourceDistribution[any] and if exists resourceDistribution.distributor[any].transferOption[any].onlineOption[any].uri <br> for each resourceDistribution[0, n] where exists resourceDistribution.distributor.transferOption.onlineOption.uri | {description, accessURL, downloadURL, mediaType, title} |
| - Description | dcat:distribution.description | exists | resourceDistribution.description |
| - AccessURL | dcat:distribution.accessURL | if citation.onlineResources[first occurrence].uri [path ends in ".html"] [*required if applicable*] | resourceDistribution.distributor.transferOption.onlineOption.uri |
| - DownloadURL | dcat:distribution.downloadURL | if citation.onlineResources[first occurrence].uri [path does not end in ".html"] [*required if applicable*] | resourceDistribution.distributor.transferOption.onlineOption.uri |
| - MediaType | dcat:distribution.mediaType | | [*add codelist of "dataFormat"*] <br> transferOption.distributionFormat.formatSpecification.title [dataFormat] [*dataFormat should conform to: https://www.iana.org/assignments/media-types/media-types.xhtml*] |
| - Title | dcat:distribution.title | exists | resourceDistribution.distributor.transferOption.onlineOption.name |
| License | dcat:license | [*add resourceInfo.constraint.reference to mdEditor*] <br> if exists resourceInfo.constraint.reference[0] | resourceInfo.constraint.reference[0] <br> |
| | | else | https://creativecommons.org/publicdomain/zero/1.0/ <br> [*allows author to identify a license to use, or default to CC0 if none provided, CC0 would cover international usage as opposed to publicdomain*] <br> [*others: http://www.usa.gov/publicdomain/label/1.0/, http://opendatacommons.org/licenses/pddl/1.0*] |
| Rights | dcat:rights | if constraint.accessLevel in {"restricted public", "non-public"} | resourceInfo.constraint.releasability.statement + " " + each constraint.releasability.disseminationConstraint[0, n] |
| Endpoint | *removed* | *ignored* | *ignored* |
| Spatial | dcat:spatial | if exists resourceInfo.extents[0].geographicExtents[0].boundingBox | boundingBox.eastLongitude + "," + boundingBox.southLatitude + "," + boundingBox.westLongitude + "," + boundingBox.northLatitude [*decimal degrees*] |
| | | else | if exists resourceInfo.extents[0].geographicExtents[0].geographicElement[0].type = "point" then <br> geographicElement[0].coordinate[1] + "," + geographicElement[0].coordinate[0] [*lat, long decimal degrees*] |
| Temporal | dcat:temporal | if exists resourceInfo.extent[0].temporalExtent[0] then <br> if exists temporalExtent[0].timePeriod.startDate and exists temporalExtent[0].timePeriod.endDate | temporalExtent[0].timePeriod.startDate + "/" + temporalExtent[0].timePeriod.endDate |
| | | if exists temporalExtent[0].timePeriod.startDate and not exists temporalExtent[0].timePeriod.endDate | temporalExtent[0].timePeriod.startDate |
| | | if not exists temporalExtent[0].timePeriod.startDate and exists temporalExtent[0].timePeriod.endDate | temporalExtent[0].timePeriod.endDate <br> [*may need revisiting relative to decision on date only formatting*] |
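
The three Temporal rows above reduce to a small decision, sketched here for illustration. It assumes the time period is a plain hash whose `:startDate` and `:endDate` values are already formatted strings; `temporal_string` is a hypothetical helper name, not the writer's own method.

```ruby
# Build the dcat:temporal value from a time period hash (shape is an assumption).
# Both dates present -> "start/end"; only one present -> that date; none -> nil.
def temporal_string(time_period)
  return nil if time_period.nil?

  start_date = time_period[:startDate]
  end_date = time_period[:endDate]

  if start_date && end_date
    "#{start_date}/#{end_date}"
  else
    start_date || end_date
  end
end

puts temporal_string({ startDate: '2019-01-01', endDate: '2019-12-31' })
```

The open question flagged in the table (date-only formatting) is left out of this sketch; the strings are passed through unchanged.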

### No (not required)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Release Date | dcat:issued | if resourceInfo.citation.date[any].dateType = "publication" or "distributed" | resourceInfo.citation.date[earliest] |
| Frequency | dcat:accrualPeriodicity | | [*The ISO codelist MD_maintenanceFrequency can be used; several of its codes intersect with the accrualPeriodicity codelist, so the two correspond only partially. A column of ISO 8601 code equivalents could be added to MD_maintenanceFrequency to provide the coding expected by https://resources.data.gov/schemas/dcat-us/v1.1/iso8601_guidance/#accrualperiodicity; community valuation should be determined*] |
| Language | dcat:language | | [*language codelist could be used but needs to be bound with a country, corresponding to the RFC 5646 format https://datatracker.ietf.org/doc/html/rfc5646, such as "en-US"; community valuation should be determined*] |
| Data Quality | dcat:dataQuality | | [*this is a boolean to indicate whether data "conforms" to agency standards, value seems negligible*] |
| Category | dcat:theme | where resourceInfo.keyword[any].thesaurus.title = "ISO Topic Category" | [resourceInfo.keyword.keyword[0, n] *flatten*] |
| Related Documents | dcat:references | | associatedResource[all].resourceCitation.onlineResource[all].uri + additionalDocumentation[all].citation[all].onlineResource[all].uri [*comma separated*]|
| Homepage URL | dcat:landingPage | [*Add code "landingPage" to CI_OnlineFunctionCode*] <br> if resourceInfo.citation.onlineResource[any].function = "landingPage" | resourceInfo.citation.onlineResource.uri |
| Collection | dcat:isPartOf | for each associatedResource[0, n].initiativeType = "collection" and associatedResource.associationType = "collectiveTitle" | associatedResource.resourceCitation[0].uri |
| System of Records | dcat:systemOfRecords | [*Add code "sorn" to DS_InitiativeTypeCode*] <br> for each associatedResource[0, n].initiativeType = "sorn" | associatedResource.resourceCitation[0].uri |
| Primary IT Investment | dcat:primaryITInvestmentUII | | [*Links data to an IT investment identifier relative to Exhibit 53 docs, community valuation should be determined*] |
| Data Dictionary | dcat:describedBy | if dataDictionary.dictionaryIncludedWithResource IS NOT TRUE and citation[0].onlineResource[0].uri exists | dataDictionary.citation[0].onlineResource[0].uri |
| Data Dictionary Type | dcat:describedByType | | [*For simplicity, leave blank implying html page, community decision needed whether to support other format types using mime type and in the form of "application/pdf"*]|
| Data Standard | dcat:conformsTo | | [*Currently not able to identify the schema standard the data conforms to, though this has been proposed. Should this be built and there is community decision to support it, then it can be mapped*] |
1 change: 1 addition & 0 deletions Rakefile
@@ -23,6 +23,7 @@ Rake::TestTask.new do |t|
    'test/writers/iso19115-2/tc*.rb',
    'test/writers/mdJson/tc*.rb',
    'test/writers/sbJson/tc*.rb',
    'test/writers/dcat_us/tc*.rb',
    'test/translator/tc*.rb'
  ]
  t.verbose = true
98 changes: 98 additions & 0 deletions lib/adiwg/mdtranslator/writers/dcat_us/dcat_us_writer.rb
@@ -0,0 +1,98 @@
require 'jbuilder'
require_relative 'version'
require_relative 'sections/dcat_us_dcat_us'

module ADIWG
  module Mdtranslator
    module Writers
      module Dcat_us

        def self.startWriter(intObj, responseObj)
          # set the contact array for use by the writer
          @contacts = intObj[:contacts]

          # set output flag for null properties
          Jbuilder.ignore_nil(!responseObj[:writerShowTags])

          # set the format of the output file based on the writer specified
          responseObj[:writerOutputFormat] = 'json'
          responseObj[:writerVersion] = ADIWG::Mdtranslator::Writers::Dcat_us::VERSION

          # write the dcat_us metadata record
          metadata = Dcat_us.build(intObj, responseObj)

          # set writer pass to true if no messages
          # false or warning state will be set by writer code
          responseObj[:writerPass] = true if responseObj[:writerMessages].empty?

          # encode the metadata target as JSON
          metadata.target!
        end

        # find contact in contact array and return the contact hash
        def self.get_contact_by_index(contactIndex)
          if @contacts[contactIndex]
            return @contacts[contactIndex]
          end
          {}
        end

        # find contact in contact array and return the contact hash
        def self.get_contact_by_id(contactId)
          @contacts.each do |hContact|
            if hContact[:contactId] == contactId
              return hContact
            end
          end
          {}
        end

        # find contact in contact array and return the contact index
        def self.get_contact_index_by_id(contactId)
          @contacts.each_with_index do |hContact, index|
            if hContact[:contactId] == contactId
              return index
            end
          end
          {}
        end

        # ignore jBuilder object mapping when array is empty
        def self.json_map(collection = [], _class)
          if collection.nil? || collection.empty?
            return nil
          else
            collection.map { |item| _class.build(item).attributes! }
          end
        end

        # find all nested objects in 'obj' that contain the element 'ele'
        def self.nested_objs_by_element(obj, ele, excludeList = [])
          aCollected = []
          obj.each do |key, value|
            skipThisOne = false
            excludeList.each do |exclude|
              if key == exclude.to_sym
                skipThisOne = true
              end
            end
            next if skipThisOne
            if key == ele.to_sym
              aCollected << obj
            elsif obj.is_a?(Array)
              if key.respond_to?(:each)
                aReturn = nested_objs_by_element(key, ele, excludeList)
                aCollected = aCollected.concat(aReturn) unless aReturn.empty?
              end
            elsif obj[key].respond_to?(:each)
              aReturn = nested_objs_by_element(value, ele, excludeList)
              aCollected = aCollected.concat(aReturn) unless aReturn.empty?
            end
          end
          aCollected
        end

      end
    end
  end
end
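
The contact lookup helpers in this file are linear searches over the contacts array. As a hedged illustration only, the same lookups can be written with `Array#find` and `Array#index`. The sketch below is standalone: `SAMPLE_CONTACTS` stands in for the writer's `@contacts`, the method names are hypothetical, and returning `nil` for a missing index is this sketch's choice (the writer's helper returns `{}`).

```ruby
# Hypothetical stand-in for the writer's @contacts array.
SAMPLE_CONTACTS = [
  { contactId: 'c1', name: 'Jane Analyst', isOrganization: false },
  { contactId: 'c2', name: 'Example Agency', isOrganization: true }
].freeze

# Return the contact hash with the given id, or an empty hash when none matches.
def contact_by_id(contacts, contact_id)
  contacts.find { |contact| contact[:contactId] == contact_id } || {}
end

# Return the position of the contact with the given id, or nil when none matches.
def contact_index_by_id(contacts, contact_id)
  contacts.index { |contact| contact[:contactId] == contact_id }
end

puts contact_by_id(SAMPLE_CONTACTS, 'c2')[:name]
puts contact_index_by_id(SAMPLE_CONTACTS, 'c2')
```

Passing the array in as an argument, rather than reading module state, keeps the sketch easy to test in isolation.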
10 changes: 10 additions & 0 deletions lib/adiwg/mdtranslator/writers/dcat_us/readme.md
@@ -0,0 +1,10 @@

## dcat_us

### Supported versions

> 0.0.x (dcat_us is not currently versioned)

### Writer for DCAT-US (Data Catalog Vocabulary, U.S. profile) v1.1


@@ -0,0 +1,26 @@
require 'jbuilder'

module ADIWG
  module Mdtranslator
    module Writers
      module Dcat_us
        module AccessLevel

          def self.build(intObj)
            resourceInfo = intObj[:metadata][:resourceInfo]
            legalConstraints = resourceInfo[:constraints]&.select { |constraint| constraint[:type] == 'legal' }

            accessLevel = legalConstraints&.detect do |constraint|
              codes = constraint.dig(:legalConstraint, :accessCodes)
              codes&.any? { |code| ["public", "restricted public", "non-public"].include?(code) }
            end&.dig(:legalConstraint, :accessCodes)&.find { |code| ["public", "restricted public", "non-public"].include?(code) }

            accessLevel
          end

        end
      end
    end
  end
end
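
The access level selection above can be illustrated in a flatter form. This is a sketch under assumptions, not a replacement for the module: it takes the constraints array directly, assumes the same `:type` and `:legalConstraint`/`:accessCodes` keys the module reads, and uses the hypothetical names `ACCESS_LEVELS` and `access_level`.

```ruby
# The three access levels the writer looks for (taken from the module above).
ACCESS_LEVELS = ['public', 'restricted public', 'non-public'].freeze

# Hypothetical sample constraints using the keys the module reads.
SAMPLE_CONSTRAINTS = [
  { type: 'use', useConstraint: {} },
  { type: 'legal', legalConstraint: { accessCodes: ['copyright'] } },
  { type: 'legal', legalConstraint: { accessCodes: ['license', 'restricted public'] } }
].freeze

# Return the first access code, in document order across legal constraints,
# that is a recognised access level; nil when there is none.
def access_level(constraints)
  (constraints || [])
    .select { |constraint| constraint[:type] == 'legal' }
    .flat_map { |constraint| constraint.dig(:legalConstraint, :accessCodes) || [] }
    .find { |code| ACCESS_LEVELS.include?(code) }
end

puts access_level(SAMPLE_CONSTRAINTS)
```

Flattening the codes first avoids testing the same list twice, once to pick the constraint and once to pick the code.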

@@ -0,0 +1,29 @@
require 'jbuilder'

module ADIWG
  module Mdtranslator
    module Writers
      module Dcat_us
        module ContactPoint

          def self.build(intObj)
            resourceInfo = intObj[:metadata][:resourceInfo]

            # guard against records with no point of contact or no party
            contactId = resourceInfo.dig(:pointOfContacts, 0, :parties, 0, :contactId)
            return nil if contactId.nil?

            # get_contact_by_id returns {} when no contact matches,
            # so the email list may be missing
            contact = Dcat_us.get_contact_by_id(contactId)
            fn = contact[:name]
            hasEmail = contact[:eMailList]&.first

            Jbuilder.new do |json|
              json.set!('@type', 'vcard:Contact')
              json.set!('fn', fn)
              json.set!('hasEmail', hasEmail)
            end

          end
        end
      end
    end
  end
end
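
For orientation, the shape of the contact point object built above can be sketched with the standard `json` library instead of Jbuilder. This is an illustration only: `contact_point` and `SAMPLE_CONTACT` are hypothetical names, the contact hash shape is assumed from the keys the module reads, and Jbuilder's nil handling is not modelled.

```ruby
require 'json'

# Hypothetical contact using the :name and :eMailList keys the module reads.
SAMPLE_CONTACT = {
  contactId: 'c1',
  name: 'Jane Analyst',
  eMailList: ['jane@example.gov']
}.freeze

# Build a plain hash with the same three keys the ContactPoint module sets.
def contact_point(contact)
  {
    '@type' => 'vcard:Contact',
    'fn' => contact[:name],
    'hasEmail' => contact[:eMailList]&.first
  }
end

puts JSON.pretty_generate(contact_point(SAMPLE_CONTACT))
```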
77 changes: 77 additions & 0 deletions lib/adiwg/mdtranslator/writers/dcat_us/sections/dcat_us_dcat_us.rb
@@ -0,0 +1,77 @@
require 'jbuilder'
require_relative 'dcat_us_keyword'
require_relative 'dcat_us_publisher'
require_relative 'dcat_us_contact_point'
require_relative 'dcat_us_identifier'
require_relative 'dcat_us_distribution'
require_relative 'dcat_us_spatial'
require_relative 'dcat_us_temporal'
require_relative 'dcat_us_modified'
require_relative 'dcat_us_access_level'
require_relative 'dcat_us_rights'
require_relative 'dcat_us_license'

module ADIWG
  module Mdtranslator
    module Writers
      module Dcat_us

        def self.build(intObj, responseObj)
          metadataInfo = intObj[:metadata][:metadataInfo]
          resourceInfo = intObj[:metadata][:resourceInfo]
          citation = resourceInfo[:citation]

          title = citation[:title]
          description = citation[:abstract]
          keyword = Keyword.build(intObj)
          modified = Modified.build(intObj)
          publisher = Publisher.build(intObj)
          contactPoint = ContactPoint.build(intObj)
          accessLevel = AccessLevel.build(intObj)
          identifier = Identifier.build(intObj)
          distribution = Distribution.build(intObj)
          rights = Rights.build(intObj, accessLevel)
          spatial = Spatial.build(intObj)
          temporal = Temporal.build(intObj)
          license = License.build(intObj)

          @Namespace = ADIWG::Mdtranslator::Writers::Dcat_us

          Jbuilder.new do |json|
            json.set!('@type', 'dcat:Dataset')
            json.set!('dcat:title', title)
            json.set!('dcat:description', description)
            json.set!('dcat:keyword', keyword)
            json.set!('dcat:modified', modified)
            json.set!('dcat:publisher', publisher)
            json.set!('dcat:contactPoint', contactPoint)
            json.set!('dcat:identifier', identifier)
            json.set!('dcat:accessLevel', accessLevel)
            # json.set!('dcat:bureauCode', 'ToDo')
            # json.set!('dcat:programCode', 'ToDo')
            json.set!('dcat:distribution', distribution)

            json.set!('dcat:license', license)
            json.set!('dcat:rights', rights)
            json.set!('dcat:spatial', spatial)
            json.set!('dcat:temporal', temporal)

            # json.set!('dcat:issued', metadataInfo[:metadataDates][0][:date])
            # json.set!('dcat:accrualPeriodicity', metadataInfo[:metadataMaintenance][:maintenanceFrequency])
            # json.set!('dcat:language', metadataInfo[:metadataLocales][0][:languageCode])
            # json.set!('dcat:dataQuality', metadataInfo[:metadataMaintenance][:maintenanceNote])
            # json.set!('dcat:theme', metadataInfo[:metadataTopics][0][:topicCategory])
            # json.set!('dcat:references', metadataInfo[:metadataCitation])
            # json.set!('dcat:landingPage', metadataInfo[:metadataOnlineOptions][0][:olResURI])
            # json.set!('dcat:isPartOf', metadataInfo[:metadataHierarchy][0][:parentMetadata][:metadataId])
            # json.set!('dcat:systemOfRecords', metadataInfo[:metadataHierarchy][0][:parentMetadata][:metadataId])
            # json.set!('dcat:primaryITInvestmentUII', metadataInfo[:metadataId])
            # json.set!('dcat:describedBy', metadataInfo[:metadataOnlineOptions][0][:olResURI])
            # json.set!('dcat:describedByType', metadataInfo[:metadataOnlineOptions][0][:olResProtocol])
            # json.set!('dcat:conformsTo', metadataInfo[:metadataStandards][0][:standardName])
          end
        end
      end
    end
  end
end
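
The builder above sets every field unconditionally and leaves it to the writer's `Jbuilder.ignore_nil` setting to decide whether nil values appear in the output. The sketch below imitates that top-level behaviour with the standard library only. It is an assumption-laden illustration: `dataset_json` and its `show_tags` flag are hypothetical, and nested nil values are not handled.

```ruby
require 'json'

# Assemble a dataset record from already-built field values.
# When show_tags is false, top-level nil values are dropped,
# roughly what ignoring nil values does for the fields set in the builder.
def dataset_json(fields, show_tags: false)
  record = { '@type' => 'dcat:Dataset' }.merge(fields)
  record = record.compact unless show_tags
  JSON.generate(record)
end

puts dataset_json({ 'dcat:title' => 'Sample title', 'dcat:license' => nil })
puts dataset_json({ 'dcat:title' => 'Sample title', 'dcat:license' => nil }, show_tags: true)
```

With `show_tags: true` the empty `dcat:license` key is kept as `null`, which can help when checking which fields a record failed to populate.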