Merge branch 'develop' into feature/acquisition
jwaspin committed Dec 10, 2024
2 parents 391a184 + e213959 commit c54e04a
Showing 95 changed files with 33,021 additions and 29 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -7,7 +7,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: ruby/setup-ruby@ec02537da5712d66d4d50a0f33b7eb52773b5ed1
- uses: ruby/setup-ruby@v1
with:
ruby-version: "3.1" # Not needed with a .ruby-version file
- run: bundle install
67 changes: 67 additions & 0 deletions DCAT-US.md
@@ -0,0 +1,67 @@
# DCAT-US - mdTranslator proposed mappings
## Quick references
- DCAT-US [element definitions](https://resources.data.gov/resources/dcat-us/)
- DCAT-US v1.1 [catalog.json schema](https://resources.data.gov/schemas/dcat-us/v1.1/schema/catalog.json)
- DCAT-US v1.1 [dataset.json schema](https://resources.data.gov/schemas/dcat-us/v1.1/schema/dataset.json)
- DCAT-US v1.1 [JSON-LD catalog.json schema](https://resources.data.gov/schemas/dcat-us/v1.1/schema/catalog.jsonld)
- [Element crosswalks](https://resources.data.gov/resources/podm-field-mapping/#field-mappings) to other standards

## DCAT-US - mdTranslator

### Always (always required)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Title | title | exists | citation.title |
| Description | description | exists | resourceInfo.abstract |
| Tags | keyword | exists | [resourceInfo.keyword.keyword[0, n] *flatten*] |
| Last Update | modified | if resourceInfo.citation.date[any].dateType = "lastUpdated" or "lastRevised" or "revision" | resourceInfo.citation.date[most recent] |
| Publisher | publisher{name} | if citation.responsibleParty[any].role = "publisher" | contactId -> contact.name where isOrganization IS TRUE |
| | | if exists resourceDistribution.distributor.contact | [first contact] contactId -> contact.name where isOrganization IS TRUE |
| Publisher Parent Organization | publisher{subOrganizationOf} | if citation.responsibleParty[any].role = "publisher" and exists contactId -> memberOfOrganization[0] and isOrganization is true | contactId -> contact.name |
| | | if exists resourceDistribution.distributor.contact and exists contactId -> memberOfOrganization[0] and isOrganization IS TRUE | contactId -> contact.name |
| Contact Name | contactPoint{fn} | exists | resourceInfo.pointOfContact.parties[0].contactId -> contact.name |
| Contact Email | contactPoint{email} | exists | resourceInfo.pointOfContact.parties[0].contactId -> contact.eMailList[0] |
| Unique Identifier | identifier | if resourceInfo.citation.identifier.namespace = "DOI" | resourceInfo.citation.onlineResource.uri |
| | | if "DOI" within resourceInfo.citation.onlineResource.uri | resourceInfo.citation.onlineResource.uri |
| Public Access Level | accessLevel | [*extend codelist MD_RestrictionCode to include "public", "restricted public", "non-public"*] <br> if resourceInfo.constraints.legal[any] one of {"public", "restricted public", "non-public"} | resourceInfo.constraints.legal[first]. Also resourceInfo.constraint.security.classification [[MD_ClassificationCode](https://mdtools.adiwg.org/#codes-page?c=iso_classification)] |
| Bureau Code | bureauCode | | [*extend role codelist to include "bureau", extend namespace codelist to include "bureauCode"*] <br> for each resourceInfo.citation.responsibleParty[any] role = "bureau" <br>contactId -> contact.identifier [*identifier must conform to https://resources.data.gov/schemas/dcat-us/v1.1/omb_bureau_codes.csv*] |
| Program Code | programCode | | [*add new element of program resourceInfo.programCode, add new codelist of programCode*] <br> resourceInfo.program[0,n] |
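
The Last Update rule in the table above could be sketched as follows. This is a minimal illustration, not the writer's implementation: it assumes citation dates arrive as hashes with `:date` and `:dateType` keys, and it reads "most recent" as the latest of the qualifying dates. The helper name `dcat_modified` and the sample data are hypothetical.

```ruby
require 'date'

# Date types that the mapping table treats as "last update" events.
MODIFIED_TYPES = %w[lastUpdated lastRevised revision].freeze

# Hypothetical helper: return the most recent qualifying date as an
# ISO 8601 string, or nil when no date has a qualifying dateType.
def dcat_modified(citation_dates)
  candidates = citation_dates.select { |d| MODIFIED_TYPES.include?(d[:dateType]) }
  return nil if candidates.empty?

  candidates.map { |d| Date.parse(d[:date]) }.max.iso8601
end

# Sample data (assumed shape, for illustration only).
dates = [
  { date: '2020-01-15', dateType: 'publication' },
  { date: '2021-06-01', dateType: 'revision' },
  { date: '2022-03-10', dateType: 'lastUpdated' }
]
puts dcat_modified(dates)
```

Returning `nil` when nothing qualifies leaves it to the caller to decide whether to omit `modified` or report a missing required field.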

### If-Applicable (required if it exists)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Distribution | distribution | if exists resourceDistribution[any] and if exists resourceDistribution.distributor[any].transferOption[any].onlineOption[any].uri | for each resourceDistribution[0, n] where exists resourceDistribution.distributor.transferOption.onlineOption.uri then <br> {description, accessURL, downloadURL, mediaType, title} |
| - Description | distribution.description | exists | resourceDistribution.description |
| - AccessURL | distribution.accessURL | if citation.onlineResources[first occurrence].uri [path ends in ".html"] [*required if applicable*] | resourceDistribution.distributor.transferOption.onlineOption.uri |
| - DownloadURL | distribution.downloadURL | if citation.onlineResources[first occurrence].uri [path does not end in ".html"] [*required if applicable*] | resourceDistribution.distributor.transferOption.onlineOption.uri |
| - MediaType | distribution.mediaType | | [*add codelist of "dataFormat"*] <br> transferOption.distributionFormat.formatSpecification.title [dataFormat] [*dataFormat should conform to: https://www.iana.org/assignments/media-types/media-types.xhtml*] |
| - Title | distribution.title | exists | resourceDistribution.distributor.transferOption.onlineOption.name |
| License | license | [*add resourceInfo.constraint.reference to mdEditor*] <br> if exists resourceInfo.constraint.reference[0] | resourceInfo.constraint.reference[0] <br> |
| | | else | https://creativecommons.org/publicdomain/zero/1.0/ <br> [*allows author to identify a license to use, or default to CC0 if none provided, CC0 would cover international usage as opposed to publicdomain*] <br> [*others: http://www.usa.gov/publicdomain/label/1.0/, http://opendatacommons.org/licenses/pddl/1.0*] |
| Rights | rights | if constraint.accessLevel in {"restricted public", "non-public"} | resourceInfo.constraint.releasability.statement + " " + each constraint.releasability.disseminationConstraint[0, n] |
| Endpoint | *removed* | *ignored* | *ignored* |
| Spatial | spatial | if exists resourceInfo.extents[0].geographicExtents[0].boundingBox | boundingBox.eastLongitude + "," + boundingBox.southLatitude + "," + boundingBox.westLongitude + "," + boundingBox.northLatitude [*decimal degrees*] |
| | | else | if exists resourceInfo.extents[0].geographicExtents[0].geographicElement[0].type = "point" then <br> geographicElement[0].coordinate[1] + "," + geographicElement[0].coordinate[0] [*lat, long decimal degrees*] |
| Temporal | temporal | if exists resourceInfo.extent[0].temporalExtent[0] then <br> if exists temporalExtent[0].timePeriod.startDate and exists temporalExtent[0].timePeriod.endDate | timePeriod[0].startDate + "/" + timePeriod.endDate |
| | | if exists temporalExtent[0].timePeriod.startDate and not exists temporalExtent[0].timePeriod.endDate | temporalExtent[0].timePeriod.startDate |
| | | if not exists temporalExtent[0].timePeriod.startDate and exists temporalExtent[0].timePeriod.endDate | temporalExtent[0].timePeriod.endDate <br> [*may need revisiting relative to decision on date only formatting*] |
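
The three Temporal cases in the table above reduce to a small branch. The sketch below assumes a time period hash with optional `:startDate` and `:endDate` keys holding ISO 8601 strings; the helper name `dcat_temporal` is hypothetical and not taken from the writer.

```ruby
# Hypothetical helper: build the DCAT-US `temporal` string from a time
# period hash with optional :startDate and :endDate keys.
def dcat_temporal(time_period)
  start_date = time_period[:startDate]
  end_date = time_period[:endDate]

  if start_date && end_date
    # both ends present: ISO 8601 interval "start/end"
    "#{start_date}/#{end_date}"
  else
    # one-sided period, or nil when both dates are missing
    start_date || end_date
  end
end

puts dcat_temporal({ startDate: '2019-01-01', endDate: '2019-12-31' })
puts dcat_temporal({ startDate: '2019-01-01' })
```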

### No (not required)

| Field Name | DCAT Name | Condition | mdJson Source |
| --- | --- | --- | --- |
| Release Date | issued | if resourceInfo.citation.date[any].dateType = "publication" or "distributed" | resourceInfo.citation.date[earliest] |
| Frequency | accrualPeriodicity | | [*The ISO codelist MD_maintenanceFrequency can be used; several of its codes intersect with the accrualPeriodicity codelist, so the two partially correspond. A column of ISO 8601 code equivalents could be added to MD_maintenanceFrequency to provide the coding expected https://resources.data.gov/schemas/dcat-us/v1.1/iso8601_guidance/#accrualperiodicity, community valuation should be determined*] |
| Language | language | | [*language codelist could be used but needs to be bound with country corresponding to the RFC 5646 format https://datatracker.ietf.org/doc/html/rfc5646, such as "en-US", community valuation should be determined*] |
| Data Quality | dataQuality | | [*this is a boolean to indicate whether data "conforms" to agency standards, value seems negligible*] |
| Category | theme | where resourceInfo.keyword[any].thesaurus.title = "ISO Topic Category" | [resourceInfo.keyword.keyword[0, n] *flatten*] |
| Related Documents | references | | associatedResource[all].resourceCitation.onlineResource[all].uri + additionalDocumentation[all].citation[all].onlineResource[all].uri [*comma separated*]|
| Homepage URL | landingPage | [*Add code "landingPage" to CI_OnlineFunctionCode*] <br> if resourceInfo.citation.onlineResource[any].function = "landingPage" | resourceInfo.citation.onlineResource.uri |
| Collection | isPartOf | for each associatedResource[0, n].initiativeType = "collection" and associatedResource.associationType = "collectiveTitle" | associatedResource.resourceCitation[0].uri |
| System of Records | systemOfRecords | [*Add code "sorn" to DS_InitiativeTypeCode*] <br> for each associatedResource[0, n].initiativeType = "sorn" | associatedResource.resourceCitation[0].uri |
| Primary IT Investment | primaryITInvestmentUII | | [*Links data to an IT investment identifier relative to Exhibit 53 docs, community valuation should be determined*] |
| Data Dictionary | describedBy | if dataDictionary.dictionaryIncludedWithResource IS NOT TRUE and citation.onlineResource[0].uri exists | dataDictionary.citation.onlineResource[0].uri |
| Data Dictionary Type | describedByType | | [*For simplicity, leave blank implying html page, community decision needed whether to support other format types using mime type and in the form of "application/pdf"*]|
| Data Standard | conformsTo | | [*Currently not able to identify the schema standard the data conforms to, though this has been proposed. Should this be built and there is community decision to support it, then it can be mapped*] |
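
As an illustration of the Category (`theme`) row above, the filter-and-flatten step might look like this. The input shape (keyword groups with a `:thesaurus` hash and a `:keywords` array of `{ keyword: ... }` hashes) is an assumption made for the example, and `dcat_theme` is a hypothetical name.

```ruby
# Hypothetical helper: collect keyword strings from every keyword group
# whose thesaurus title is "ISO Topic Category".
def dcat_theme(keyword_groups)
  keyword_groups
    .select { |group| group.dig(:thesaurus, :title) == 'ISO Topic Category' }
    .flat_map { |group| Array(group[:keywords]).map { |kw| kw[:keyword] } }
end

# Sample data (assumed shape, for illustration only).
groups = [
  { thesaurus: { title: 'ISO Topic Category' },
    keywords: [{ keyword: 'biota' }, { keyword: 'environment' }] },
  { thesaurus: { title: 'Local Keywords' },
    keywords: [{ keyword: 'salmon' }] }
]
p dcat_theme(groups)
```
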
39 changes: 39 additions & 0 deletions Gemfile.lock
@@ -17,12 +17,20 @@ PATH
GEM
remote: https://rubygems.org/
specs:
actionview (7.1.3.3)
activesupport (= 7.1.3.3)
actionview (7.1.3.3)
activesupport (= 7.1.3.3)
builder (~> 3.1)
erubi (~> 1.11)
rails-dom-testing (~> 2.2)
rails-html-sanitizer (~> 1.6)
activesupport (7.1.3.3)
base64
bigdecimal
erubi (~> 1.11)
rails-dom-testing (~> 2.2)
rails-html-sanitizer (~> 1.6)
activesupport (7.1.3.3)
base64
bigdecimal
@@ -33,6 +41,13 @@ GEM
minitest (>= 5.1)
mutex_m
tzinfo (~> 2.0)
addressable (2.8.6)
connection_pool (>= 2.2.5)
drb
i18n (>= 1.6, < 2)
minitest (>= 5.1)
mutex_m
tzinfo (~> 2.0)
addressable (2.8.6)
public_suffix (>= 2.0.2, < 6.0)
adiwg-mdcodes (2.10.0)
@@ -44,22 +59,30 @@ GEM
coderay (1.1.3)
concurrent-ruby (1.2.3)
connection_pool (2.4.1)
concurrent-ruby (1.2.3)
connection_pool (2.4.1)
crass (1.0.6)
drb (2.2.1)
drb (2.2.1)
erubi (1.12.0)
i18n (1.14.5)
i18n (1.14.5)
concurrent-ruby (~> 1.0)
jbuilder (2.12.0)
jbuilder (2.12.0)
actionview (>= 5.0.0)
activesupport (>= 5.0.0)
json (2.7.2)
json (2.7.2)
json-schema (2.8.1)
addressable (>= 2.4)
kramdown (2.4.0)
rexml
loofah (2.22.0)
loofah (2.22.0)
crass (~> 1.0.2)
nokogiri (>= 1.12.0)
nokogiri (>= 1.12.0)
minitest (5.20.0)
mutex_m (0.2.0)
nokogiri (1.15.6-arm64-darwin)
@@ -68,18 +91,34 @@ GEM
racc (~> 1.4)
public_suffix (5.0.5)
racc (1.8.0)
rails-dom-testing (2.2.0)
mutex_m (0.2.0)
nokogiri (1.15.6-arm64-darwin)
racc (~> 1.4)
nokogiri (1.15.6-x86_64-linux)
racc (~> 1.4)
public_suffix (5.0.5)
racc (1.8.0)
rails-dom-testing (2.2.0)
activesupport (>= 5.0.0)
minitest
nokogiri (>= 1.6)
rails-html-sanitizer (1.6.0)
loofah (~> 2.21)
nokogiri (~> 1.14)
rails-html-sanitizer (1.6.0)
loofah (~> 2.21)
nokogiri (~> 1.14)
rake (13.1.0)
rexml (3.2.8)
strscan (>= 3.0.9)
strscan (3.1.0)
rexml (3.2.8)
strscan (>= 3.0.9)
strscan (3.1.0)
thor (0.20.3)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
uuidtools (2.2.0)
21 changes: 21 additions & 0 deletions README.md
@@ -23,6 +23,27 @@ Or install it yourself as:

$ mdtranslator help translate

## Development

### Requirements

Requires
- [Ruby](https://www.ruby-lang.org/en/documentation/installation/)
- bundler (`gem install bundler`)
- rake (`gem install rake`)

### Tests

To run the tests, first install the dependencies:

    $ bundle install

Then run the rake command:

    $ bundle exec rake

_TODO: There are currently 4 tests that are not passing, related to mdJSON readers and writers_

## Contributing

1. Fork it ( https://github.com/[my-github-username]/mdTranslator/fork )
1 change: 1 addition & 0 deletions Rakefile
@@ -23,6 +23,7 @@ Rake::TestTask.new do |t|
'test/writers/iso19115-3/tc*.rb',
'test/writers/mdJson/tc*.rb',
'test/writers/sbJson/tc*.rb',
'test/writers/dcat_us/tc*.rb',
'test/translator/tc*.rb'
]
t.verbose = true
1 change: 1 addition & 0 deletions adiwg-mdtranslator.gemspec
@@ -40,5 +40,6 @@ Gem::Specification.new do |spec|
spec.add_runtime_dependency "kramdown", ">= 1.13", "< 3.0"
spec.add_runtime_dependency "coderay", "~> 1.1"
spec.add_runtime_dependency "nokogiri", "~> 1.15"
spec.add_runtime_dependency "nokogiri", "~> 1.15"

end
98 changes: 98 additions & 0 deletions lib/adiwg/mdtranslator/writers/dcat_us/dcat_us_writer.rb
@@ -0,0 +1,98 @@
require 'jbuilder'
require_relative 'version'
require_relative 'sections/dcat_us_dcat_us'

module ADIWG
  module Mdtranslator
    module Writers
      module Dcat_us

        def self.startWriter(intObj, responseObj)
          # set the contact array for use by the writer
          @contacts = intObj[:contacts]

          # set output flag for null properties
          Jbuilder.ignore_nil(!responseObj[:writerShowTags])

          # set the format of the output file based on the writer specified
          responseObj[:writerOutputFormat] = 'json'
          responseObj[:writerVersion] = ADIWG::Mdtranslator::Writers::Dcat_us::VERSION

          # write the dcat_us metadata record
          metadata = Dcat_us.build(intObj, responseObj)

          # set writer pass to true if no messages
          # false or warning state will be set by writer code
          responseObj[:writerPass] = true if responseObj[:writerMessages].empty?

          # encode the metadata target as JSON
          metadata.target!
        end

        # find contact in contact array and return the contact hash
        def self.get_contact_by_index(contactIndex)
          if @contacts[contactIndex]
            return @contacts[contactIndex]
          end
          {}
        end

        # find contact in contact array and return the contact hash
        def self.get_contact_by_id(contactId)
          @contacts.each do |hContact|
            if hContact[:contactId] == contactId
              return hContact
            end
          end
          {}
        end

        # find contact in contact array and return the contact index
        def self.get_contact_index_by_id(contactId)
          @contacts.each_with_index do |hContact, index|
            if hContact[:contactId] == contactId
              return index
            end
          end
          {}
        end

        # ignore jBuilder object mapping when array is empty
        def self.json_map(collection = [], _class)
          if collection.nil? || collection.empty?
            return nil
          else
            collection.map { |item| _class.build(item).attributes! }
          end
        end

        # find all nested objects in 'obj' that contain the element 'ele'
        def self.nested_objs_by_element(obj, ele, excludeList = [])
          aCollected = []
          obj.each do |key, value|
            skipThisOne = false
            excludeList.each do |exclude|
              if key == exclude.to_sym
                skipThisOne = true
              end
            end
            next if skipThisOne
            if key == ele.to_sym
              aCollected << obj
            elsif obj.is_a?(Array)
              if key.respond_to?(:each)
                aReturn = nested_objs_by_element(key, ele, excludeList)
                aCollected = aCollected.concat(aReturn) unless aReturn.empty?
              end
            elsif obj[key].respond_to?(:each)
              aReturn = nested_objs_by_element(value, ele, excludeList)
              aCollected = aCollected.concat(aReturn) unless aReturn.empty?
            end
          end
          aCollected
        end

      end
    end
  end
end
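
To see what `nested_objs_by_element` collects, here is a condensed standalone copy of the same traversal that runs outside the module. The sample hash is hypothetical and only loosely resembles an mdTranslator internal object.

```ruby
# Condensed standalone copy of the traversal: collect every hash that
# contains the key `ele`, skipping any key named in `exclude_list`.
def nested_objs_by_element(obj, ele, exclude_list = [])
  collected = []
  obj.each do |key, value|
    next if exclude_list.any? { |exclude| key == exclude.to_sym }

    if key == ele.to_sym
      collected << obj
    elsif obj.is_a?(Array)
      # iterating an array: `key` is the element itself and `value` is nil
      collected.concat(nested_objs_by_element(key, ele, exclude_list)) if key.respond_to?(:each)
    elsif value.respond_to?(:each)
      collected.concat(nested_objs_by_element(value, ele, exclude_list))
    end
  end
  collected
end

# Sample data (hypothetical).
data = {
  resourceInfo: {
    citation: { onlineResource: [{ uri: 'a' }, { uri: 'b' }] },
    other: { uri: 'c' }
  }
}

p nested_objs_by_element(data, 'uri')
p nested_objs_by_element(data, 'uri', ['other'])
```

The first call returns the three hashes that hold a `:uri` key; the second drops the one nested under `:other`, because excluded keys are skipped before any recursion.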
10 changes: 10 additions & 0 deletions lib/adiwg/mdtranslator/writers/dcat_us/readme.md
@@ -0,0 +1,10 @@

## dcat_us

### Supported versions

> 0.0.x (dcat_us is not currently versioned)

### Writer for DCAT-US v1.1, the U.S. profile of the Data Catalog Vocabulary (DCAT)


Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
require 'jbuilder'

module ADIWG
  module Mdtranslator
    module Writers
      module Dcat_us
        module AccessLevel

          def self.build(intObj)

            # constraint codes grouped by the DCAT-US access level they imply
            # (publicArray is listed for reference; any code not matched below falls through to 'public')
            publicArray = ['unclassified', 'unrestricted', 'licenseUnrestricted', 'licenseEndUser']
            nonPublicArray = ['restricted','confidential','secret','topSecret','forOfficialUseOnly','protected','intellectualPropertyRights','otherRestrictions','private','statutory','traditionalKnowledge','personallyIdentifiableInformation']
            restrictedPublicArray = ['sensitiveButUnclassified','limitedDistribution','copyright','patent','patentPending','trademark','license','licenseDistributor','in-confidence','threatenedOrEndangered']

            resourceInfo = intObj[:metadata][:resourceInfo]

            # default to an empty array so a record without constraints does not raise on nil
            constraints = resourceInfo[:constraints] || []
            legalConstraints = constraints.select { |constraint| constraint[:type] == 'legal' }
            securityConstraints = constraints.select { |constraint| constraint[:type] == 'security' }

            accessLevelCodes = []

            # Gather codes from security constraints and legal constraints
            securityConstraints.each do |securityConstraint|
              code = securityConstraint.dig(:securityConstraint, :classCode)
              accessLevelCodes << code unless code.nil?
            end
            legalConstraints.each do |legalConstraint|
              codes = legalConstraint.dig(:legalConstraint, :accessCodes)
              accessLevelCodes.push(*codes)
            end

            # return access level that is most restrictive
            accessLevelCodes.uniq.each do |code|
              if nonPublicArray.include? code
                return 'non-public'
              end
            end
            accessLevelCodes.uniq.each do |code|
              if restrictedPublicArray.include? code
                return 'restricted public'
              end
            end

            return 'public'
          end

        end
      end
    end
  end
end
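
The "most restrictive wins" rule used by `AccessLevel.build` can be shown in isolation. The sketch below is a simplified stand-in rather than the writer itself: the code lists are abbreviated, and `access_level_for` is a hypothetical helper that takes an already-gathered list of constraint codes.

```ruby
# Abbreviated code lists for illustration; the writer's lists are longer.
NON_PUBLIC = %w[restricted confidential secret topSecret].freeze
RESTRICTED_PUBLIC = %w[sensitiveButUnclassified limitedDistribution copyright].freeze

# Hypothetical helper: return the most restrictive access level implied by
# a list of constraint codes, defaulting to 'public'.
def access_level_for(codes)
  return 'non-public' if codes.any? { |code| NON_PUBLIC.include?(code) }
  return 'restricted public' if codes.any? { |code| RESTRICTED_PUBLIC.include?(code) }

  'public'
end

puts access_level_for(%w[unclassified copyright])
puts access_level_for(%w[copyright secret])
puts access_level_for([])
```

Checking the non-public list first means that one restrictive code overrides any number of permissive ones, regardless of the order in which the constraints appear in the record.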
