Support for Importing and Exporting DCAT Metadata #1592
Comments
Here is a tool we can look at when working on support for DCAT/RDF: http://rdforms.com/editors/dcat/
I just added the "harvesting" label to this issue, but http://www.data.gov/developers/harvesting doesn't mention OAI-PMH (which is now supported as of Dataverse 4.5), so maybe this is a different type of harvesting.
ICPSR just announced that it's retiring OAI-PMH harvesting for its repository and is "exploring an API-focused solution that will involve delivering metadata using the DCAT-US schema." Has Dataverse considered DCAT-US for metadata harvesting? Please say "yes."
@tlchristian not that I'm aware of. Would you be able to create a fresh issue so we can close this one?
I just had a look at ICPSR, and it seems they still support OAI-PMH: https://www.icpsr.umich.edu/web/ICPSR/cms/3965.
Should this issue title be updated to a more recent version of DCAT than 1.1, since version 3 is already in review (https://www.w3.org/TR/vocab-dcat-3/)?
@DS-INRA I removed "v1.1" from the title. I hope that helps!
2024/08/19: @sbarbosadataverse will post an announcement to the Community to ask whether this support is still needed. Update: done, see https://groups.google.com/g/dataverse-community/c/KP9DPlOk3Po/m/wvAB1hIHBAAJ
Hi all, DataverseNO is interested in Dataverse support for DCAT. According to Wikipedia, "DCAT is the foundation for open dataset descriptions in the European Union public sector and was adapted by the ISA programme of the European Commission". It seems many data portals, especially in Europe, are based on DCAT. For example, Norwegian public data are collected in a data portal based on a Norwegian DCAT profile, DCAT-AP-NO, which is based on the European Commission DCAT profile. In Norway (and I guess in other countries as well), research data produced by (public) universities should be made findable in public data portals. That's why DCAT support in Dataverse is important to us.
Hi all, Thanks to Philipp for pointing me to this issue. DCAT is a big thing in the Netherlands. Just as in Norway, a version derived from DCAT-AP-EU is becoming the national standard for governmental (meta)data. A public consultation on this new version of DCAT (to be precise, the Dutch profile of the European DCAT-AP-3.0 standard) was finished in May. For the Dataverse-based DANS Data Stations, DCAT for health data and DCAT for geospatial data are top priorities, in order to connect to services like the European Health Data Space (EHDS) and our national health data and geo data catalogues. So count us in on these Dataverse DCAT developments... we have to do this anyway, and we are even gathering resources at the moment to start the development work. Cees
Good to hear, @CeesH! Just earlier today I was informed that a public consultation on the new version of DCAT is also running in Norway. Also, earlier this year, an Official Norwegian Report (NOU) was issued on sharing and reuse of public data. It suggests introducing a new national law on data sharing. Among other things, the proposal suggests a) using DCAT as the metadata standard for public data (cf. section 11.5.2); and b) that publicly funded research data published in (institutional) repositories need to comply with the proposed data sharing law (cf. section 5.2). For DataverseNO, this implies that we should at some point be able to support describing datasets based on DCAT.
@philippconzett yes... there is no escape. In NL, most DCAT developments seem to come from, or start in, the geospatial community. That is why we are starting our investigations with the GeoDCAT developments. In the health sciences, this is also a development not to ignore: https://doi.org/10.1093/eurpub/ckad160.037
Two important initiatives in Europe, the European Health Data Space (EHDS) and the so-called Health Data Access Bodies (HDABs), are based on importing DCAT metadata. In the Netherlands, our national Health Data Catalogue will also be built around DCAT. In other words, it is quite urgent to be able to export metadata from Dataverse following DCAT. Would this be something for EU Dataverse users to collaborate on? Something to discuss at the next Dataverse Community Meeting in 2025, maybe?
I didn't know EHDS would use DCAT as its standard for imports. Good to know! And we'd definitely like to explore how we can work on this as well with the growing body of health data in our Dataverse.
@CeesH: we (Geological Survey of the Netherlands) are currently investigating whether Dataverse is a viable solution for publishing datasets (internally and externally). We will soon go into a proof-of-concept phase to get a better feeling for what Dataverse offers. One of the major requirements is indeed the Dutch profile that you mentioned above. Depending on this proof of concept, I would be interested in what it would take to get support for DCAT in Dataverse. I'm not an expert on metadata. It seems there are relations between different classes in the DCAT profile. The Alternatively: if it is "only" the
@sjaakd hi! As you investigate Dataverse, please let us know if you have any questions! Here's a screenshot from that DCAT profile link you shared: I know almost nothing about DCAT, but at https://www.w3.org/TR/vocab-dcat-3/#dcat-scope it looks like dcat:Catalog is defined like this: "dcat:Catalog represents a catalog, which is a dataset in which each individual item is a metadata record describing some resource; the scope of dcat:Catalog is collections of metadata about datasets, data services, or other resource types." In Dataverse, a collection of metadata about datasets sounds like a collection to me. That is, in Dataverse, a dataset lives inside a collection alongside other datasets. Dataverse's OAI_ORE dataset metadata exporter does include information about the parent collection. For example, the dataset at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UXIBNO lives inside the Dutch Parliamentary Behaviour Dataset Dataverse collection, which can be seen under "isPartOf": A good place to start might be to create an external exporter for DCAT. This can be done outside the Dataverse code base as a small, standalone plugin, as described at https://guides.dataverse.org/en/6.5/installation/advanced.html#external-metadata-exporters Here are two examples of external exporters: (The OAI_ORE exporter is internal and the code can be found here.) Sure, additional fields could be added to metadata blocks, but it might be interesting to create an exporter with the existing fields to see which fields are not yet available. I'm not sure what to say about GeoNetwork. It looks interesting, but I've never heard of it.
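To make the collection-as-catalog idea a bit more concrete, here is a minimal Python sketch that maps a simplified Dataverse-style record to DCAT JSON-LD. It is not the Dataverse exporter plugin interface (that is Java); the record's field names, the collection URL, and the use of dct:isPartOf as a back-link (mirroring the OAI_ORE exporter's "isPartOf") are all assumptions for illustration. DCAT itself links a catalog to its datasets with dcat:dataset.

```python
import json

# JSON-LD prefixes for the standard DCAT and DCMI Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dataset_to_dcat(record: dict) -> dict:
    """Map a simplified, hypothetical Dataverse-style record to a DCAT JSON-LD graph.

    The parent collection becomes a dcat:Catalog that lists the dataset via
    dcat:dataset; the dataset points back with dct:isPartOf (an assumed design
    choice, not required by DCAT). Only DOI persistent IDs are handled here.
    """
    # Turn "doi:10.x/..." into a resolvable https://doi.org/ IRI.
    dataset_iri = "https://doi.org/" + record["persistentId"].removeprefix("doi:")
    catalog_iri = record["collection"]["url"]
    catalog = {
        "@id": catalog_iri,
        "@type": "dcat:Catalog",
        "dct:title": record["collection"]["name"],
        "dcat:dataset": [{"@id": dataset_iri}],
    }
    dataset = {
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dct:title": record["title"],
        "dct:description": record["description"],
        "dcat:keyword": list(record.get("keywords", [])),
        "dct:isPartOf": {"@id": catalog_iri},
    }
    return {"@context": CONTEXT, "@graph": [catalog, dataset]}


if __name__ == "__main__":
    example = {
        "persistentId": "doi:10.5072/FK2/EXAMPLE",  # 10.5072 is a test DOI prefix
        "title": "Example dataset",
        "description": "Hypothetical record for illustration.",
        "keywords": ["parliament", "voting"],
        "collection": {"name": "Example collection",
                       "url": "https://demo.example.org/dataverse/example"},  # hypothetical
    }
    print(json.dumps(dataset_to_dcat(example), indent=2))
```

Running this against the fields an installation already has would also show quickly which DCAT properties have no Dataverse source field yet.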
@pdurbin: thanks for your comments. This quarter (Q1) my colleagues plan to install Dataverse so we can experiment with the product and see how it integrates into our environment. I'll get back to this issue as soon as we've made some progress. It would be nice to see whether the other classes can be mapped as well. I can also imagine that they might have fixed/constant values in a Dataverse scenario (DCAT seems geared towards OGC or REST data services). If we build a DCAT exporter, I guess we could contribute it to this project. Would that be an interesting proposition?
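On the idea of fixed/constant values: one possible mapping, sketched here in Python under stated assumptions, treats the installation's API as a single constant dcat:DataService and each file as a dcat:Distribution whose download URL uses Dataverse's Data Access API path /api/access/datafile/{id}. The input file fields (id, filename, contentType, filesize) and the base URL are hypothetical; the returned distributions are meant to be embedded under the dataset node's dcat:distribution.

```python
import json


def file_distributions(base_url: str, dataset_iri: str, files: list[dict]) -> dict:
    """Return one constant dcat:DataService plus a dcat:Distribution per file.

    Each file dict is assumed (hypothetically) to carry "id", "filename",
    "contentType" and "filesize".
    """
    api = base_url.rstrip("/") + "/api"
    # The same service node would be emitted for every dataset in the installation.
    service = {
        "@id": api,
        "@type": "dcat:DataService",
        "dcat:endpointURL": {"@id": api},
        "dcat:servesDataset": [{"@id": dataset_iri}],
    }
    distributions = []
    for f in files:
        download = f"{api}/access/datafile/{f['id']}"
        # Drop parameters such as "; charset=UTF-8" to get a bare IANA media type.
        media_type = f["contentType"].split(";")[0].strip()
        distributions.append({
            "@type": "dcat:Distribution",
            "dct:title": f["filename"],
            "dcat:accessURL": {"@id": download},
            "dcat:downloadURL": {"@id": download},
            "dcat:mediaType": {"@id": "https://www.iana.org/assignments/media-types/" + media_type},
            "dcat:byteSize": f["filesize"],
            "dcat:accessService": {"@id": api},
        })
    return {"service": service, "distributions": distributions}


if __name__ == "__main__":
    demo = file_distributions(
        "https://demo.example.org/",  # hypothetical installation URL
        "https://doi.org/10.5072/FK2/EXAMPLE",  # test-prefix DOI, not a real dataset
        [{"id": 42, "filename": "data.csv", "contentType": "text/csv", "filesize": 1024}],
    )
    print(json.dumps(demo, indent=2))
```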
@sjaakd yes, absolutely! Say the word and we'll create an empty repo under https://github.com/gdcc for you to push to. We can even automate the publishing of the artifact (a Java jar file) to Maven Central for you. And we'd add it to https://github.com/gdcc/dataverse-exporters#list-of-known-exporters so others can find it.
Hi Phil, hi Sjaak, Since my message(s) on this topic, we at DANS have actually gone on to probe different ways for Dataverse to interact with DCAT-based platforms and harvesters, and I'll ask our technical team (through andrecastro0o) to provide an update. Sorry to bother Dataversers with domestic Dutch matters... (And @sjaakd, for your BRO/Subsurface services, our Dataverse-based archaeological datasets might be of interest, but that would be a project of its own.) Cees
Goal: support importing metadata from data.gov and exporting it in the same format so that data.gov can ingest our metadata.
Need to research this:
Here is the metadata schema used in the US Government, which is based on DCAT: https://project-open-data.cio.gov/v1.1/schema/
As you can see, there are a lot of optional fields but only a few required ones (title, description, keywords, contact, URL). We recently moved to version 1.1 of the schema.
Also note that Data.gov ingests local, state, and university data:
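As a rough illustration, a DCAT-US entry covering the required fields listed above might be built and checked like this. Mapping "keywords", "contact" and "URL" to the property names keyword, contactPoint and landingPage is an assumption on our part, and the v1.1 schema itself marks further fields (for example identifier, modified, publisher and accessLevel) as required, so the schema page remains the authority.

```python
# Fields named as required in this issue, mapped to assumed DCAT-US v1.1 property names.
ISSUE_REQUIRED = ("title", "description", "keyword", "contactPoint", "landingPage")


def to_dcat_us(title: str, description: str, keywords: list[str],
               contact_name: str, contact_email: str, landing_page: str) -> dict:
    """Build a minimal DCAT-US dataset entry (hypothetical helper)."""
    return {
        "@type": "dcat:Dataset",
        "title": title,
        "description": description,
        "keyword": list(keywords),
        # contactPoint is a vCard-style object in the Project Open Data schema.
        "contactPoint": {
            "@type": "vcard:Contact",
            "fn": contact_name,
            "hasEmail": "mailto:" + contact_email,
        },
        "landingPage": landing_page,
    }


def missing_fields(entry: dict) -> list[str]:
    """Return the issue-listed required fields that are absent or empty."""
    return [name for name in ISSUE_REQUIRED if not entry.get(name)]


if __name__ == "__main__":
    entry = to_dcat_us("Example", "Hypothetical dataset", ["survey"],
                       "Data Team", "data@example.org",
                       "https://demo.example.org/dataset.xhtml?persistentId=doi:10.5072/FK2/EXAMPLE")
    print(missing_fields(entry))
```

An export step could refuse to publish (or flag) any dataset for which missing_fields returns a non-empty list.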
http://catalog.data.gov/dataset?organization_type=Federal+Government
http://catalog.data.gov/dataset?organization_type=University
Regardless of what we do, we should make sure Data.gov harvests/ingests us.
Some additional links:
https://project-open-data.cio.gov/
https://github.com/project-open-data/project-open-data.github.io
And here is how Data.gov does the harvesting:
http://www.data.gov/developers/harvesting
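As we understand the harvesting page, Data.gov can harvest a data.json catalog file listing DCAT-US dataset entries. A minimal sketch of wrapping entries in such a catalog follows; the @context, conformsTo and describedBy values are given as we recall them from the v1.1 catalog examples and should be checked against the schema before relying on them. Where an installation would serve the file is left open.

```python
import json

# Schema base URL from the Project Open Data v1.1 documentation.
POD_SCHEMA = "https://project-open-data.cio.gov/v1.1/schema"


def build_catalog(datasets: list[dict]) -> dict:
    """Wrap DCAT-US dataset entries in a top-level data.json catalog object."""
    return {
        "@context": POD_SCHEMA + "/catalog.jsonld",
        "@type": "dcat:Catalog",
        "conformsTo": POD_SCHEMA,
        "describedBy": POD_SCHEMA + "/catalog.json",
        "dataset": list(datasets),
    }


def write_catalog(datasets: list[dict], path: str) -> None:
    """Serialize the catalog to a UTF-8 JSON file (for example one served as data.json)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_catalog(datasets), fh, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(json.dumps(build_catalog([{"title": "Example"}]), indent=2))
```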