Can data packages be made easily findable using search engines? #871
Replies: 7 comments
-
@HeidiSeibold great to have you flag this; there has already been a fair amount of discussion. The basic way to make data packages more discoverable is to add the relevant meta tags or other info to the HTML page where the data package is catalogued. This is already supported in e.g. https://datahub.io and is automatically supported in most CKAN-based data portals. In terms of the data package specs, I think we could definitely publish a pattern that suggests a standard mapping from data package metadata to the tags you can add to your HTML page.
-
For Google, specifically the recently announced Google Dataset Search, the relevant documentation is https://developers.google.com/search/docs/data-types/dataset which documents the specific ways in which Google understands schema.org and W3C DCAT for dataset discovery (including use of sitemap files, canonical URLs for de-duplication, etc.).
In brief, we can extract simple dataset descriptions that use http://schema.org/Dataset, or a similar structure using DCAT, from dataset-describing pages that use any of 1) JSON-LD in a script tag, 2) RDFa 1.1, or 3) Microdata syntaxes. For dataset descriptions that use these notations, like W3C Data Cube, W3C CSVW, we're looking into direct support. For other formats and approaches it may be useful to collaborate on mappings. Looking at https://datahub.io/JohnSnowLabs/uk-greater-london-public-expenditures I don't see the markup directly appearing. What codebase runs datahub.io? Is it different from CKAN? My understanding is that general CKAN now has support for the schema.org markup and/or DCAT, either directly in the latest version or via the DCAT extension. Nearby:
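For anyone who wants to check whether a catalogue page already carries the JSON-LD form of this markup, here is a minimal sketch using only the Python standard library. It covers JSON-LD in a script tag only, not RDFa or Microdata, and it does not look inside `@graph` containers. The names `JsonLdCollector`, `find_dataset_descriptions` and `SAMPLE_PAGE` are made up for this illustration, and the sample page is invented rather than taken from any real portal.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_dataset_descriptions(html):
    """Return top-level JSON-LD objects in `html` whose @type includes "Dataset"."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = []
    for block in collector.blocks:
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = parsed if isinstance(parsed, list) else [parsed]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Dataset" in types:
                found.append(item)
    return found


# Invented sample page, only here to exercise the function.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Example dataset", "description": "A made-up example."}
</script>
</head><body></body></html>
"""

datasets = find_dataset_descriptions(SAMPLE_PAGE)
print([d["name"] for d in datasets])
```

To test a real page you would pass its fetched HTML to `find_dataset_descriptions`; an empty list means no top-level JSON-LD `Dataset` object was found, which says nothing about RDFa or Microdata markup.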
-
/cc @serahrono who I just met at Wikicite conference :)
-
/cc @amercader @metaodi https://ckan.org/2018/04/30/make-open-data-discoverable-for-search-engines/
-
As @danbri already mentioned, if you use CKAN and the latest ckanext-dcat extension, you're set up to feed Google (and whoever else supports schema.org/Dataset) and appear in search results (iirc currently limited to the above-linked "Dataset Search"). @rufuspollock does the support in datahub.io imply that such a mapping already exists? Anyway: since CKAN can handle both data package and schema.org, it should be fairly easy to extract.
-
Give me a shout (maybe a Twitter ping, I'm 'danbri' there) if I can help on this, in case I miss the GitHub msgs in the noise...
-
@danbri @metaodi DataHub.io does not run CKAN, but we are adding a set of the attributes that @danbri mentions so that these datasets automatically show up in Google Dataset Search 😄 It would also be useful to publish a "pattern" on https://frictionlessdata.io/specs/patterns/ that maps Data Package metadata to the structure needed for Google.
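To make the idea of such a pattern concrete, here is a rough sketch of what a mapping from a Data Package descriptor to a schema.org `Dataset` in JSON-LD could look like. This is not a published pattern and not what DataHub.io or CKAN actually do; the choice of which descriptor fields map to which schema.org properties is my assumption, and the function names and the example descriptor are invented for illustration.

```python
import json


def datapackage_to_schema_org(descriptor):
    """Map a Data Package descriptor (dict) to a schema.org Dataset (dict).

    The field mapping is an assumption for illustration, not an agreed standard.
    """
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Prefer the human-readable title, fall back to the machine name.
        "name": descriptor.get("title") or descriptor.get("name"),
    }

    # Simple one-to-one fields: Data Package key -> schema.org property.
    simple_fields = {
        "description": "description",
        "homepage": "url",
        "keywords": "keywords",
        "version": "version",
    }
    for source_key, target_key in simple_fields.items():
        if descriptor.get(source_key):
            dataset[target_key] = descriptor[source_key]

    # Data Package licenses are objects; schema.org accepts a license URL.
    license_urls = [lic["path"] for lic in descriptor.get("licenses", []) if lic.get("path")]
    if license_urls:
        dataset["license"] = license_urls[0] if len(license_urls) == 1 else license_urls

    # Each resource with a single string path becomes a DataDownload.
    distribution = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str):
            continue  # inline data or multi-file resources are skipped here
        download = {"@type": "DataDownload", "contentUrl": path}
        if resource.get("format"):
            download["encodingFormat"] = resource["format"]
        distribution.append(download)
    if distribution:
        dataset["distribution"] = distribution

    return dataset


def to_script_tag(dataset):
    """Render the JSON-LD as a script element for the dataset's HTML page."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(dataset, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Invented descriptor, only to exercise the mapping.
example_descriptor = {
    "name": "example-package",
    "title": "Example Package",
    "description": "A made-up data package.",
    "keywords": ["example", "demo"],
    "licenses": [{"name": "CC0-1.0", "path": "https://creativecommons.org/publicdomain/zero/1.0/"}],
    "resources": [{"name": "data", "path": "https://example.org/data.csv", "format": "csv"}],
}

schema_org_dataset = datapackage_to_schema_org(example_descriptor)
print(to_script_tag(schema_org_dataset))
```

One open question a real pattern would need to settle: resource paths in a data package are often relative, while `contentUrl` should be a full URL, so a publisher would have to resolve them against wherever the package is hosted.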
-
It seems to me that with schema.org and this there is currently a movement toward making data sets more easily findable with search engines.
How do these efforts relate to data packages?
Side note: I have no technical knowledge about search engines nor do I really understand what schema.org does. I am just a researcher who wants to make her data sets findable 👩‍🔬