Can data packages be made easily findable using search engines? #871

HeidiSeibold · 2018-11-02T08:38:57Z

It seems to me that with schema.org and this there is currently a movement toward making data sets more easily findable with search engines.

How do these efforts relate to data packages?

Side note: I have no technical knowledge about search engines nor do I really understand what schema.org does. I am just a researcher who wants to make her data sets findable 👩‍🔬

rufuspollock (Maintainer) · 2018-11-05T22:17:28Z

@HeidiSeibold great to have you flag this, and there has been a fair amount of discussion. The basic method for making data packages more discoverable is to add the relevant meta tags or other info to the HTML page where the data package is catalogued. This is already supported in e.g. https://datahub.io and is also supported automatically in most CKAN-based data portals.

In terms of the data package specs, I think we could definitely publish a pattern that suggests a standard mapping from data package metadata to the tags you can add to your HTML page.
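
To make the idea of such a mapping concrete, here is a minimal sketch in Python. It is not a published pattern: the function name `datapackage_to_schema_org` and the choice of which descriptor fields map to which schema.org/Dataset properties are assumptions made for illustration only.

```python
def datapackage_to_schema_org(descriptor: dict) -> dict:
    """Map a Data Package descriptor to a schema.org/Dataset dict.

    Hypothetical mapping for illustration, not an official pattern.
    """
    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        # Prefer the human-readable title, fall back to the machine name.
        "name": descriptor.get("title") or descriptor.get("name"),
    }
    if descriptor.get("description"):
        dataset["description"] = descriptor["description"]
    if descriptor.get("homepage"):
        dataset["url"] = descriptor["homepage"]
    if descriptor.get("keywords"):
        dataset["keywords"] = list(descriptor["keywords"])
    if descriptor.get("version"):
        dataset["version"] = descriptor["version"]

    # Data Package allows several licenses; this sketch keeps only the first.
    licenses = descriptor.get("licenses") or []
    if licenses:
        dataset["license"] = licenses[0].get("path") or licenses[0].get("name")

    # Each resource with a single string path becomes a DataDownload.
    distributions = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str):
            continue
        download = {"@type": "DataDownload", "contentUrl": path}
        if resource.get("format"):
            download["encodingFormat"] = resource["format"]
        distributions.append(download)
    if distributions:
        dataset["distribution"] = distributions

    return dataset
```

The input would typically be the parsed contents of a `datapackage.json` file. A real pattern would also need to decide how to handle relative resource paths, contributors, and sources, which this sketch leaves out.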

danbri · 2018-11-27T18:49:10Z

For Google, specifically the recently announced Google Dataset Search, the relevant documentation is https://developers.google.com/search/docs/data-types/dataset which documents the specific ways in which Google understands schema.org and W3C DCAT for dataset discovery (including use of sitemap files, canonical URLs for de-duplication, etc.).

Google Dataset Search: https://toolbox.google.com/datasetsearch
Announcement: https://www.blog.google/products/search/making-it-easier-discover-datasets/
Earlier blog post: https://ai.googleblog.com/2017/01/facilitating-discovery-of-public.html
Google structured data testing tool: https://developers.google.com/structured-data/testing-tool/

In brief, we can extract simple dataset descriptions that use http://schema.org/Dataset, or a similar structure using DCAT, from dataset-describing pages that use any of 1) JSON-LD in a script tag, 2) RDFa 1.1, or 3) Microdata. For dataset descriptions that use these notations with other vocabularies, such as W3C Data Cube and W3C CSVW, we're looking into direct support. For other formats and approaches it may be useful to collaborate on mappings.
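
As an illustration of the first option (JSON-LD in a script tag), the sketch below renders a schema.org/Dataset description as a snippet that could be placed in the dataset's HTML page. The helper name `render_jsonld_script` and all values in the example description are hypothetical.

```python
import json


def render_jsonld_script(dataset: dict) -> str:
    """Wrap a schema.org/Dataset dict in a JSON-LD script element."""
    payload = json.dumps(dataset, indent=2, ensure_ascii=False)
    # Escape "<" so that text such as "</script>" inside a field value
    # cannot close the script element early.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical dataset description, for illustration only.
example = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example expenditure data",
    "description": "An example description of a dataset about public expenditure.",
    "url": "https://example.org/datasets/expenditure",
}

print(render_jsonld_script(example))
```

The resulting element is usually placed in the page's `head` or `body`. The same description could instead be expressed inline with RDFa or Microdata attributes.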

Looking at https://datahub.io/JohnSnowLabs/uk-greater-london-public-expenditures I don't see the markup directly appearing. What codebase runs datahub.io, and is it different to CKAN? My understanding is that general CKAN now has support for the schema.org markup and/or DCAT, either directly in the latest version or via the DCAT extension.
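
One rough way to check whether a page carries this markup is to scan its HTML for JSON-LD blocks of type `Dataset`. The sketch below uses only the Python standard library. The names `JsonLdCollector` and `find_datasets` are hypothetical, and it only inspects top-level JSON-LD objects, so RDFa, Microdata, and `@graph` containers are not covered.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def find_datasets(html: str) -> list:
    """Return the top-level JSON-LD objects whose @type includes Dataset."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = []
    for block in collector.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail
        for item in doc if isinstance(doc, list) else [doc]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Dataset" in types:
                found.append(item)
    return found
```

The page source would have to be fetched separately and passed in as a string. Markup injected by JavaScript after page load would not be visible to a check like this.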

Nearby:

danbri · 2018-11-27T18:50:08Z

/cc @serahrono who I just met at the WikiCite conference :)

danbri · 2018-11-27T19:54:44Z

/cc @amercader @metaodi

https://ckan.org/2018/04/30/make-open-data-discoverable-for-search-engines/

metaodi · 2018-11-27T20:05:45Z

@HeidiSeibold 👋

As @danbri already mentioned, if you use CKAN and the latest ckanext-dcat extension, you're set up to feed Google (and whoever else supports schema.org/Dataset) and appear in search results (iirc currently limited to the above linked "Dataset Search").

@rufuspollock does the support in datahub.io imply that such a mapping already exists?

Anyway: since CKAN can handle both data package and schema.org, it should be fairly easy to extract.

danbri · 2018-12-07T19:34:40Z

Give me a shout (maybe a Twitter ping, I'm 'danbri' there) if I can help on this, in case I miss the GitHub msgs in the noise...

rufuspollock (Maintainer) · 2019-01-06T14:33:54Z

@danbri @metaodi DataHub.io does not run CKAN, but we add a set of the attributes that @danbri mentions so that these datasets automatically show up in Google Dataset Search 😄

It would also be useful to publish a "pattern" on https://frictionlessdata.io/specs/patterns/ that maps Data Package metadata to the structure needed for Google.
