
Lesson 2B: Foster to link everything with everything

Linda van den Brink edited this page Jun 6, 2017 · 16 revisions

Be aware that you don't just publish data on the web: by doing so, you create data on the web that others will start to link to, which adds value to your data.

Why?

Internal links in the dataset indicate that there are other resources in the dataset as well. Crawlers can use these links to navigate to those resources, for example when pagination is in place.

If external sites link to your resources, chances are that crawlers already know these sites and follow the links to your resources.

Also, linking data sets makes it possible to gain new insights and to create valuable, structured information.

Intended outcome

Linked data sets, and high(er) page ranks.

Possible approach

Separate data sets can be linked together if they both rely on the same principles.

RDF dictates that data is recorded as triples, in the sequence subject, predicate, object.
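As a minimal sketch of the triple structure, the Python snippet below holds one statement as a (subject, predicate, object) tuple and prints it as an N-Triples line. The `Triple` type and the `to_ntriples` helper are illustrative names, not part of any library, and the two identifiers are the placeholder URIs from the JSON-LD example in this lesson.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; all three parts are IRIs in this sketch."""
    subject: str
    predicate: str
    obj: str


# Placeholder identifiers taken from this lesson's example, not real URIs.
REPORT = "http://the_Bekendmakingen_base_uri/the_Report_identifier"
PLACE = "http://the_BAG_base_uri/the_Place_identifier"

triples = [
    Triple(REPORT, "http://schema.org/contentLocation", PLACE),
]


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple of IRIs as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


for triple in triples:
    print(to_ntriples(triple))
```

The sketch only covers triples whose object is an IRI; literal values such as strings or dates would need their own serialization.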

Links are established via HTTP-resolvable identifiers in RDF triple statements, for example: http://www.ldproxy.net/bag/inspireadressen/inspireadressen.1/. When these identifiers are resolved, the RDF data must be returned.
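Resolving such an identifier could look roughly like the following Python sketch, which prepares an HTTP GET request that asks for an RDF serialization through the `Accept` header. The helper name `build_rdf_request` is hypothetical, and it is an assumption that the server offers `application/ld+json`; a given server may support other RDF media types or none at all.

```python
import urllib.request


def build_rdf_request(identifier: str,
                      media_type: str = "application/ld+json") -> urllib.request.Request:
    """Prepare a GET request that asks for an RDF representation.

    The default media type is an assumption; adjust it to what the
    server actually offers.
    """
    return urllib.request.Request(identifier, headers={"Accept": media_type})


request = build_rdf_request(
    "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.1/"
)
print(request.get_method(), request.full_url, request.get_header("Accept"))

# Sending the request requires network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```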

To improve discoverability, backward and forward linking is advised.
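One possible reading of backward linking is that each link target also exposes the resources that point to it. The Python sketch below derives such a backward index from a list of forward links. The function name `backlinks` and the data layout are assumptions for illustration, and the identifiers are the placeholder URIs used in this lesson.

```python
from collections import defaultdict

# Placeholder identifiers taken from this lesson's example, not real URIs.
REPORT = "http://the_Bekendmakingen_base_uri/the_Report_identifier"
PLACE = "http://the_BAG_base_uri/the_Place_identifier"

# Forward links as (source, property, target).
forward_links = [
    (REPORT, "contentLocation", PLACE),
]


def backlinks(links):
    """Group forward links by target, giving each target its incoming links."""
    index = defaultdict(list)
    for source, prop, target in links:
        index[target].append((prop, source))
    return dict(index)


print(backlinks(forward_links))
```

With such an index, the page for the place could list the reports that refer to it, so a crawler arriving at either resource can reach the other.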

Example JSON-LD:

{
  "@context": "http://schema.org",
  "@id": "http://the_Bekendmakingen_base_uri/the_Report_identifier",
  "contentLocation": {
    "@id": "http://the_BAG_base_uri/the_Place_identifier"
  }
}
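A consumer that only needs the links does not necessarily need a full JSON-LD processor. As a rough sketch, assuming the `@id` values are plain IRI strings, the Python snippet below parses the example with the standard `json` module and lists the outgoing links of the top-level resource. The helper `outgoing_links` is a hypothetical name, and it ignores `@context` processing entirely.

```python
import json

# The lesson's example, with the placeholder identifiers as plain strings.
document_text = """
{
  "@context": "http://schema.org",
  "@id": "http://the_Bekendmakingen_base_uri/the_Report_identifier",
  "contentLocation": {
    "@id": "http://the_BAG_base_uri/the_Place_identifier"
  }
}
"""


def outgoing_links(node: dict) -> list:
    """Return (property, target @id) pairs for values that carry an @id."""
    links = []
    for key, value in node.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                links.append((key, item["@id"]))
    return links


document = json.loads(document_text)
print(document["@id"])
print(outgoing_links(document))
```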

In the second phase of the Testbed we saw that the more traditional Semantic Web technologies such as RDF and SPARQL are complicated for some developers, and that following and reasoning across RDF links is not always easy. It is therefore important to also consider simpler alternatives such as REST and JSON.

We also observed that automatic linking of concepts is hard, because the same thing may be denoted by different texts and the same text may mean different things. For instance, the municipality of Appingedam is denoted by multiple values such as ‘Appingedam’ and ‘GM0003’ (gemeentecode). At the same time, occurrences of the same string can denote different things (e.g., the area of Appingedam changes over time, the municipality of Appingedam is more than just the area of Appingedam, and a tourist uses ‘Appingedam’ to denote the center of Appingedam). The same is true for other value types, e.g., ‘1903’ may denote a year in the Gregorian calendar or the length of a road in meters.
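One way to make this ambiguity explicit is to resolve a value only together with its value type. The Python sketch below is a simplified illustration under that assumption: the lookup table, the type names `gemeentenaam` and `gemeentecode`, and the `http://example.org/...` identifier are all hypothetical and not taken from the Testbed.

```python
# Hypothetical identifier for the municipality; not a real URI.
MUNICIPALITY = "http://example.org/id/gemeente/GM0003"

# Keys are (value type, value); different texts may map to the same identifier.
LOOKUP = {
    ("gemeentenaam", "Appingedam"): MUNICIPALITY,
    ("gemeentecode", "GM0003"): MUNICIPALITY,
}


def resolve(value_type: str, value: str):
    """Return the identifier for a typed value, or None when it is unknown."""
    return LOOKUP.get((value_type, value))


print(resolve("gemeentenaam", "Appingedam"))
print(resolve("gemeentecode", "GM0003"))
# A bare string with an unknown type is not linked automatically.
print(resolve("plaatsnaam", "Appingedam"))
```

The same idea applies to a value such as ‘1903’: without knowing whether it is a year or a length in meters, the sketch returns no link rather than guessing.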