deploy virtuoso and load some data #3

Open
pvgenuchten opened this issue Apr 2, 2024 · 1 comment
pvgenuchten commented Apr 2, 2024

A getting-started guide for loading some DCAT data.

As described in the EJPsoil wiki, you can spin up a local instance of Virtuoso using this docker compose file.

You can upload the attached zip (extracted from https://nationaalgeoregister.nl) using Quad Store Upload (in the Linked Data menu of Virtuoso Conductor). Set the graph IRI to an IRI of your choice; you need it later when querying the graph. The queries in this issue use <http://soilwise-he.eu>.

Run a SPARQL query from the SPARQL panel:

summary of graph

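# Count instances per class; GROUP BY is required by SPARQL 1.1 when
# a plain variable is projected next to an aggregate.
select ?class (count(?thing) as ?numInstances) where {
  graph <http://soilwise-he.eu> {
    ?thing a ?class .
  }
}
group by ?class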

datasets per organisation

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?organization
       (count(distinct ?dataset) as ?numDatasets)
       (count(distinct ?distribution) as ?numDistributions)
WHERE {
  graph <http://soilwise-he.eu> {
    ?dataset a dcat:Dataset ;
             dcterms:publisher ?organization ;
             dcat:distribution ?distribution .
  }
}
group by ?organization
order by desc(?numDatasets)
limit 20

count by license

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?url (count(distinct ?dataset) as ?count) WHERE {
  SELECT (IRI(?name) AS ?url) ?dataset WHERE {
    ?dataset a dcat:Dataset .
    ?dataset dct:license ?name .
    FILTER(isLiteral(?name))
  }
} GROUP BY ?url

count by format

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?format (count(distinct ?dataset) as ?count) WHERE {
  SELECT ?dataset ?format WHERE {
    ?dataset dcat:distribution ?distribution .
    ?distribution dct:format ?format .
  }
} GROUP BY ?format
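
Any of the queries above can also be sent to the endpoint over HTTP instead of through the SPARQL panel, which is handy for scripting. The following is a minimal sketch, not a tested client: it assumes a local Virtuoso listening on its default `http://localhost:8890/sparql` (an assumption, check your docker compose port mapping), and the helper names `build_request`, `bindings_to_rows` and `run_query` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: default Virtuoso HTTP port and SPARQL path on a local instance.
ENDPOINT = "http://localhost:8890/sparql"

# The "summary of graph" query, with an explicit GROUP BY.
SUMMARY_QUERY = """
SELECT ?class (COUNT(?thing) AS ?numInstances) WHERE {
  GRAPH <http://soilwise-he.eu> { ?thing a ?class . }
}
GROUP BY ?class
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    params = urlencode({"query": query})
    return Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def bindings_to_rows(results):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result rows (needs a running endpoint)."""
    with urlopen(build_request(query, endpoint)) as response:
        return bindings_to_rows(json.load(response))
```

With the instance running, `run_query(SUMMARY_QUERY)` should return one dict per class; all values come back as strings, so counts need an explicit `int(...)`.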

The records were exported from NGR using https://www.nationaalgeoregister.nl/geonetwork/srv/api/rdf.search?from=100
Some observations:

  • Each request exports at most 100 records, so you need to crawl in steps of 100 (from=100, 200, 300, and so on) to fetch everything
  • Some of those pages fail when loading into Virtuoso (invalid graph data)
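
The paging described above could be scripted roughly as follows. This is a sketch under assumptions: the `from` offsets follow the pattern given in the issue, the upper bound is a guess because the total record count is not stated, and the export is assumed to be RDF/XML. The well-formedness check is only a cheap pre-filter; a page can be valid XML and still be rejected by Virtuoso, for example because of invalid IRIs. The names `page_urls` and `is_well_formed_xml` are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Export endpoint as given in the issue.
BASE_URL = "https://www.nationaalgeoregister.nl/geonetwork/srv/api/rdf.search"


def page_urls(first, last, step=100, base_url=BASE_URL):
    """Yield one export URL per page: from=first, first+step, ... up to last."""
    for offset in range(first, last + 1, step):
        yield f"{base_url}?{urlencode({'from': offset})}"


def is_well_formed_xml(payload):
    """Return True if the payload parses as XML at all (assumes RDF/XML)."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # 300 is a placeholder upper bound, not the real number of records.
    for url in page_urls(100, 300):
        print(url)
```

Downloading each URL, keeping the pages that pass `is_well_formed_xml`, and logging the rest gives a list of offsets to inspect by hand before uploading.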

Download: rdf200.zip
