Merge pull request #143 from RDFLib/edmond/feature/bgs
Add reg:status to vocpub profile as an annotation prop. Add rdf:type to the vocabs API and add fix to ensure profile predicates and their values get annotation values added to response
edmondchuc authored Aug 9, 2023
2 parents 070d8e1 + f8bc090 commit a56cb9a
Showing 28 changed files with 1,015 additions and 190 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/on_pr_to_main.yaml
@@ -63,6 +63,6 @@ jobs:
 cd ../catprez && poetry run pytest
 cd ../profiles && poetry run pytest
 cd ../services && poetry run pytest
-cd ../curies && poetry run pytest
+cd ../identifier && poetry run pytest
 cd ../object && poetry run pytest
 cd ../caching && poetry run pytest
1 change: 1 addition & 0 deletions .github/workflows/on_push_to_feature.yaml
@@ -61,4 +61,5 @@ jobs:
 cd ../catprez && poetry run pytest
 cd ../profiles && poetry run pytest
 cd ../services && poetry run pytest
+cd ../identifier && poetry run pytest
 # cd ../local_sparql_store && poetry run pytest
36 changes: 22 additions & 14 deletions README.md
@@ -36,6 +36,14 @@ It expects "high quality" data to work well: Prez itself won't patch up bad or m
 Prez accesses data stored in an RDF database - a 'triplestore' - and uses the SPARQL Protocol to do so. Any SPARQL Protocol-compliant DB may be used.

+## Redirect Service
+
+As a Linked Data server, Prez provides a redirect service at `/identifier/redirect` that accepts a query parameter `iri`, looks up the `iri` in the database for a `foaf:homepage` predicate with a value, and, if one exists, returns a redirect response to that value.
+
+This functionality is useful for institutions that issue their own persistent identifiers under a domain name they control. The mapping from the persistent identifier to the target web resource is stored in the backend SPARQL store.
+
+This is an alternative to persistent identifier services such as [w3id.org](https://w3id.org/). In some cases, it can be used together with such services to avoid providing the redirect mapping in webserver config (NGINX, Apache HTTP, etc.) and instead define the mapping as RDF data.
+
 ## Development

 This section is for developing Prez locally. See the [Running](#running) options below for running Prez in production.
@@ -84,20 +92,20 @@ via python-dotenv, or directly in the environment in which Prez is run. The envi
 instantiate a Pydantic `Settings` object which is used throughout Prez to configure its behaviour. To see how prez
 interprets/uses these environment variables see the `prez/config.py` file.

-| Environment Variable | Description |
-|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| SPARQL_ENDPOINT | Read-only SPARQL endpoint for SpacePrez |
-| SPARQL_USERNAME | A username for Basic Auth against the SPARQL endpoint, if required by the SPARQL endpoint. |
-| SPARQL_PASSWORD | A password for Basic Auth against the SPARQL endpoint, if required by the SPARQL endpoint. |
-| PROTOCOL | The protocol used to deliver Prez. Usually 'http'. |
-| HOST | The host on which to server prez, typically 'localhost'. |
-| PORT | The port Prez is made accessible on. Default is 8000, could be 80 or anything else that your system has permission to use |
-| SYSTEM_URI | Documentation property. An IRI for the Prez system as a whole. This value appears in the landing page RDF delivered by Prez ('/') |
-| LOG_LEVEL | One of CRITICAL, ERROR, WARNING, INFO, DEBUG. Defaults to INFO. |
-| LOG_OUTPUT | "file", "stdout", or "both" ("file" and "stdout"). Defaults to stdout. |
-| PREZ_TITLE | The title to use for Prez instance |
-| PREZ_DESC | A description to use for the Prez instance |
-| DISABLE_PREFIX_GENERATION | Default value is `false`. Very large datasets may want to disable this setting and provide a predefined set of prefixes for namespaces as described in [Link Generation](README-Dev.md#link-generation). |
+| Environment Variable | Description |
+| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| SPARQL_ENDPOINT | Read-only SPARQL endpoint for SpacePrez |
+| SPARQL_USERNAME | A username for Basic Auth against the SPARQL endpoint, if required by the SPARQL endpoint. |
+| SPARQL_PASSWORD | A password for Basic Auth against the SPARQL endpoint, if required by the SPARQL endpoint. |
+| PROTOCOL | The protocol used to deliver Prez. Usually 'http'. |
+| HOST | The host on which to server prez, typically 'localhost'. |
+| PORT | The port Prez is made accessible on. Default is 8000, could be 80 or anything else that your system has permission to use |
+| SYSTEM_URI | Documentation property. An IRI for the Prez system as a whole. This value appears in the landing page RDF delivered by Prez ('/') |
+| LOG_LEVEL | One of CRITICAL, ERROR, WARNING, INFO, DEBUG. Defaults to INFO. |
+| LOG_OUTPUT | "file", "stdout", or "both" ("file" and "stdout"). Defaults to stdout. |
+| PREZ_TITLE | The title to use for Prez instance |
+| PREZ_DESC | A description to use for the Prez instance |
+| DISABLE_PREFIX_GENERATION | Default value is `false`. Very large datasets may want to disable this setting and provide a predefined set of prefixes for namespaces as described in [Link Generation](README-Dev.md#link-generation). |

 ### Running in a Container

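The README hunk above documents how environment variables are read into a Pydantic `Settings` object defined in `prez/config.py`. As a hedged, stdlib-only sketch of that mapping (not Prez's actual code), the dataclass below reads the same variable names and applies the defaults the table states for `PORT`, `LOG_LEVEL`, `LOG_OUTPUT` and `DISABLE_PREFIX_GENERATION`. The class name `PrezSettingsSketch`, the `from_env` helper, the accepted truthy strings, and the `http`/`localhost` fallbacks for `PROTOCOL`/`HOST` (taken from the table's "usually"/"typically" wording) are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_LOG_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}
_LOG_OUTPUTS = {"file", "stdout", "both"}


@dataclass(frozen=True)
class PrezSettingsSketch:
    """Hypothetical stand-in for Prez's Pydantic Settings; field names are illustrative."""

    sparql_endpoint: Optional[str]
    sparql_username: Optional[str]
    sparql_password: Optional[str]
    protocol: str
    host: str
    port: int
    system_uri: Optional[str]
    log_level: str
    log_output: str
    prez_title: Optional[str]
    prez_desc: Optional[str]
    disable_prefix_generation: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "PrezSettingsSketch":
        # Documented defaults: PORT=8000, LOG_LEVEL=INFO, LOG_OUTPUT=stdout,
        # DISABLE_PREFIX_GENERATION=false. PROTOCOL/HOST fallbacks are assumptions.
        log_level = env.get("LOG_LEVEL", "INFO").upper()
        if log_level not in _LOG_LEVELS:
            raise ValueError(f"LOG_LEVEL must be one of {sorted(_LOG_LEVELS)}, got {log_level!r}")
        log_output = env.get("LOG_OUTPUT", "stdout").lower()
        if log_output not in _LOG_OUTPUTS:
            raise ValueError(f"LOG_OUTPUT must be one of {sorted(_LOG_OUTPUTS)}, got {log_output!r}")
        disable_prefixes = env.get("DISABLE_PREFIX_GENERATION", "false").strip().lower()
        return cls(
            sparql_endpoint=env.get("SPARQL_ENDPOINT"),
            sparql_username=env.get("SPARQL_USERNAME"),
            sparql_password=env.get("SPARQL_PASSWORD"),
            protocol=env.get("PROTOCOL", "http"),
            host=env.get("HOST", "localhost"),
            port=int(env.get("PORT", "8000")),
            system_uri=env.get("SYSTEM_URI"),
            log_level=log_level,
            log_output=log_output,
            prez_title=env.get("PREZ_TITLE"),
            prez_desc=env.get("PREZ_DESC"),
            disable_prefix_generation=disable_prefixes in {"1", "true", "yes"},
        )


# Usage: pass an explicit mapping (handy in tests) or rely on os.environ.
settings = PrezSettingsSketch.from_env(
    {"SPARQL_ENDPOINT": "http://localhost:3030/ds/query", "PORT": "8080"}
)
```

Passing the environment in as a `Mapping` keeps the parsing testable without touching the real process environment; the real Pydantic class additionally handles `.env` files and type coercion.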
4 changes: 2 additions & 2 deletions prez/app.py
@@ -25,7 +25,7 @@
 from prez.routers.spaceprez import router as spaceprez_router
 from prez.routers.sparql import router as sparql_router
 from prez.routers.vocprez import router as vocprez_router
-from prez.routers.curie import router as curie_router
+from prez.routers.identifier import router as identifier_router
 from prez.services.app_service import healthcheck_sparql_endpoints, count_objects
 from prez.services.app_service import populate_api_info, add_prefixes_to_prefix_graph
 from prez.services.exception_catchers import (
@@ -61,7 +61,7 @@
 app.include_router(catprez_router)
 app.include_router(vocprez_router)
 app.include_router(spaceprez_router)
-app.include_router(curie_router)
+app.include_router(identifier_router)


 @app.middleware("http")
18 changes: 18 additions & 0 deletions prez/queries/identifier.py
@@ -0,0 +1,18 @@
+from textwrap import dedent
+
+from jinja2 import Template
+
+
+def get_foaf_homepage_query(iri: str) -> str:
+    query = Template(
+        """
+        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
+        SELECT DISTINCT ?url
+        WHERE {
+            <{{ iri }}> foaf:homepage ?url .
+        }
+        """
+    ).render(iri=iri)
+
+    return dedent(query)
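`get_foaf_homepage_query` places the caller-supplied `iri` verbatim between `<` and `>`, so a value containing `>`, `}` or whitespace would change the structure of the query rather than name a resource. As a hedged illustration (not part of this commit), a guarded variant could reject the characters that the SPARQL 1.1 `IRIREF` grammar disallows before building the same query. The name `build_foaf_homepage_query` is hypothetical, and the sketch swaps Jinja2 for an f-string so it runs with the standard library alone.

```python
import re
from textwrap import dedent

# Characters the SPARQL 1.1 IRIREF production forbids: <>"{}|^`\ and U+0000..U+0020.
_FORBIDDEN_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def build_foaf_homepage_query(iri: str) -> str:
    """Hypothetical guarded variant of get_foaf_homepage_query."""
    if not iri or _FORBIDDEN_IRI_CHARS.search(iri):
        raise ValueError(f"Not usable as a SPARQL IRIREF: {iri!r}")
    # Doubled braces are literal '{' and '}' in the f-string.
    return dedent(
        f"""\
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT DISTINCT ?url
        WHERE {{
            <{iri}> foaf:homepage ?url .
        }}
        """
    )
```

For example, `build_foaf_homepage_query("http://data.bgs.ac.uk/id/dataHolding/13603129")` yields the same `SELECT` as the template, while an input such as `"http://x> ?p ?o } #"` raises `ValueError` instead of reaching the triplestore. This check covers only the `IRIREF` lexical rule; it does not verify that the string is a well-formed IRI in the RFC 3987 sense.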
8 changes: 8 additions & 0 deletions prez/queries/vocprez.py
@@ -63,11 +63,15 @@ def get_concept_scheme_top_concepts_query(iri: str, page: int, per_page: int) ->
 ?concept prez:childrenCount ?narrowerChildrenCount .
 ?iri prez:childrenCount ?childrenCount .
 ?iri skos:hasTopConcept ?concept .
+?iri rdf:type ?type .
+?concept rdf:type ?conceptType .
 }
 WHERE {
 BIND(<{{ iri }}> as ?iri)
 ?iri skos:hasTopConcept ?concept .
 ?concept skos:prefLabel ?label .
+?iri rdf:type ?type .
+?concept rdf:type ?conceptType .
 {
 SELECT (COUNT(?childConcept) AS ?childrenCount)
@@ -113,11 +117,15 @@ def get_concept_narrowers_query(iri: str, page: int, per_page: int) -> str:
 ?concept prez:childrenCount ?narrowerChildrenCount .
 ?iri prez:childrenCount ?childrenCount .
 ?iri skos:narrower ?concept .
+?iri rdf:type ?type .
+?concept rdf:type ?conceptType .
 }
 WHERE {
 BIND(<{{ iri }}> as ?iri)
 ?concept skos:broader ?iri .
 ?concept skos:prefLabel ?label .
+?iri rdf:type ?type .
+?concept rdf:type ?conceptType .
 {
 SELECT (COUNT(?childConcept) AS ?childrenCount)
2 changes: 1 addition & 1 deletion prez/reference_data/profiles/vocprez_default_profiles.ttl
@@ -69,7 +69,7 @@ prez:VocPrezProfile
 dcterms:identifier "vocpub"^^xsd:token ;
 dcterms:title "VocPub" ;
 altr-ext:hasLabelPredicate skos:prefLabel ;
-altr-ext:otherAnnotationProps schema:color ;
+altr-ext:otherAnnotationProps schema:color, reg:status ;
 altr-ext:constrainsClass
 skos:ConceptScheme ,
 skos:Concept ,
42 changes: 26 additions & 16 deletions prez/renderers/renderer.py
@@ -72,6 +72,24 @@ async def return_rdf(graph, mediatype, profile_headers):
     return StreamingResponse(content=obj, media_type=mediatype, headers=profile_headers)


+async def get_annotations_graph(profile, graph, cache):
+    profile_annotation_props = get_annotation_predicates(profile)
+    queries_for_uncached, annotations_graph = await get_annotation_properties(
+        graph, **profile_annotation_props
+    )
+
+    if queries_for_uncached is None:
+        anots_from_triplestore = Graph()
+    else:
+        anots_from_triplestore = await queries_to_graph([queries_for_uncached])
+
+    if len(anots_from_triplestore) > 1:
+        annotations_graph += anots_from_triplestore
+        cache += anots_from_triplestore
+
+    return annotations_graph
+
+
 async def return_annotated_rdf(
     graph: Graph,
     profile_headers,
@@ -84,27 +102,19 @@ async def return_annotated_rdf(
     non_anot_mediatype = mediatype.replace("anot+", "")

     cache = tbox_cache
-    profile_annotation_props = get_annotation_predicates(profile)
-    queries_for_uncached, annotations_graph = await get_annotation_properties(
-        graph, **profile_annotation_props
-    )

-    if queries_for_uncached is None:
-        anots_from_triplestore = Graph()
-    else:
-        anots_from_triplestore = await queries_to_graph([queries_for_uncached])
+    previous_triples_count = len(graph)

-    if len(anots_from_triplestore) > 1:
-        annotations_graph += anots_from_triplestore
-        cache += anots_from_triplestore
+    # Expand the graph with annotations specified in the profile until no new statements are added.
+    while True:
+        graph += await get_annotations_graph(profile, graph, cache)
+        if len(graph) == previous_triples_count:
+            break
+        previous_triples_count = len(graph)

     generate_prez_links(graph, predicates_for_link_addition)

-    obj = io.BytesIO(
-        (graph + annotations_graph).serialize(
-            format=non_anot_mediatype, encoding="utf-8"
-        )
-    )
+    obj = io.BytesIO(graph.serialize(format=non_anot_mediatype, encoding="utf-8"))
     return StreamingResponse(
         content=obj, media_type=non_anot_mediatype, headers=profile_headers
     )
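The new `while True` loop in `return_annotated_rdf` keeps merging annotation triples into the response graph until its size stops changing, so values introduced by a profile's annotation predicates (such as the newly added `reg:status`) are themselves annotated on a later pass. The sketch below models that fixed-point iteration with plain Python sets. `expand_until_fixed_point`, `annotate_terms` and the `ANNOTATIONS` lookup are hypothetical stand-ins for `get_annotations_graph`, the tbox cache and the triplestore, and terms are plain strings rather than rdflib nodes.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def expand_until_fixed_point(
    graph: Set[Triple], annotate: Callable[[Set[Triple]], Iterable[Triple]]
) -> Set[Triple]:
    """Add annotation triples repeatedly until one pass adds nothing new."""
    expanded = set(graph)  # copy so the caller's set is not mutated
    previous_count = len(expanded)
    while True:
        expanded |= set(annotate(expanded))
        if len(expanded) == previous_count:
            return expanded
        previous_count = len(expanded)


# Hypothetical annotation source standing in for the tbox cache / triplestore.
ANNOTATIONS: Dict[str, List[Triple]] = {
    "reg:status": [("reg:status", "rdfs:label", '"status"')],
    "ex:stable": [("ex:stable", "schema:color", "ex:green")],
    "ex:green": [("ex:green", "rdfs:label", '"green"')],
}


def annotate_terms(graph: Set[Triple]) -> Iterable[Triple]:
    """Return annotations for every term appearing anywhere in the graph."""
    terms = {term for triple in graph for term in triple}
    for term in terms:
        yield from ANNOTATIONS.get(term, [])


start = {("ex:concept1", "reg:status", "ex:stable")}
result = expand_until_fixed_point(start, annotate_terms)
```

Here the first pass adds the label of `reg:status` and the colour `ex:green` of the status value; only the second pass can add the label of `ex:green`, which a single annotation pass (the behaviour before this commit) would miss. Termination relies on the annotation source being finite: each pass can only add triples the source already holds, so the set eventually stops growing.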
34 changes: 32 additions & 2 deletions prez/routers/curie.py → prez/routers/identifier.py
@@ -1,13 +1,43 @@
-from fastapi import APIRouter, HTTPException, status
-from fastapi.responses import PlainTextResponse
+from fastapi import APIRouter, HTTPException, status, Request
+from fastapi.responses import PlainTextResponse, RedirectResponse
 from rdflib import URIRef
 from rdflib.term import _is_valid_uri

 from prez.services.curie_functions import get_uri_for_curie_id, get_curie_id_for_uri
+from prez.queries.identifier import get_foaf_homepage_query
+from prez.sparql.methods import sparql_query_non_async

 router = APIRouter(tags=["Identifier Resolution"])


+@router.get(
+    "/identifier/redirect",
+    summary="Get a redirect response to the resource landing page",
+    response_class=RedirectResponse,
+    responses={
+        status.HTTP_404_NOT_FOUND: {"content": {"application/json": {}}},
+    },
+)
+def get_identifier_redirect_route(iri: str, request: Request):
+    """
+    The `iri` query parameter is used to return a redirect response with the value from the `foaf:homepage` lookup.
+    If no value is found, a 404 HTTP response is returned.
+    """
+    query = get_foaf_homepage_query(iri)
+    _, result = sparql_query_non_async(query)
+    url = None
+    for row in result:
+        url = row["url"]["value"]
+
+    if url is None:
+        raise HTTPException(
+            status.HTTP_404_NOT_FOUND, f"No homepage found for IRI {iri}."
+        )
+
+    # Note: currently does not forward query parameters but we may want to implement this in the future.
+    return RedirectResponse(url, headers=request.headers)
+
+
 @router.get(
     "/identifier/curie/{iri:path}",
     summary="Get the IRI's CURIE identifier",
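The redirect route reduces to a small decision: read the `url` binding from the `foaf:homepage` query, answer 404 if there is none, otherwise redirect. The framework-free sketch below isolates that decision. It assumes `sparql_query_non_async` returns rows in the SPARQL 1.1 Query Results JSON shape implied by `row["url"]["value"]`; `resolve_redirect` is a hypothetical helper, not part of Prez.

```python
from typing import Any, Dict, List, Optional, Tuple

# One row of SPARQL 1.1 JSON results, e.g. {"url": {"type": "uri", "value": "https://..."}}
Binding = Dict[str, Dict[str, Any]]


def resolve_redirect(bindings: List[Binding]) -> Tuple[int, Dict[str, str]]:
    """Turn foaf:homepage query bindings into an HTTP status code and response headers."""
    url: Optional[str] = None
    for row in bindings:
        # Mirrors the route: if several foaf:homepage values exist, the last row wins.
        url = row["url"]["value"]
    if url is None:
        return 404, {}
    # 307 Temporary Redirect is the default status code of Starlette's RedirectResponse.
    return 307, {"location": url}
```

Keeping the decision in a pure function makes it testable without a triplestore. Unlike the route, the sketch emits only a `Location` header rather than copying `request.headers` into the response; which approach suits a deployment is a design choice worth reviewing.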
2 changes: 1 addition & 1 deletion prez/routers/object.py
@@ -2,7 +2,7 @@
 from starlette.responses import PlainTextResponse

 from prez.models import SpatialItem, VocabItem, CatalogItem
-from prez.routers.curie import get_iri_route
+from prez.routers.identifier import get_iri_route
 from prez.sparql.methods import sparql_query_non_async
 from prez.queries.object import object_inbound_query, object_outbound_query

2 changes: 1 addition & 1 deletion prez/routers/vocprez.py
@@ -29,7 +29,7 @@
     get_concept_narrowers_query,
 )
 from prez.response import StreamingTurtleAnnotatedResponse
-from prez.routers.curie import get_iri_route
+from prez.routers.identifier import get_iri_route

 router = APIRouter(tags=["VocPrez"])

3 changes: 3 additions & 0 deletions tests/data/spaceprez/input/redirect-foaf-homepage.ttl
@@ -0,0 +1,3 @@
+PREFIX foaf: <http://xmlns.com/foaf/0.1/>
+
+<http://data.bgs.ac.uk/id/dataHolding/13603129> foaf:homepage <http://metadata.bgs.ac.uk/geonetwork/srv/eng/catalog.search#/metadata/9df8df53-2a1d-37a8-e044-0003ba9b0d98> .
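To exercise this fixture against a running instance, a client passes the subject IRI as the `iri` query parameter of `/identifier/redirect`. The stdlib-only sketch below builds that request URL. `redirect_request_url` is a hypothetical helper, and `http://localhost:8000` assumes the `HOST`/`PORT` values described in the README table. Percent-encoding is strictly required only for characters such as `#`, `&` or spaces (an unencoded `#` would be treated as a fragment and never sent to the server), but encoding the whole value is the safe default.

```python
from urllib.parse import urlencode


def redirect_request_url(base_url: str, iri: str) -> str:
    """Build an /identifier/redirect request URL with the IRI percent-encoded."""
    # urlencode() percent-encodes ':', '/', '#', '&' and similar characters, so the
    # whole IRI arrives at the server as a single `iri` query value.
    return f"{base_url.rstrip('/')}/identifier/redirect?{urlencode({'iri': iri})}"


url = redirect_request_url(
    "http://localhost:8000", "http://data.bgs.ac.uk/id/dataHolding/13603129"
)
# http://localhost:8000/identifier/redirect?iri=http%3A%2F%2Fdata.bgs.ac.uk%2Fid%2FdataHolding%2F13603129
```

Requesting that URL (for example with `curl -i`) should then return a redirect whose `Location` header is the fixture's `foaf:homepage` value.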