API to provide information about subject indexing in the K10plus catalog
This API can be used to query how a concept or combination of concepts is used in records of a database. This includes: which concepts a record is indexed with (subjects), which records have been indexed with a concept (records), the number of records indexed with a concept and/or a deep link into a catalog to get these records (occurrences, links), and which concepts are used together with other concepts (co-occurrences).
Requires Node.js v18 or newer.
git clone https://github.com/gbv/subjects-api.git
cd subjects-api
npm i
Optionally create a configuration file .env to change certain config options. Here are the default values:
PORT=3141
BACKEND=SQLite
DATABASE=./subjects.db
SCHEMES=./vocabularies.json
LINKS=./links.json
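As a rough illustration of how such defaults combine with overrides, here is a minimal sketch. The helper name `resolveConfig` is hypothetical and not part of this project; how the application actually reads `.env` is not documented here.

```javascript
// Default values as listed above; a .env file or the environment may override them.
const defaults = {
  PORT: "3141",
  BACKEND: "SQLite",
  DATABASE: "./subjects.db",
  SCHEMES: "./vocabularies.json",
  LINKS: "./links.json",
}

// Hypothetical helper: take each key from an environment-like object,
// falling back to the default when the key is unset or empty.
function resolveConfig(env = process.env) {
  const config = {}
  for (const [key, fallback] of Object.entries(defaults)) {
    const value = env[key]
    config[key] = value !== undefined && value !== "" ? value : fallback
  }
  config.PORT = Number(config.PORT) // ports are easier to use as numbers
  return config
}

console.log(resolveConfig({ PORT: "8080" }))
```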
All vocabularies included in K10Plus Subjects are preconfigured via vocabularies.json.
Then fill the backend database (SQLite by default) with subject indexing data from the K10plus catalog. The script ./bin/import.js can be used to do so (not documented yet).
Start the application once to create the SQLite database file subjects.db. Then download the data from https://doi.org/10.5281/zenodo.7016625 (a tab-separated table with the columns PPN, vocabulary key, and notation) and import it into the SQLite file:
URL=$(curl -sL "https://zenodo.org/api/records/7016625" | jq -r '.files[]|select(.key|endswith(".tsv.gz"))|.links.self')
curl -sL "$URL" | zcat | sqlite3 subjects.db -cmd ".mode tabs" ".import /dev/stdin subjects"
Requires a SPARQL endpoint that supports SPARQL Update and the SPARQL Graph Store Protocol for write access. Only tested with Apache Jena Fuseki.
BACKEND=SPARQL
DATABASE=http://localhost:3030/k10plus
GRAPH=https://uri.gbv.de/graph/kxp-subjects # optional
Requires a PostgreSQL database. SQLite turned out to perform better, so this backend is not developed further.
Requires an SRU API to query, so that live data can be returned.
npm run start
Some backends support importing data from a headerless TSV file with three columns: PPN, vocabulary id (VOC), and notation:
npm run import -- subjects.tsv
Option --full replaces the existing backend data; otherwise the data is added to the existing subjects data. Option --modified can be used to set the modification date (timestamp of the file by default).
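To make the expected input concrete, the following sketch parses headerless three-column TSV text into objects. It is not the project's import code; the function name `parseSubjectsTsv`, the field names, and the sample rows are made up for illustration.

```javascript
// Parse headerless TSV text into rows of { ppn, voc, notation }.
// Lines without exactly three non-empty columns are skipped.
function parseSubjectsTsv(text) {
  const rows = []
  for (const line of text.split(/\r?\n/)) {
    if (line === "") continue // ignore blank lines, e.g. after a trailing newline
    const cols = line.split("\t")
    if (cols.length !== 3 || cols.some((col) => col === "")) continue
    const [ppn, voc, notation] = cols
    rows.push({ ppn, voc, notation })
  }
  return rows
}

// Illustrative sample data (not taken from the real dump)
const sample = "010000011\tbk\t58.55\n010000194\trvk\tNQ 2350\n"
console.log(parseSubjectsTsv(sample))
```

Notations may contain spaces, which is why the sketch splits on tab characters only.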
Requires DATABASE to be set to the URL of a SPARQL endpoint. Optionally configure a named graph with GRAPH.
Returns a (possibly empty) array of JSKOS Concepts that a record is indexed with. The special value null can be included as the last array element to indicate that more subjects may exist.
Query parameters:
- record - URIs of records, separated by |
- scheme - URIs of concept schemes, separated by |. The default value * can be used to include all concept schemes.
This endpoint returns the same information as the /occurrences endpoint with query parameters record and scheme (parameter member not set), but in a different output format (JSKOS Concepts instead of Concept Occurrences).
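A client has to account for the optional trailing null in the returned array. A minimal sketch, assuming the response has already been parsed from JSON; the helper name `splitSubjects` and the sample concept are illustrative only:

```javascript
// Separate the concepts from the optional trailing null marker.
// `more` is true when the service signalled that further subjects may exist.
function splitSubjects(response) {
  const more = response.length > 0 && response[response.length - 1] === null
  const concepts = more ? response.slice(0, -1) : response.slice()
  return { concepts, more }
}

// Illustrative response: one JSKOS concept followed by the null marker
const response = [
  { uri: "http://uri.gbv.de/terminology/bk/58.55", notation: ["58.55"] },
  null,
]
const { concepts, more } = splitSubjects(response)
console.log(concepts.length, more)
```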
Returns an array of records with the given subject.
The return format is experimental.
Query parameters:
- subjects - URI of a concept from supported vocabularies
- limit - maximum number of records to return (10 by default)
- format - return format (not supported yet)
Returns a (possibly empty) array of JSKOS Concept Occurrences. Depending on query parameters the result consists of:
- the occurrence of a concept specified via member
- the occurrence of concepts in a record specified via record
- the co-occurrences of a concept specified via member in all records, when query parameter scheme is given
Occurrences contain deep links into the K10plus catalog for selected vocabularies.
Query parameters:
- member - URI of a concept from supported vocabularies
- record - URI of a record
- scheme - URI of a target concept scheme (when given, co-occurrences are returned; when value * is given, all supported target schemes are used)
- threshold - a minimum threshold for co-occurrences to be included
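The query parameters above can be combined into a request URL as sketched below. The base URL assumes a local instance on the default port 3141, the helper name `occurrencesUrl` is hypothetical, and the concept URI and threshold are example values.

```javascript
// Assumption: a local instance listening on the default port
const BASE = "http://localhost:3141"

// Build an /occurrences URL from whichever parameters are given.
function occurrencesUrl({ member, record, scheme, threshold } = {}, base = BASE) {
  const url = new URL("/occurrences", base)
  for (const [key, value] of Object.entries({ member, record, scheme, threshold })) {
    if (value !== undefined) url.searchParams.set(key, String(value))
  }
  return url.toString()
}

// Co-occurrences of one concept with concepts from all supported target schemes
const url = occurrencesUrl({
  member: "http://uri.gbv.de/terminology/bk/58.55",
  scheme: "*",
  threshold: 5,
})
console.log(url)
```

The resulting URL can then be requested with the global `fetch` available in Node.js v18 and the response read with `.json()`.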
There is a deprecated alias at /api to be removed soon.
Alias for GET /voc to support clients that only know about the Occurrences API by its base URL /occurrences.
Not implemented yet, see #44.
Returns a list of deep links into databases to list all records indexed with a given concept.
Query parameters:
- subject - URIs of concepts
Return format:
JSON array of objects, each with:
- url
- label (name of the database)
- description (optional)
This endpoint returns the same information as the /occurrences endpoint with query parameter subject instead of member, but in a different return format and without the number of records.
Returns an array of supported vocabularies as JSKOS Concept Schemes.
There is a deprecated alias at /api/voc to be removed soon and a stable alias at /occurrences/voc.
Returns an array of supported databases. Return format is experimental.
Returns information about the service. Return format is experimental.
PRs accepted against the dev branch.
Small note: If editing the README, please conform to the standard-readme specification.
MIT © 2022 Verbundzentrale des GBV (VZG)