List of operations to support via API #6

Closed
jsheunis opened this issue Jan 20, 2025 · 4 comments

jsheunis commented Jan 20, 2025

Some background here: https://hub.datalad.org/datalink/tools/issues/13#issue-62

We need to start with a specification of what exactly to support as API endpoints. A useful use case to consider is shacl-vue, which allows annotating existing RDF data and creating new RDF data. shacl-vue would depend on fetching RDF data from the dump-things-service to support its operation. Consider shacl-vue being used in a research consortium, where researchers have to use it to add their publications, submit new data annotations, etc. They would want maximal and intuitive access to existing data to minimize their effort; think: add a Person record once by ORCID, link it as an Author in multiple places by simply selecting it from a list, and have all previously entered Person records across the consortium available in that same list. Or a user might browse a shacl-vue-supported catalog of the consortium, navigate to a specific Researcher, and the page would need to display any/all data related to that entity.

The background issue started a list of suggested operations; more are added here (a hypothetical endpoint sketch follows the list):

  • return all records (performance-wise not a good idea)
  • return all records of a specific class (more specifically, rdf:type)
  • return a record with a specific ID
  • return all records related to a record with a specific ID (i.e. all properties of said record; possibly with an option for recursive retrieval)
  • allow specification of the return format (JSON, YAML, RDF, ...)
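
As a purely illustrative starting point, the sketch below maps these operations onto REST-style endpoint URLs. All paths, query parameters, and format values are assumptions; the actual dump-things-service API is exactly what this issue still needs to specify.

```ts
// Hypothetical client-side mapping of the operations listed above onto
// REST-style endpoints. All paths and parameter names are illustrative.
interface RecordServiceConfig {
  baseUrl: string;                   // e.g. "https://example.org/dump-things"
  format?: 'json' | 'yaml' | 'ttl';  // requested return format
}

type Operation =
  | { kind: 'allRecords' }                       // performance-wise not a good idea
  | { kind: 'recordsByClass'; rdfType: string }  // filter on rdf:type
  | { kind: 'recordById'; id: string }
  | { kind: 'relatedRecords'; id: string; recursive?: boolean };

function buildRequestUrl(cfg: RecordServiceConfig, op: Operation): string {
  const fmt = cfg.format ?? 'json';
  switch (op.kind) {
    case 'allRecords':
      return `${cfg.baseUrl}/records?format=${fmt}`;
    case 'recordsByClass':
      return `${cfg.baseUrl}/records?type=${encodeURIComponent(op.rdfType)}&format=${fmt}`;
    case 'recordById':
      return `${cfg.baseUrl}/records/${encodeURIComponent(op.id)}?format=${fmt}`;
    case 'relatedRecords':
      return `${cfg.baseUrl}/records/${encodeURIComponent(op.id)}/related` +
             `?recursive=${op.recursive ?? false}&format=${fmt}`;
  }
}
```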

Some practical aspects of how shacl-vue functions are important to consider (a rough sketch of the current flow follows the list):

  • shacl-vue currently fetches the complete set of RDF data and SHACL shapes that it needs to operate on upfront, from served TTL files, using the fetch-lite package, which returns a stream of quads
  • shacl-vue stores RDF data in the browser using an rdf.dataset() from the rdf-ext package
  • The rdf.dataset() is queried whenever a list of things needs to be displayed, e.g. when a Person has to be selected from a list of Persons
  • Currently, shacl-vue has no knowledge of the LinkML schema (and version) that the SHACL shapes were exported from (it is not part of the export), i.e. it would not be able to supply those parameters when making an API request.
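
For concreteness, here is a minimal sketch of that upfront-fetch flow, assuming rdf-ext and @rdfjs/fetch-lite as described above; the exact options (e.g. the formats bundle passed to fetch-lite) and helper names are assumptions, not shacl-vue's actual code.

```ts
import rdf from 'rdf-ext';
import rdfFetch from '@rdfjs/fetch-lite';
import formats from '@rdfjs/formats-common';

const RDF_TYPE = rdf.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');

// Fetch a served TTL file, parse it into a quad stream, and load everything
// into an in-browser dataset.
async function loadData(ttlUrl: string) {
  const response = await rdfFetch(ttlUrl, { formats });
  const quadStream = await response.quadStream();
  return rdf.dataset().import(quadStream);
}

// Query the local dataset whenever a list of things needs to be displayed,
// e.g. all Persons for a selection dropdown.
function instancesOf(dataset: ReturnType<typeof rdf.dataset>, classIri: string): string[] {
  return [...dataset.match(null, RDF_TYPE, rdf.namedNode(classIri))]
    .map((quad) => quad.subject.value);
}
```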

Thoughts for improving shacl-vue and other services in light of the above:

  • shacl-vue could receive several optional inputs (sketched after this list), including:
    • a pointer to a TTL file with RDF data - this would be fetched upfront if supplied
    • a pointer to an API base URL with endpoint specifications/mapping - this would allow dynamic mapping of the relevant endpoint URL and arguments for fetching during shacl-vue use (referenced in the last paragraph of this comment too: Refactor for modularity and reusability shacl-vue#65 (comment))
  • For shacl-vue, a flag to query both (or either of) the local rdf.dataset() and the relevant dump-things-service endpoint for records; if both, this implies some sort of consolidation process, which requires more thinking
  • We need to think about how best to share knowledge of schema name and version between linkml, SHACL export, and compliant RDF data.
  • If, for some reason, the complete set of records needs to be made available to a client like shacl-vue, we could consider having the dump-things-service serve a regularly updated static export in TTL format.
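
As a rough illustration of the optional inputs mentioned in the first bullet, the configuration shape below is hypothetical; it is not an existing shacl-vue configuration format.

```ts
// Hypothetical shape of the optional shacl-vue inputs discussed above.
interface ShaclVueDataConfig {
  // Pointer to a TTL file with RDF data; fetched upfront if supplied.
  ttlUrl?: string;
  // Pointer to an API base URL plus a mapping of logical operations to
  // endpoint templates, enabling dynamic fetching during shacl-vue use.
  api?: {
    baseUrl: string;
    endpoints: {
      recordsByClass: string;   // e.g. "/records?type={rdfType}"
      recordById: string;       // e.g. "/records/{id}"
      relatedRecords: string;   // e.g. "/records/{id}/related"
    };
  };
  // Where to look for records: the local rdf.dataset(), the service, or both
  // (the latter implying the consolidation step mentioned above).
  querySources?: 'local' | 'service' | 'both';
}
```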

mih commented Jan 21, 2025

> If, for some reason, the complete set of records needs to be made available to a client like shacl-vue, we could consider having the dump-things-service serve a regularly updated static export in TTL format.

To me, this is asking for an "all you know in quads" endpoint. The rest is an implementation detail that should not be a concern for shacl-vue.

Therefore "shacl-vue could receive several optional inputs, including..." is a needless complication IMHO. A (URL) pointer to a TTL file should be (made) indistinguishable from an API endpoint that supplies the exact same thing. In both cases, this pointer would be a parameter of a deployment/runtime-sessions. And it would/should be optional.

> We need to think about how best to share knowledge of schema name and version between linkml, SHACL export, and compliant RDF data.

To me, this is configuration. shacl-vue (typically?) will be pointed to one-and-exactly-one schema-version-record-dump. So it should get the associated label as a deployment/runtime parameter. In the current concept, the schema name/version/variant is all multiplexed onto a single label.

> For shacl-vue, a flag to query both (or either of) the local rdf.dataset() and the relevant dump-things-service endpoint for records; if both, this implies some sort of consolidation process, which requires more thinking

A "local" query for refer to information that is not (yet) submitted to the service, right? If we support this kind of staging, we also need to (introduce?) some tracking of what has been modified in the entire session, to be able to "send" efficiently and a later point in time.

jsheunis (Author) commented:

> We need to think about how best to share knowledge of schema name and version between linkml, SHACL export, and compliant RDF data.

> To me, this is configuration. shacl-vue (typically?) will be pointed to one-and-exactly-one schema-version-record-dump. So it should get the associated label as a deployment/runtime parameter. In the current concept, the schema name/version/variant is all multiplexed onto a single label.

Should shacl-vue be pointed to "one-and-exactly-one schema-version-record-dump"? Do you mean "pointed to" in the sense of an individual request, or on the deployment level? My understanding is it would, for example, need to include the things schema name in the request in order to get all ValueSpecifications, and to include the prov schema name to get all Agents. So shacl-vue should know that these classes come from those schemas (or schema versions). I can think of a different use case: let's say a user views something like a Dataset, which displays all of its relations as just a list of IDs. The user clicks on one ID and wants the page to show all information about this Thing. The SHACL schema, which drives the UI, says the relation would be a Thing, but that doesn't preclude it from being any of the many subclasses of Thing. So, all shacl-vue knows at the point when it makes a request is that it could be a Thing, when actually it should be making a request for (for example) the Person class. And it also does not know in which schema the Person class would be defined. (A rough sketch of this lookup follows the list below.)

At the moment, shacl-vue knows the following:

  • a SHACL shapes file (which currently does not contain any information about a specific LinkML schema version; there are namespaces/prefixes, but they aren't necessarily mapped 1-to-1 onto schemas)
  • an RDF class hierarchy (to retrieve rdfs:subClassOf relationship statements)
  • an RDF data source (currently a pointer to a TTL file, but this is where the dump-things-service needs to be injected)
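
To illustrate the Thing-vs-Person lookup problem described above, here is a hedged sketch that resolves a record's concrete class from the local data and the rdfs:subClassOf hierarchy that shacl-vue already holds; resolveDisplayClass() and hasShape() are hypothetical helpers, not part of shacl-vue.

```ts
import rdf from 'rdf-ext';

type Dataset = ReturnType<typeof rdf.dataset>;

const RDF_TYPE = rdf.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
const SUBCLASS_OF = rdf.namedNode('http://www.w3.org/2000/01/rdf-schema#subClassOf');

function resolveDisplayClass(
  data: Dataset,                            // RDF data (records)
  classHierarchy: Dataset,                  // rdfs:subClassOf statements
  recordIri: string,                        // the ID the user clicked on
  hasShape: (classIri: string) => boolean,  // lookup into the loaded SHACL shapes
): string | undefined {
  // Start from the record's declared classes, e.g. a Person rather than a Thing.
  const candidates = [...data.match(rdf.namedNode(recordIri), RDF_TYPE, null)]
    .map((quad) => quad.object.value);

  const seen = new Set<string>();
  while (candidates.length > 0) {
    const cls = candidates.shift()!;
    if (seen.has(cls)) continue;
    seen.add(cls);
    if (hasShape(cls)) return cls;          // a class the UI knows how to render
    // Otherwise walk up to the superclasses (eventually reaching Thing).
    for (const quad of classHierarchy.match(rdf.namedNode(cls), SUBCLASS_OF, null)) {
      candidates.push(quad.object.value);
    }
  }
  return undefined;
}
```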

> For shacl-vue, a flag to query both (or either of) the local rdf.dataset() and the relevant dump-things-service endpoint for records; if both, this implies some sort of consolidation process, which requires more thinking

A "local" query for refer to information that is not (yet) submitted to the service, right? If we support this kind of staging, we also need to (introduce?) some tracking of what has been modified in the entire session, to be able to "send" efficiently and a later point in time.

No, I meant a local query as in querying RDF data that has already been retrieved from the service endpoint and currently resides in a client-side RDF store. For implementation, I'm thinking of some sort of chaining of the query and then consolidation: first look at the local store, then the service endpoint, then consolidate the records. If the RDF store automatically handles redundant triples effectively (I need to check this), then the process can be switched around: first query the service endpoint, add the response data to the local RDF store, let the store consolidate the data, then query the store.
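
A minimal sketch of that second ordering, assuming an rdf-ext store on the client; fetchRelatedQuads() is a hypothetical stand-in for whichever dump-things-service endpoint ends up being used, and whether identical quads are actually deduplicated by the dataset's set semantics is the point that still needs checking.

```ts
import rdf from 'rdf-ext';

type Dataset = ReturnType<typeof rdf.dataset>;

async function queryWithConsolidation(
  store: Dataset,
  recordIri: string,
  fetchRelatedQuads: (iri: string) => Promise<Dataset>,
): Promise<Dataset> {
  // 1. Ask the service for anything it knows about the record.
  const remote = await fetchRelatedQuads(recordIri);

  // 2. Merge into the local store; quads already present should not be duplicated.
  store.addAll(remote);

  // 3. Answer from the (now consolidated) local store.
  return store.match(rdf.namedNode(recordIri), null, null);
}
```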

That said, your point is also an important one. At the moment, shacl-vue supports this either partially or pretty much in full (I need to follow up on this). It has, by design, a structural separation between data in the RDF store and data edited in any form; this separation could be the proxy for "tracking of what has been modified in the entire session", but it could be formalized more with the use of some tags or further structuring.


mih commented Jan 21, 2025

> No, I meant a local query as in querying RDF data that has already been retrieved from the service endpoint and currently resides in a client-side RDF store

Although you say no, your explanation tells me that you mean yes -- and I might not have explained what I meant properly.

> My understanding is it would, for example, need to include the things schema name in the request in order to get all ValueSpecifications, and to include the prov schema name to get all Agents

No. Check https://concepts.trr379.de/s/base/unreleased/ for example. You'll find that it contains all the classes you mentioned -- defined in different source schemas -- under a single, common umbrella. I think this is the normal situation.

If we had to support different schema versions in one and the same editing environment, we'd also need to check the various schemas for compatibility, and also enable a user to pick the right schema for a particular class (many could provide it), and explain to the user how to pick, so they don't end up describing two things that need to work together in two different schemas.

The only reason I can come up with for there being more than one schema (in the dump backend!) is the preservation of information in a historic configuration (backward compatibility). I cannot imagine a system (other than a migration script) that would need access to the same information in multiple schema variants.

mih closed this as completed Jan 21, 2025
jsheunis (Author) commented:

> No. Check https://concepts.trr379.de/s/base/unreleased/ for example. You'll find that it contains all the classes you mentioned -- defined in different source schemas -- under a single, common umbrella. I think this is the normal situation.

Thanks for pointing this out; my previous understanding did not include this common umbrella schema.
