clients should work on docs retrieved from git repo not from fuseki #4

Open
xristy opened this issue Nov 18, 2019 · 4 comments

Contributor

xristy commented Nov 18, 2019

When a user checks out an existing resource for editing, the version in the git repo should be retrieved and updated. When saved, it should be 1) written back to the repo, then 2) run through a standard inference step, and only then 3) used to update fuseki with the new graph(s).

This implies that various bulk updates like replacing all references to A by references to B need to be effected over the docs in the repo which are then run through inferencing prior to sending to fuseki.

We will be depending on significant inferencing to add triples that trade off space for time in the online fuseki environment, but we want only ground triples in the repo for public access.
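As a rough illustration of the ordering described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionaries stand in for the git repo and fuseki, triples are plain tuples, and `toy_inverse_rule` is a made-up rule, not the actual reasoner.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def save_resource(graph_id: str, ground: Graph,
                  git_repo: Dict[str, Graph], fuseki: Dict[str, Graph],
                  infer: Callable[[Graph], Graph]) -> None:
    """Save an edited graph in the order: repo, inference, fuseki."""
    # 1) Only ground triples are written back to the (stand-in) git repo.
    git_repo[graph_id] = set(ground)
    # 2) The inference step runs on the ground triples.
    inferred = infer(ground)
    # 3) The (stand-in) fuseki store receives ground plus inferred triples.
    fuseki[graph_id] = set(ground) | inferred


def toy_inverse_rule(graph: Graph) -> Graph:
    """Hypothetical rule: (s, :hasPart, o) implies (o, :partOf, s)."""
    return {(o, ":partOf", s) for (s, p, o) in graph if p == ":hasPart"}


git_repo: Dict[str, Graph] = {}
fuseki: Dict[str, Graph] = {}
save_resource("G1", {(":W1", ":hasPart", ":W2")}, git_repo, fuseki, toy_inverse_rule)
```

With this ordering the repo copy never contains inferred triples, while the fuseki copy pays extra space to avoid query-time reasoning.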

This issue pertains to how the editserv processes GET http://purl.bdrc.io/graph/theID as mentioned in bullet 2 of edit-one-existing-resource and the POST in bullet 6.

Collaborator

MarcAgate commented Nov 19, 2019

ES (Editing Server) only knows about and manages Tasks and Sessions, and therefore never requests GET http://purl.bdrc.io/graph/theID: only EC (Editing Client) does.
EC then creates a Task that allows for editing, then PUTs this task to ES.
ES writes the Task to the Task git repo (which is different from the gitlab resource git repo), each commit being a new Session of this task.

From here, the editor (the person editing) can work on this resource by requesting this ongoing (not completed) task. There is no interaction with gitlab repo or fuseki dataset during the editing process. Note that this editor can list all his tasks and all the sessions of these tasks.

When the editing is complete, then the editor requests (through EC) the actual update of the fuseki dataset by posting the task to ES (instead of PUT) --> POST http://editserv.bdrc.io/tasks/ (the task being sent as json).
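To make the PUT versus POST distinction concrete, here is a small Python sketch that only builds the two requests without sending them. The task fields are hypothetical, and using the same `/tasks/` URL and a JSON content type for the PUT is an assumption; only the POST URL appears in this thread.

```python
import json
from urllib.request import Request

TASKS_URL = "http://editserv.bdrc.io/tasks/"


def build_task_request(task: dict, finalize: bool) -> Request:
    """Build (but do not send) the request carrying a task as JSON.

    finalize=False: PUT, saves a new session of an ongoing task.
    finalize=True:  POST, asks ES to apply the task to the dataset.
    """
    body = json.dumps(task).encode("utf-8")
    method = "POST" if finalize else "PUT"
    return Request(TASKS_URL, data=body, method=method,
                   headers={"Content-Type": "application/json"})


# Hypothetical task payload; the real field names may differ.
task = {"id": "task-1", "message": "fix a label", "patch": ""}
save_req = build_task_request(task, finalize=False)
apply_req = build_task_request(task, finalize=True)
```

Passing either object to `urllib.request.urlopen` would actually send it; the sketch stops short of that on purpose.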

Currently, the POST performs the actual updates using a "Git last / Fuseki first" approach with two sets of modules:

  1. The Patch Module, which applies the patch corresponding to the last session of the given task to a dataset built from fuseki, then puts this updated dataset back to fuseki.
  2. The GitPatch module, which updates the gitlab repo, and the GitRevision module, which updates fuseki with the commit revision info obtained from GitPatch.

The issue you raised applies to the way ES processes the POST.

One way to solve it is to keep the current order (i.e. the current flow of transaction modules) while getting the dataset used in PatchModule from the remote gitlab repo (instead of from fuseki as it is now), and then applying the patch, as now. At that point it would just be a matter of adding inference to the patched graphs before putting them back to fuseki.
Also, in the transaction flow as it is, the newly updated and inferred dataset would then be used by GitPatch and GitRevisionModule (so inferred triples would be included in the trig files on the gitlab repo).
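A minimal Python sketch of this flow follows, under stated assumptions: graphs are sets of tuples, a patch is a list of add ("A") and delete ("D") operations, and `load_graph` and `infer` are hypothetical stand-ins for the git loader and the reasoner. It returns the patched graph and the inferred graph separately, which shows that the git step could be handed the first one if only ground triples are wanted in the repo.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
Patch = List[Tuple[str, Triple]]  # ("A", triple) adds, ("D", triple) deletes


def apply_patch(graph: Graph, patch: Patch) -> Graph:
    """Return a patched copy of the graph, leaving the input untouched."""
    result = set(graph)
    for op, triple in patch:
        if op == "A":
            result.add(triple)
        elif op == "D":
            result.discard(triple)
        else:
            raise ValueError(f"unknown patch operation: {op}")
    return result


def run_transaction(graph_id: str, patch: Patch,
                    load_graph: Callable[[str], Graph],
                    infer: Callable[[Graph], Graph]) -> Tuple[Graph, Graph]:
    """Load from the given source, patch, then infer.

    Returns (ground, ground + inferred) so the caller can choose what
    goes to git and what goes to fuseki.
    """
    patched = apply_patch(load_graph(graph_id), patch)
    return patched, patched | infer(patched)


# Hypothetical data: one graph as it would be read from the git repo.
git_files = {"G1": {(":P1", ":name", "old name")}}
patch = [("D", (":P1", ":name", "old name")),
         ("A", (":P1", ":name", "new name"))]


def toy_infer(graph: Graph) -> Graph:
    # Made-up rule for illustration only.
    return {(s, ":hasName", "true") for (s, p, o) in graph if p == ":name"}


ground, full = run_transaction("G1", patch, lambda gid: git_files[gid], toy_infer)
```

Swapping the loader passed as `load_graph` is the only change needed to move from a fuseki-sourced dataset to a git-sourced one in this sketch.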

**The solution above obviously works for resource updates or creation, but there is still an issue with "bulk updates"**: for instance, when replacing A by B, how do we know which resource files in the gitlab repo have a reference to A? Should we build a Dataset out of the whole git repo? Should we get a list of these referencing resources by sending a sparql request to fuseki (EDIT: the current implementation does that for replace)? Also, if we replace A by B, what would the new inferred triples be (since the replacement should apply to the previously inferred triples as well)? WDYT?
The same kind of question applies to resource deletion (i.e. updating all the resources referencing the deleted resource).
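For the sparql option, a hedged Python sketch: one function builds a query listing the named graphs that hold a triple with A as object, and another derives an add/delete patch replacing A by B in one graph. The function names, the tuple-based patch format, and the example.org URIs are assumptions for illustration, not the current implementation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Patch = List[Tuple[str, Triple]]  # ("D", triple) deletes, ("A", triple) adds


def referencing_graphs_query(resource_uri: str) -> str:
    """SPARQL query for the named graphs that reference the resource."""
    if any(c in resource_uri for c in "<> \n\t"):
        raise ValueError("not a plain URI")
    return (
        "SELECT DISTINCT ?g WHERE {\n"
        f"  GRAPH ?g {{ ?s ?p <{resource_uri}> }}\n"
        "}"
    )


def replacement_patch(graph: Set[Triple], old: str, new: str) -> Patch:
    """Patch that replaces `old` by `new` in subject or object position."""
    patch: Patch = []
    for s, p, o in sorted(graph):
        if old in (s, o):
            patch.append(("D", (s, p, o)))
            patch.append(("A", (new if s == old else s, p,
                                new if o == old else o)))
    return patch


# Hypothetical URIs, for illustration only.
A = "http://example.org/A"
B = "http://example.org/B"
graph = {
    ("http://example.org/W1", "http://example.org/author", A),
    ("http://example.org/W1", "http://example.org/title", "a title"),
}
query = referencing_graphs_query(A)
patch = replacement_patch(graph, A, B)
```

Because the patch is computed from ground triples only, previously inferred triples mentioning A would be regenerated by rerunning inference on the patched graph rather than patched directly.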

When talking about standard inference, what do you mean exactly, regarding the two main reasoners available in Jena (OWLReasoner and RDFSReasoner) ?

Note: regarding the FinalizerModule (last action of a transaction): The POST actually transforms a Task into a Transaction. This means that the successfully posted task file is moved from the local task repo to a transaction archive dir while a full transaction log is saved under /logs/{user}/

Note: both Editing_Service_Use_Cases.md and editservDoc.odt need to be updated.

Contributor

eroux commented Nov 19, 2019

I think there are different issues here:

  • for the batch editing (replacing A with B), I think the process should be:
    • run a sparql query that fetches all the graphs that include a triple with A
    • exclude the main graph of A
    • for each graph, apply a patch in the normal way
  • I agree the editor is currently agnostic about where the patch comes from (whether it was based on a git file, on a sparql query, or was just written by hand in a text editor)
  • that said, the web editor in JS (not editserv) may benefit from the ability to access the triples in a git file, and why not on different commits / branches... We could imagine an API like:

editserv.bdrc.io/gitGraphs/XXX/YYY(/commit/ZZZ)

where:

  • XXX is the URI of the git repo in the ontology (see here where the URLs of the git repos are interestingly wrong)
  • YYY is the path of the file in the git repo
  • ZZZ is a commit
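A small Python sketch of how a client might assemble such a URL. The `http` scheme, the percent-encoding of the repo URI, and the example identifiers are all assumptions; the API itself is only a proposal at this point.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://editserv.bdrc.io/gitGraphs"


def git_graph_url(repo_uri: str, file_path: str,
                  commit: Optional[str] = None) -> str:
    """Build editserv.bdrc.io/gitGraphs/XXX/YYY(/commit/ZZZ)."""
    # The repo URI is fully percent-encoded so it fits in one path segment;
    # slashes in the file path are kept as path separators.
    url = f"{BASE}/{quote(repo_uri, safe='')}/{quote(file_path, safe='/')}"
    if commit is not None:
        url += f"/commit/{quote(commit, safe='')}"
    return url


# Hypothetical repo identifier, file path and commit.
head_url = git_graph_url("bdr:GR0001", "aa/P1.trig")
pinned_url = git_graph_url("bdr:GR0001", "aa/P1.trig", commit="abc123")
```

Leaving the commit segment optional means the same function covers both "latest version" and "specific commit" lookups.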

Contributor Author

xristy commented Nov 20, 2019

From the batch editing section above, does "for each graph, apply a patch in the normal way" mean writing each modified graph to git and then transferring it to fuseki after running the inferencing step?

Contributor Author

xristy commented Nov 20, 2019

Regarding "When talking about standard inference, what do you mean exactly, regarding the two main reasoners available in Jena (OWLReasoner and RDFSReasoner) ?" The io.bdrc.gittodbs.BDRCReasoner will perform the inferencing. Probably the reasoner could be moved to bdrc-libraries.
