GH-2701: Fuseki Mod to list and abort running executions. #3184
Really interesting piece of work, once did something much cruder (at least UI wise) in a previous $dayjob
Yes, I think this would be much cleaner if the tracking mechanism were integrated directly into the execution machinery without requiring the extra wrapping you do in this PR. It would be nice if there were programmatic APIs for interacting with tracked queries/updates (there are some pieces towards that here, but from my skim-reading of the code they appear mostly focused on exposing things to the UI) so that applications that embed Jena could access and manage tracked queries/updates as desired. Fuseki already has the concept of Tasks, which is used for things like backups and compactions. Would it make sense to integrate query/update tracking into that rather than creating a separate tracking mechanism? That might need generalising that mechanism, or pulling it more into Jena's core rather than Fuseki machinery, so it might not be worth the effort. wdyt?
The ARQ Plugin adds
An important question is whether tracking executions within the DatasetGraph's context is the way to move forward.
Yes, I still need to look into how much effort it would take to disentangle Fuseki's task tracker from Fuseki, but adding such a mechanism to core (and updating Fuseki for it) would most likely be the way to go.
public class ChainingQueryDispatcherExecTracker
    implements ChainingQueryDispatcher
{
    @Override
    public QueryExec create(Query query, DatasetGraph dsg, Binding initialBinding, Context context,
            QueryDispatcher chain) {
        QueryExec delegate = chain.create(query, dsg, initialBinding, context);
        QueryExec result = TaskTrackerRegistry.track(context, delegate);
        TaskTrackerRegistry.remove(context);
        return result;
    }
Relevant code for intercepting query execution construction over a dataset graph using the newly proposed dispatcher system. Here, execution tracking is added.
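To make the chaining idea concrete, here is a minimal, self-contained sketch of a dispatcher chain in which one link adds tracking around whatever the rest of the chain builds. It does not use Jena's API: `Exec`, `Dispatcher`, `ChainingDispatcher`, `Registry` and `trackedRegistry` are hypothetical stand-ins, and the request is a plain string that is never parsed.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a chain-of-responsibility dispatcher; all names here are hypothetical. */
public class DispatcherChainSketch {

    /** Stand-in for an executable request (plays the role of QueryExec). */
    interface Exec {
        String run();
    }

    /** Turns a request string into an Exec. */
    interface Dispatcher {
        Exec create(String request);
    }

    /** A chain link that may wrap, replace, or pass on what the rest of the chain builds. */
    interface ChainingDispatcher {
        Exec create(String request, Dispatcher next);
    }

    /** Holds the links in order; the fallback plays the role of the engine factory system. */
    static final class Registry {
        private final List<ChainingDispatcher> links = new ArrayList<>();
        private final Dispatcher fallback;

        Registry(Dispatcher fallback) {
            this.fallback = fallback;
        }

        void add(ChainingDispatcher link) {
            links.add(link);
        }

        Exec create(String request) {
            return dispatcherFrom(0).create(request);
        }

        // View of the chain starting at link i; past the last link it is the fallback.
        private Dispatcher dispatcherFrom(int i) {
            if (i >= links.size()) {
                return fallback;
            }
            return request -> links.get(i).create(request, dispatcherFrom(i + 1));
        }
    }

    /** Builds a registry with one tracking link that logs the start and end of each execution. */
    static Registry trackedRegistry(List<String> log) {
        Registry registry = new Registry(request -> () -> "result of " + request);
        registry.add((request, next) -> {
            Exec delegate = next.create(request);
            return () -> {
                log.add("start: " + request);
                try {
                    return delegate.run();
                } finally {
                    log.add("end: " + request);
                }
            };
        });
        return registry;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Exec exec = trackedRegistry(log).create("SELECT * { ?s ?p ?o }");
        System.out.println(exec.run());
        System.out.println(log);
    }
}
```

The tracking link only decorates the `Exec` built by the rest of the chain, so it needs no knowledge of how the request is eventually executed.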
The original code integrated exec tracking into the Update-/QueryEngineFactory system. However, this was sub-par because execution could only be intercepted at the QueryIterator level. My updated proposal is to introduce a new Update-/QueryDispatcher layer on top of the Update-/QueryEngineFactory machinery. This makes it possible to intercept any query or update request to a dataset, even without having to parse the query.
Old design: QueryExecBuilderDataset -build-> QueryExecDataset -exec-> QueryEngineRegistry
New design: QueryExecBuilderDataset -build-> QueryDispatcherRegistry -customized build-> QueryExec
The last element of the dispatcher chain forwards the request to the usual Update-/QueryEngineFactory system.
Consequences:
Related Ongoing Work
As a demo for related work based on this infrastructure, we are using it to integrate third-party triple stores, such as Qlever, into Fuseki. This way we can use one server framework to manage several triple stores. As a final note, we have also already created an assembler that starts Qlever from a Docker image as part of Fuseki (via the Java Testcontainers framework), so the configuration looks like this:
<#baseDS> rdf:type qlever:Dataset ;
    qlever:location "/run/databases/qlever/mydb/" ;
    qlever:indexName "myindex" ;
    qlever:accessToken "abcde" ;
    qlever:memoryMaxSize "4G" ;
    qlever:defaultQueryTimeout "600s" ;
The Fuseki module contains an HTML interface that includes a control to stop an update or query execution.
I'm not sure what the best thing to do is, but one possibility is:
It's possible to combine this with Fuseki's security, and there is a context flag to disable the stop action (both in HTML and the API):
<#service> rdf:type fuseki:Service ;
    fuseki:name "coypu" ;
    # ...
    fuseki:endpoint [
        fuseki:operation fuseki:tracker ;
        fuseki:name "tracker" ;
        ja:context [ ja:cxtName "allowAbort" ; ja:cxtValue false ] ; # Disable stop action
    ] ;
    fuseki:endpoint [
        fuseki:operation fuseki:tracker ;
        fuseki:name "admin-tracker" ;
        fuseki:allowedUsers "admin" ;
Our deployment:
Pending improvements:
Hm, if there were a context on the server level (I'm not sure right now whether one already exists), then the exec tracker fmod could wire up the listeners such that all events from the datasets are delegated to the server-wide listener.
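A rough sketch of that wiring, assuming each dataset exposes a broker that listeners can be registered on. `TaskListener`, `DatasetBroker` and `wire` are hypothetical names for illustration and are not taken from this PR or from Fuseki.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch only: every type here is hypothetical and not part of Jena or Fuseki. */
public class ServerWideListenerSketch {

    /** Receives task events; 'source' names the dataset the event came from. */
    interface TaskListener {
        void onEvent(String source, String event);
    }

    /** Per-dataset broker: fans each event out to the registered listeners. */
    static final class DatasetBroker {
        private final String name;
        private final List<TaskListener> listeners = new CopyOnWriteArrayList<>();

        DatasetBroker(String name) {
            this.name = name;
        }

        /** Registers a listener and returns a handle that unregisters it again. */
        Runnable register(TaskListener listener) {
            listeners.add(listener);
            return () -> listeners.remove(listener);
        }

        void publish(String event) {
            for (TaskListener listener : listeners) {
                listener.onEvent(name, event);
            }
        }
    }

    /** Wires one server-wide listener to every dataset broker; run the result to unwire. */
    static Runnable wire(List<DatasetBroker> brokers, TaskListener serverListener) {
        List<Runnable> registrations = new ArrayList<>();
        for (DatasetBroker broker : brokers) {
            registrations.add(broker.register(serverListener));
        }
        return () -> registrations.forEach(Runnable::run);
    }

    public static void main(String[] args) {
        DatasetBroker ds1 = new DatasetBroker("/ds1");
        DatasetBroker ds2 = new DatasetBroker("/ds2");
        Runnable unwire = wire(List.of(ds1, ds2),
                (source, event) -> System.out.println(source + ": " + event));
        ds1.publish("query started");
        ds2.publish("update started");
        unwire.run();
        ds1.publish("not delivered, because the listener was unwired");
    }
}
```

Returning an unregister handle from `register` keeps the module's shutdown simple: it only has to run the handle it got back from `wire`.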
Unrelated, but IMO the transport details should be handled by clients external to
The goal of
For SPARQL, the current design already allows configuration of protocol matters on the RDFLink level.
Creator<RDFLink> linkCreator = () -> RDFLinkHTTP.newBuilder()
    .destination("http://dbpedia.org/sparql")
    // Request using thrift instead of the default application/sparql-results+json.
    .acceptHeaderSelectQuery(WebContent.contentTypeResultsThrift)
    .build();
DatasetGraph dsg = new DatasetGraphOverRDFLink(linkCreator);
QueryExec.dataset(dsg).query("...")...; // Queries will be dispatched to the link and
                                        // execution won't use the DatasetGraph API.
The fundamental issue is that DatasetGraph is central to most parts of Jena (up to Fuseki). At some point in the future - in the appropriate places - it might be worth superseding DatasetGraph with a more general
I think in principle the abstraction with a configurable update sink is ok, but I agree that a custom
[Edit] A thought: If both
@Aklakan I understand the need for a proxy but I'm not convinced it should live in a
Conceptually the proxy is not of a dataset - it is of an endpoint. A dataset should not care where it came from - it's just a bag of quads. To illustrate, this is a SPARQL endpoint implementation in JAX-RS. The separation of concerns is much cleaner IMO.
DatasetGraph can play a dual role:
In
The same goes for
I am not too fond of the proposed dispatcher/interception system because it only intercepts query/update execution but not the connection process itself. So I am indeed thinking about moving the interception to a higher level. The main aspects I am pondering are:
This is indeed conceptually similar to
Agree to disagree :)
This is why I'm only using Jena's in-memory
The world could be built on this but it will be unsatisfactory because it is too fine-grained for web use.
Sometimes, you have code that works on datasets but you want to apply it to something that only supports SPARQL protocols. You need an adapter to invert the stack. Adapters that map low-level APIs to higher-level APIs are rarely great for performance or for using the details of the high-level API, but if the difference is access/no-access, an adapter gets the job done.
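As an illustration of such an adapter, the sketch below answers a low-level `find(s, p, o)` call by building a SPARQL SELECT query and sending it to an endpoint. It is not Jena code: `SparqlEndpoint`, `findQuery` and `find` are hypothetical, terms are assumed to be IRIs only, and no escaping or blank-node handling is attempted.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical types, IRIs only, no escaping or blank-node handling. */
public class FindOverSparqlSketch {

    /** Hypothetical stand-in for a SPARQL endpoint: takes a SELECT query, returns result rows. */
    interface SparqlEndpoint {
        List<String[]> select(String query);
    }

    /** Builds the SELECT query for a triple pattern; a null argument is a wildcard. */
    static String findQuery(String s, String p, String o) {
        StringBuilder sb = new StringBuilder("SELECT ?s ?p ?o WHERE { ");
        appendValues(sb, "s", s);
        appendValues(sb, "p", p);
        appendValues(sb, "o", o);
        return sb.append("?s ?p ?o }").toString();
    }

    // Pins a variable to a single IRI with a VALUES clause so it is still bound in the result rows.
    private static void appendValues(StringBuilder sb, String var, String iri) {
        if (iri != null) {
            sb.append("VALUES ?").append(var).append(" { <").append(iri).append("> } ");
        }
    }

    /** Low-level find(s, p, o) answered by one protocol request per call. */
    static List<String[]> find(SparqlEndpoint endpoint, String s, String p, String o) {
        return endpoint.select(findQuery(s, p, o));
    }

    public static void main(String[] args) {
        // A fake endpoint that prints the query it receives and returns no rows.
        SparqlEndpoint fake = query -> {
            System.out.println(query);
            return new ArrayList<>();
        };
        find(fake, "http://example.org/a", null, null);
    }
}
```

Every `find` call becomes a separate request, which is where the performance cost of this kind of adapter comes from.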
// public void cancel() {
//     closed = true;
//     abort();
// }
Please remove.
@Aklakan -- In order to make progress, can this be split into multiple PRs? The first would be broken down into changes in ARQ for async. This would give us some focus to complete subtasks, and also help review, because this single PR is causing the GitHub review UI to struggle! A question I'm trying to answer for myself is how much of this is fundamental machinery and how much is driven by your usage of Jena.
Yes, it makes sense to split the (mostly) finished parts out from this PR for isolated review. I factored the async HTTP part out in #3464. My suggestion is to use this PR as the "full stack" branch from which smaller PRs can be derived if the overall system works. I plan on making an updated proposal on the query/update execution tracking part around next week.
GitHub issue resolved #2701
Pull request Description: ARQ plugin + Fuseki Plugin to track and abort ongoing SPARQL executions.
I tried to design the changes to Jena's core in such a way that execution tracking can be enabled without requiring any changes to existing code.
Summary of changes.
Interception of SPARQL Requests and Execution Tracking
Changes in jena-arq:
- SparqlDispatcherRegistry + infrastructure that allows intercepting SPARQL update and query statements (as objects or strings) against DatasetGraphs. Update-/QueryExecDataset now first delegates to the dispatcher chain.
- InitExecTracking. If, in the dispatcher chain, there is a TaskListener in the context, then the Update-/QueryExec instances are wrapped with a tracking wrapper such that the listener is notified.
- TaskListener implementation is TaskEventBroker, which supports de-/registering of further listeners.
- n executions, there is TaskEventHistory, which is a subclass of TaskEventBroker.
- parseCheck dataset context attribute. If false, then update-/query requests are forwarded via the dispatcher without parsing.
RDFLink-based Execution Tracking
This adds infrastructure to jena-rdfconnection in order to track executions against RDFLinks via the newly introduced class DatasetGraphOverRDFLink.
- DatasetGraphOverSparql. This is a base class in jena-arq that implements all methods by means of SPARQL requests. It is wired up with the tests in TS_SparqlCore. Caveat: As each update is a separate request and bulk updates may be split into multiple requests, blank nodes may not work as expected.
- DatasetGraphOverRDFLink as a subclass of DatasetGraphOverSparql, which provides a newRDFLink() method and implements all DatasetGraph methods based on the RDFLink.
- DatasetGraphOverRDFLink and delegates them to the RDFLink.
- DatasetAssemblerHTTP, which configures a DatasetGraphOverRDFLink instance, which allows use of this system with Fuseki.
- ExampleDBpediaViaRemoteDataset, which demonstrates a query against such a dataset making use of Virtuoso-specific features.
Fuseki Mod: Execution Tracker
Added a simple web page that will show a live view of ongoing and completed queries.
TaskEventHistory in the endpoint context and connect it to the dataset context's TaskEventBroker (the broker will be created if absent).
Jena-Query-Dashboard.webm
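The kind of bookkeeping such a live view needs can be sketched as follows: running executions are tracked by id, and only the most recent completed ones are retained. This is an illustrative assumption about the behaviour, not the PR's TaskEventHistory; the `History` class and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: a hypothetical bounded history, not the class from this PR. */
public class ExecHistorySketch {

    /** Tracks running tasks by id and keeps only the most recently completed ones. */
    static final class History {
        private final int maxCompleted;
        private final Map<Long, String> active = new LinkedHashMap<>();
        private final Deque<String> completed = new ArrayDeque<>();

        History(int maxCompleted) {
            if (maxCompleted < 0) {
                throw new IllegalArgumentException("maxCompleted must be >= 0");
            }
            this.maxCompleted = maxCompleted;
        }

        synchronized void started(long id, String label) {
            active.put(id, label);
        }

        synchronized void finished(long id) {
            String label = active.remove(id);
            if (label == null) {
                return; // unknown id, or already finished
            }
            completed.addLast(label);
            while (completed.size() > maxCompleted) {
                completed.removeFirst(); // evict the oldest completed entry
            }
        }

        synchronized List<String> activeTasks() {
            return new ArrayList<>(active.values());
        }

        synchronized List<String> completedTasks() {
            return new ArrayList<>(completed);
        }
    }

    public static void main(String[] args) {
        History history = new History(2);
        history.started(1, "query 1");
        history.started(2, "query 2");
        history.finished(1);
        System.out.println("active: " + history.activeTasks());
        System.out.println("completed: " + history.completedTasks());
    }
}
```

Bounding the completed list keeps memory use constant on a long-running server, at the cost of older entries disappearing from the view.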
Misc changes
QueryExecHTTP: Improved support for cancelling HTTP requests. So far, cancel would hang until the InputStream of the HTTP response could be obtained. Now the HTTP request can be cancelled immediately.
jena-geosparql with that of the execution tracking.
By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.