Contributor

@Aklakan Aklakan commented Sep 30, 2025

GitHub issue resolved #3473

Pull request Description: In jena-geosparql, replaced all lookups of spatial objects by type with lookups by the relevant properties. For example, instead of searching for instances of type geo:Geometry, the code now looks for resources that have geo:asWKT, geo:asGML or geo:hasSerialization properties.
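
As a rough illustration of the lookup change described above, here is a minimal, self-contained sketch. It does not use the Jena API: the `Triple` record, the class name and both method names are hypothetical stand-ins, and terms are plain strings. It only shows why a geometry without an explicit type triple is missed by a type-based lookup but found by a property-based one.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; Triple and the method names are hypothetical, not Jena API. */
final class PropertyBasedLookupSketch {
    /** Minimal stand-in for an RDF triple, using plain strings as terms. */
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "rdf:type";
    static final String GEOMETRY = "geo:Geometry";

    /** The properties named in the PR description as geometry markers. */
    static final Set<String> SERIALIZATION_PROPS =
            Set.of("geo:asWKT", "geo:asGML", "geo:hasSerialization");

    /** Type-based lookup: only subjects with an explicit type triple are found. */
    static Set<String> geometriesByType(List<Triple> graph) {
        Set<String> result = new LinkedHashSet<>();
        for (Triple t : graph) {
            if (RDF_TYPE.equals(t.p()) && GEOMETRY.equals(t.o())) {
                result.add(t.s());
            }
        }
        return result;
    }

    /** Property-based lookup: any subject with a serialization property is found. */
    static Set<String> geometriesByProperty(List<Triple> graph) {
        Set<String> result = new LinkedHashSet<>();
        for (Triple t : graph) {
            if (SERIALIZATION_PROPS.contains(t.p())) {
                result.add(t.s());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple(":g1", RDF_TYPE, GEOMETRY),
                new Triple(":g1", "geo:asWKT", "POINT(1 2)"),
                new Triple(":g2", "geo:asWKT", "POINT(3 4)")); // no type triple
        System.out.println(geometriesByType(graph));     // [:g1]
        System.out.println(geometriesByProperty(graph)); // [:g1, :g2]
    }
}
```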

  • Consolidated all vocabulary-based graph access into AccessGeoSPARQL and AccessWGS84 classes.

  • Added benchmark. No performance issues revealed. Detailed results in this thread.

  • Spatial Indexer UI: Replace mode with no selected graphs clears the index

  • Added computeIfAbsent/Present methods to Context.

  • Fixed missing override of DatasetGraphRDFS.find()

  • Fixed possible resource leaks by changing forEachRemaining to forEach which closes the iterator after use. Credits to @SimonBin for spotting this.
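
The resource-leak item in the list above can be sketched roughly as follows. This is not Jena code: `TrackedIterator` is a hypothetical closeable iterator, and its `forEach` method only mirrors the behaviour the description attributes to `forEach` (consume everything, then close). The standard `Iterator.forEachRemaining` has no notion of closing, so the iterator stays open afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative sketch only; TrackedIterator is a hypothetical type, not Jena API. */
final class ClosingIterationSketch {
    /** An iterator that holds a (pretend) resource and records whether it was closed. */
    static final class TrackedIterator<T> implements Iterator<T>, AutoCloseable {
        private final Iterator<T> delegate;
        private boolean closed = false;

        TrackedIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public T next() { return delegate.next(); }
        @Override public void close() { closed = true; }

        boolean isClosed() { return closed; }

        /** Consumes all remaining items and closes the iterator, even on failure. */
        void forEach(Consumer<? super T> action) {
            try {
                forEachRemaining(action);
            } finally {
                close();
            }
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();

        TrackedIterator<String> leaky = new TrackedIterator<>(List.of("a", "b").iterator());
        leaky.forEachRemaining(seen::add);    // plain java.util.Iterator default method
        System.out.println(leaky.isClosed()); // false

        TrackedIterator<String> safe = new TrackedIterator<>(List.of("a", "b").iterator());
        safe.forEach(seen::add);              // closes in a finally block
        System.out.println(safe.isClosed());  // true
    }
}
```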


  • GenericPropertyFunction tests are covered by existing test suite
  • Added new test cases based on the reported issue.
  • (I tested the UI change manually, but did not add an automated test)
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Contributor Author

Aklakan commented Oct 1, 2025

Converted to draft because there are more instances in GenericPropertyFunction that rely on explicit types, such as:

if (!(boundNode.isLiteral() ||
        graph.contains(boundNode, RDF.type.asNode(), Geo.SPATIAL_OBJECT_NODE) ||
        graph.contains(boundNode, RDF.type.asNode(), Geo.FEATURE_NODE) ||
        graph.contains(boundNode, RDF.type.asNode(), Geo.GEOMETRY_NODE))) {
    if (!graph.contains(boundNode, SpatialExtension.GEO_LAT_NODE, null)) {

I am looking into consolidating this into a separate set of methods that check whether resources are features or geometries and then retrieve the appropriate literal. Perhaps even an interface that lets one choose whether to classify resources by types or by properties.
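
One possible shape for such an interface, as a minimal sketch under stated assumptions: `GeometryClassifier`, `BY_TYPE`, `BY_PROPERTY` and `geometryLiteral` are hypothetical names, triples are modelled as plain strings rather than Jena nodes, and only geometries (not features) are covered. It is not the design that ended up in the PR.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical, not Jena API. */
final class SpatialClassifierSketch {
    /** Minimal stand-in for an RDF triple, using plain strings as terms. */
    record Triple(String s, String p, String o) {}

    static final Set<String> SERIALIZATION_PROPS =
            Set.of("geo:asWKT", "geo:asGML", "geo:hasSerialization");

    /** Strategy that decides whether a resource counts as a geometry. */
    interface GeometryClassifier {
        boolean isGeometry(List<Triple> graph, String resource);
    }

    /** Classify by an explicit rdf:type triple. */
    static final GeometryClassifier BY_TYPE = (graph, r) -> graph.stream()
            .anyMatch(t -> t.s().equals(r)
                    && t.p().equals("rdf:type")
                    && t.o().equals("geo:Geometry"));

    /** Classify by the presence of any serialization property. */
    static final GeometryClassifier BY_PROPERTY = (graph, r) -> graph.stream()
            .anyMatch(t -> t.s().equals(r) && SERIALIZATION_PROPS.contains(t.p()));

    /** Returns the first serialization literal if the classifier accepts the resource. */
    static Optional<String> geometryLiteral(List<Triple> graph, String resource,
                                            GeometryClassifier classifier) {
        if (!classifier.isGeometry(graph, resource)) {
            return Optional.empty();
        }
        return graph.stream()
                .filter(t -> t.s().equals(resource) && SERIALIZATION_PROPS.contains(t.p()))
                .map(Triple::o)
                .findFirst();
    }

    public static void main(String[] args) {
        // :g2 has a WKT literal but no type triple.
        List<Triple> graph = List.of(new Triple(":g2", "geo:asWKT", "POINT(3 4)"));
        System.out.println(geometryLiteral(graph, ":g2", BY_TYPE));     // Optional.empty
        System.out.println(geometryLiteral(graph, ":g2", BY_PROPERTY)); // Optional[POINT(3 4)]
    }
}
```

Keeping the classification behind one interface means callers such as the property functions would not need to know which strategy is in effect.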

@Aklakan Aklakan force-pushed the 20250930_spatialindex-replace branch 5 times, most recently from 1623501 to 1a9c030 Compare October 1, 2025 23:12
@Aklakan Aklakan force-pushed the 20250930_spatialindex-replace branch from 1a9c030 to 82ef78d Compare October 2, 2025 14:20
Contributor Author

Aklakan commented Oct 2, 2025

So benchmark results with this PR on a test dataset with 80K geometries are pretty much consistent with what was there before.

(p2_queryId)  (p3_inferences)  (p4_index)  (p5_jenaVersion)  Mode  Cnt  Score   Error  Units
          q1              off       false           current  avgt    5  2.460 ± 1.095   s/op
          q1              off       false             5.5.0  avgt    5  2.365 ± 0.259   s/op
          q1              off        true           current  avgt    5  0.250 ± 0.118   s/op
          q1              off        true             5.5.0  avgt    5  0.265 ± 0.035   s/op
          q1          virtual       false           current  avgt    5  2.947 ± 0.134   s/op
          q1          virtual       false             5.5.0  avgt    5  4.454 ± 0.820   s/op
          q1          virtual        true           current  avgt    5  0.310 ± 0.013   s/op
          q1          virtual        true             5.5.0  avgt    5  0.405 ± 0.012   s/op
          q1     materialized       false           current  avgt    5  2.168 ± 1.352   s/op
          q1     materialized       false             5.5.0  avgt    5  2.200 ± 1.186   s/op
          q1     materialized        true           current  avgt    5  0.149 ± 0.142   s/op
          q1     materialized        true             5.5.0  avgt    5  0.180 ± 0.129   s/op
          q2              off       false           current  avgt    5  0.079 ± 0.071   s/op
          q2              off       false             5.5.0  avgt    5  0.081 ± 0.012   s/op
          q2              off        true           current  avgt    5  0.070 ± 0.015   s/op
          q2              off        true             5.5.0  avgt    5  0.090 ± 0.007   s/op
          q2          virtual       false           current  avgt    5  0.158 ± 0.022   s/op
          q2          virtual       false             5.5.0  avgt    5  0.310 ± 0.021   s/op
          q2          virtual        true           current  avgt    5  0.159 ± 0.008   s/op
          q2          virtual        true             5.5.0  avgt    5  0.307 ± 0.010   s/op
          q2     materialized       false           current  avgt    5  0.047 ± 0.028   s/op
          q2     materialized       false             5.5.0  avgt    5  0.042 ± 0.007   s/op
          q2     materialized        true           current  avgt    5  0.069 ± 0.008   s/op
          q2     materialized        true             5.5.0  avgt    5  0.043 ± 0.004   s/op

A slight performance increase is seen with virtual inferences, most likely because lookups by type (e.g. geo:Geometry) also have to scan all triples from which that type could be inferred. The new code just checks the properties directly.

          q1          virtual       false           current  avgt    5  2.947 ± 0.134   s/op
          q1          virtual       false             5.5.0  avgt    5  4.454 ± 0.820   s/op
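
To make the reasoning about virtual inferences concrete, here is a toy model, not how Jena's RDFS support is actually implemented. The `DOMAIN` map is an assumed schema fragment for illustration, and `Triple`, `explicitInstances` and `virtualInstances` are hypothetical names. The point is only that a query-time type lookup has to consider both explicit type triples and every triple whose property implies the type, whereas a property lookup matches the property triples directly.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only; names and schema are assumptions, not Jena's RDFS engine. */
final class VirtualTypeLookupSketch {
    /** Minimal stand-in for an RDF triple, using plain strings as terms. */
    record Triple(String s, String p, String o) {}

    /** Assumed schema fragment: property -> rdfs:domain class. */
    static final Map<String, String> DOMAIN = Map.of(
            "geo:asWKT", "geo:Geometry",
            "geo:asGML", "geo:Geometry",
            "geo:hasSerialization", "geo:Geometry");

    /** Subjects that carry an explicit rdf:type triple for the class. */
    static Set<String> explicitInstances(List<Triple> graph, String cls) {
        Set<String> result = new LinkedHashSet<>();
        for (Triple t : graph) {
            if ("rdf:type".equals(t.p()) && cls.equals(t.o())) {
                result.add(t.s());
            }
        }
        return result;
    }

    /**
     * Query-time ("virtual") type lookup: explicit instances plus every subject
     * of a property whose domain is the class. This needs an extra pass over
     * the data compared to matching the properties directly.
     */
    static Set<String> virtualInstances(List<Triple> graph, String cls) {
        Set<String> result = explicitInstances(graph, cls);
        for (Triple t : graph) {
            if (cls.equals(DOMAIN.get(t.p()))) {
                result.add(t.s());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple(":g1", "rdf:type", "geo:Geometry"),
                new Triple(":g1", "geo:asWKT", "POINT(1 2)"),
                new Triple(":g2", "geo:asWKT", "POINT(3 4)"));
        System.out.println(explicitInstances(graph, "geo:Geometry")); // [:g1]
        System.out.println(virtualInstances(graph, "geo:Geometry"));  // [:g1, :g2]
    }
}
```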

The benchmark is compatible with jena-5.5.0 and therefore adds Geometry type triples, which are no longer needed with the new code. Plain geo:asWKT, geo:asGML or geo:hasSerialization is now sufficient.

@Aklakan Aklakan force-pushed the 20250930_spatialindex-replace branch 5 times, most recently from 7939dec to 45c7236 Compare October 2, 2025 15:04
@Aklakan Aklakan marked this pull request as ready for review October 2, 2025 15:05
Contributor Author

Aklakan commented Oct 2, 2025

The code itself is ready for review. I will do a round of testing on our deployment to see whether this reveals any issues and report back.

[Update] looked fine.

Contributor Author

Aklakan commented Oct 2, 2025

Hm, similar access patterns (e.g. the check for Geometry type triples) are used in SpatialIndexUtils, GenericSpatialPropertyFunction, SpatialObjectAccess. While at it, I'll consolidate this in one place.

@Aklakan Aklakan marked this pull request as draft October 2, 2025 16:47
@Aklakan Aklakan force-pushed the 20250930_spatialindex-replace branch 9 times, most recently from b4be7e7 to a3c54a3 Compare October 3, 2025 12:36
@Aklakan Aklakan marked this pull request as ready for review October 3, 2025 12:49
Contributor Author

Aklakan commented Oct 3, 2025

The code is now consolidated.

@Aklakan Aklakan changed the title from "GH-3473: Remove need for Geometry triples for GenericPropertyFunction." to "GH-3473: Identify spatial objects by properties instead of types." Oct 5, 2025
return new GraphRDFS(base, setup);
}

@Override
Member

As I read the code on main:

find() ->
find() in DatasetGraphWrapper ->
default method DatasetGraph.find() ->
find(Node.ANY, Node.ANY, Node.ANY, Node.ANY);

so this is not necessary.

And Node.ANY is better than null 😄

Contributor Author

@Aklakan Aklakan Oct 5, 2025

The default find() method of DatasetGraphWrapper goes to the delegated dataset which does not have the inferences - so DatasetGraphRDFS does have to override it to use findInf. I'll add a test case.

@Override
public Iterator<Quad> find()
{ return getR().find(); }

Node.ANY

Agreed.
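
The delegation issue discussed in this review thread can be reproduced in a small standalone sketch. None of these types are Jena's: `Dataset`, `Wrapper`, `InfWithoutOverride` and `InfWithOverride` are hypothetical, and results are plain strings. It assumes, as described above, that the wrapper's no-argument `find()` forwards straight to the wrapped dataset, so an inference layer that only overrides the pattern form never sees that call.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; these types are hypothetical, not Jena's DatasetGraph classes. */
final class FindOverrideSketch {
    interface Dataset {
        List<String> find(String pattern);

        /** Default no-argument find: same as the pattern form with a wildcard. */
        default List<String> find() {
            return find("ANY");
        }
    }

    /** Base data without any inference. */
    static class BaseDataset implements Dataset {
        @Override public List<String> find(String pattern) {
            return List.of("explicit");
        }
    }

    /** Wrapper that forwards every call, including find(), to the wrapped dataset. */
    static class Wrapper implements Dataset {
        protected final Dataset base;

        Wrapper(Dataset base) { this.base = base; }

        @Override public List<String> find(String pattern) { return base.find(pattern); }
        @Override public List<String> find() { return base.find(); } // bypasses subclasses' find(pattern)
    }

    /** Inference layer that overrides only the pattern form. */
    static class InfWithoutOverride extends Wrapper {
        InfWithoutOverride(Dataset base) { super(base); }

        @Override public List<String> find(String pattern) {
            List<String> out = new ArrayList<>(base.find(pattern));
            out.add("inferred");
            return out;
        }
    }

    /** Same layer, but find() is routed through the inferencing find(pattern). */
    static class InfWithOverride extends InfWithoutOverride {
        InfWithOverride(Dataset base) { super(base); }

        @Override public List<String> find() { return find("ANY"); }
    }

    public static void main(String[] args) {
        Dataset base = new BaseDataset();
        System.out.println(new InfWithoutOverride(base).find()); // [explicit]
        System.out.println(new InfWithOverride(base).find());    // [explicit, inferred]
    }
}
```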

Contributor Author

@Aklakan Aklakan Oct 5, 2025

Added test case (fails without the fix) and switched to Node.ANY.

}
}

/** Atomic compute. */
Member

Haven't I seen these Context changes in another of your PRs?

Contributor Author

@Aklakan Aklakan Oct 5, 2025

It's part of the ExecTracker PR #3184 - but it also fits here where it is easier to review due to the size of the PR :)

@Aklakan Aklakan force-pushed the 20250930_spatialindex-replace branch 2 times, most recently from 12b04ae to 51f0600 Compare October 5, 2025 18:17
…PropertyFunction; Spatial Indexer UI: Replace without selection clears index.
@Aklakan Aklakan force-pushed the 20250930_spatialindex-replace branch from 51f0600 to 59897bf Compare October 5, 2025 18:30
@afs afs merged commit e4dcc30 into apache:main Oct 6, 2025

Development

Successfully merging this pull request may close these issues.

jena-geosparql: issues when building spatial index from assembler
