Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spatial join area #1713

Open
wants to merge 50 commits into
base: master
Choose a base branch
from

Conversation

Jonathan24680
Copy link
Collaborator

@Jonathan24680 Jonathan24680 commented Jan 17, 2025

This merge request adds the possibility to use arbitrary WKT Geometries (i.e. areas) in SpatialJoin queries. Right now, areas can get approximated as the center point or the true distance between them is estimated. In addition to that a future PR will add pre computed bounding boxes to the index generation to enable even faster query times (like it already exists for geopoints)

@Jonathan24680 Jonathan24680 requested a review from joka921 January 17, 2025 10:00
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 94.35028% with 10 lines in your changes missing coverage. Please review.

Project coverage is 90.08%. Comparing base (cfc49a9) to head (e5121f6).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/engine/SpatialJoinAlgorithms.cpp 93.67% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1713      +/-   ##
==========================================
+ Coverage   90.03%   90.08%   +0.04%     
==========================================
  Files         396      396              
  Lines       37968    38148     +180     
  Branches     4261     4279      +18     
==========================================
+ Hits        34186    34364     +178     
+ Misses       2492     2491       -1     
- Partials     1290     1293       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial general review of everything but the tests.
When you have addressed the initial principle suggestions I will have a closer look

Comment on lines 83 to 85
} catch (...) {
return std::nullopt;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does boost geometry have a mode or overload of readWkt that doesn't use exceptions? then use that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. At least I'm not aware of such a function or overload and could not find one.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pass on SpatialJoinAlgoirthms.cpp (where most of the code lies anyway) :)

// ____________________________________________________________________________
double SpatialJoinAlgorithms::convertDegreesToMeters(
AnyGeometry geometry1, AnyGeometry geometry2) const {
return boost::apply_visitor(DistanceVisitor(), geometry1, geometry2) * 78.630;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the multiplication doing here, the magic constant should be explained.
Is the function name actually computeDistanceInMeters (you first compute it, and then do some magic multiplication to convert)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a very verbose explanation of this is in the header file. But you're right, the function should be called different, as it first gets the distance in degrees and then converts it. I called it computeDist, since that's what the other compute distance functions are called. Then it's consistent and depending on the parameters available, one can call one or the other computeDist function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification, I am much less confused now:)

Comment on lines 134 to 138
AnyGeometry geom1 = geo1.geometry.value();
AnyGeometry geom2 = geo2.geometry.value();
if (geo1.geoPoint.has_value()) {
geom1 = convertGeoPointToPoint(geo1.geoPoint.value());
} else if (geo2.geoPoint.has_value()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is stored e.g. in geo1.geometry.value() if geo1 points to a POINT already? This last part looks a bit confusing.

Copy link
Collaborator Author

@Jonathan24680 Jonathan24680 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one line above, there is a comment, which states "one point and one area". So this is the case, where we have one GeoPoint and one area. Then geom1 and geom2 get initialized to be the geometries (which is wrong in one case. When the geometry is a GeoPoint, then geomX.geometry is std::nullopt). The if below checks, which of those two is not a geometry but a geopoint. Then it converts geopoint to boost point (as boost requires both types to be boost geometries). You're right, that they should not be initialized with .value, since that is std::nullopt for at least one of them. I still have to write a test, which checks that, but it should work now

boost::geometry::read_wkt(getAreaString(idtable, row, col), geometry);
} else {
geometry = Point(point.value().getLng(), point.value().getLat());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier control flow:
if (point.has_value) { return point; } else { return getAnyGeometry(...)} Everything else then is not nested in the if`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'(with my suggestion for the AnyGeometry)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand what you mean

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (point.has_value()) {
  return std::optional{AnyGeometry{convertPoint...()}};
}
return std::optional{getAnyGeometry(...);

You can drop the std::optional if you explicitly set the return type of the lambda to std::optional<AnyGeometry>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course only relevant, if we still have this code at all.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round of reviews.

// ____________________________________________________________________________
double SpatialJoinAlgorithms::convertDegreesToMeters(
AnyGeometry geometry1, AnyGeometry geometry2) const {
return boost::apply_visitor(DistanceVisitor(), geometry1, geometry2) * 78.630;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification, I am much less confused now:)

boost::geometry::read_wkt(getAreaString(idtable, row, col), geometry);
} else {
geometry = Point(point.value().getLng(), point.value().getLat());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (point.has_value()) {
  return std::optional{AnyGeometry{convertPoint...()}};
}
return std::optional{getAnyGeometry(...);

You can drop the std::optional if you explicitly set the return type of the lambda to std::optional<AnyGeometry>

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another pass

Comment on lines 114 to 116
const auto& areaBox = bbox.value();
Point p = calculateMidpointOfBox(areaBox);
return GeoPoint(p.get<1>(), p.get<0>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think below the code gets much more readable if you do
if (!useMidpoint) { return computeDist(index1, index2);}; //.
Then it becomes much clearer that the whole complexity is just for the midpoint case (is this actually useful still?)
If you want to keep the midpoint algorithm then you can move the code into a separate function computeDistUsingMidpoints and then the actual computeDist becomes very short and readable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't get what you mean. The if else code blocks in the current implementation are equally large and complex for both cases.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional changes.

@sparql-conformance
Copy link

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants