Changes to geometry validation strategy #525
Replies: 1 comment 1 reply
-
Hi, I have used simplefeatures in a very simple way, nothing too advanced, but I have encountered problems related to invalid geometries, mostly when trying to convert between formats (with geometries that are invalid and there is no easy way to enforce on the producer side of it). So I have used the But for me, there is cases (similar to the Drawback 4), where I'm just converting formats and I know that it's impossible to have invalid geometries (think, like little helper functions to convert from one struct to another, the geometry is guaranteed to be valid because it was validated during store/acquisition), and having to return an error on this functions tend to be a bit more sad, because most of the time the only part that can return an error inside this functions is the geometry validation. Now, about the proposed solutions, I really like the Create separate validating and non-validating variants of each operation I think is great. Like I said, most of the time I'm just getting data from one representation and creating another and sometimes the source representation already ensure that the data is valid, having the option to use variants that don't return errors can be very positive (low number of lines, small numbers of I don't have other comments for the other ideas, I have read it and it seems to be reasonable, but I don't have enough knowledge to give an opinion. NOTE: When I say "convert between formats", think [ Thanks for working on the library and this very detailed discussion. |
Beta Was this translation helpful? Give feedback.
-
This discussion is about how the
simplefeatures
library currently approaches geometry validation, the drawbacks of that approach, and proposes a new way to tackle geometry validation.It's not always desirable to validate geometries during construction
By default,
simplefeatures
validates geometries as part of construction. If validation fails, then instead of the geometry being returned, an error is instead.There are two situations where this behaviour is undesirable:
The geometry might be invalid, but the user wants to construct the geometry anyway. It's not ideal to have invalid geometries, but we don't live in an ideal world. Luckily, operations on invalid geometries still work reasonably well.
Geometry validation is not free from a computational perspective. If the geometry is known to be valid by construction, known to be valid a priori, or the user doesn't care if the geometry is valid or not, the validation step is wasteful. For some performance-critical applications, this may matter in a non-trivial way.
Current Geometry Validation Disablement Method
In the latest tagged version of
simplefeatures
(v0.44.0
), validation during construction can optionally be disabled each time a geometry is constructed. The mechanism used to allow validation to be disabled is the functional options pattern.Let's use the unmarshalling of a WKT as an example. The
simplefeatures
library provides a function to unmarshal WKTs:If no
ConstructorOption
s are passed in, then by default, the constructor will check the relevant geometry constraints for the geometry type before passing back the parsed geometry. If it's found invalid, an error is returned instead of the geometry value. However, if thegeom.DisableAllValidations
ConstructorOption
option is provided, then this validation step will be skipped, and the geometry always passed back, even if it's invalid.From a user's perspective, the invocation with (default) geometry validation looks like:
Invocation without geometry validation looks like:
Where can geometry constraint validation be disabled?
There are 5 different areas in
simplefeatures
where a user can disable validations:Direct geometry constructors. These are functions that create geometries directly from simpler values. Functions are
NewPoint
,NewLineString
,NewPolygon
,NewMultiPoint
,NewMultiLineString
,NewMultiPolygon
, andNewGeometryCollection
.Parsers. These are functions that parse strings or byte slices that represent serialised geometries. Functions include
UnmarshalWKT
,UnmarshalWKB
,UnmarshalTWKB
, andUnmarshalGeoJSON
.Transform methods. These are methods on geometry types that perform pointwise transformations. The method exists on all 7 concrete geometry types and on the
Geometry
sum-type and is always namedTransformXY
.Simplify methods. These methods on geometry types perform geometry simplification using the Ramer-Douglas-Peucker algorithm. The method names are always
Simplify
. Only theSimplify
methods forPolygon
,MultiPolygon
, andGeometry
acceptConstructorOption
s and return an error. The otherSimplify
methods don't acceptConstructorOption
s and don't return errors. This is because simplification cannot convert a valid geometry into an invalid geometry for other types.GEOS wrapper. The GEOS wrapper (in the
github.com/peterstace/simplefeatures/geos
package) uses WKB as the data transfer protocol between Go and C++ code. When results are received from GEOS as WKB, they are converted back tosimplefeatures
geometries via theUnmarshalWKB
function. The validation performed by default as part ofUnmarshalWKB
can be skipped.What are the drawbacks of the current approach?
There are several drawbacks to using functional options for disabling geometry validation. Most of these aren't problems inherent with functional options but are due to the pattern being used in inappropriate circumstances.
The drawbacks are in the following areas:
ConstructorOption
s are sometimes ignored.pkg.go.dev
.Drawback 1: Error handling mismatch
The signature of each direct geometry constructor returns both a geometry and an error. For example,
func NewPoint(c Coordinates, opts ...ConstructorOption) (Point, error)
.This is required because when no constructor options are passed in, the constructed geometry could be invalid, requiring an error to be passed back instead of the geometry. However, when
geom.DisableAllValidations
is passed in, the function can no longer return a non-nil error.A user has a choice to make about the error value returned:
They could return the error back up the stack. This isn't so bad if the function already returns an error (although it would lead to a tiny bit of code bloat). This isn't really feasible if the function calling the constructor doesn't return an error.
They could discard it (e.g. by assigning it to
_
). This feels unsatisfactory because it could lead to bugs, e.g., if someone later removedgeom.DisableAllValidations
from the invocation.They could check the error and
panic
if it's set. This is a more aggressive version of the previous option and is whatsimplefeatures
does internally. This can also be error-prone, and some Go programmers may strongly oppose this approach.None of these solutions are satisfying.
A similar issue exists with the
TransformXY
andSimplify
methods. These both return errors. If validation is disabled, then the error is alwaysnil
.Drawback 2: Constructor options are sometimes ignored
All direct constructors accept a variadic list of
ConstructorOption
s. This makes sense since a constructor option is a generic way to alter the behaviour of a constructor. Because the only constructor option isgeom.DisableAllValidations
, in practice, some direct constructors ignore the options passed it.For example,
NewMultiPoint
,NewMultiLineString
, andNewGeometryCollection
all ignore their constructor options arguments. This is becauseMultiPoint
,MultiLineString
, andGeometryCollection
are all unconstrained collections of other geometries. They don't introduce any constraints between their child geometries.This has caused some confusion in the past. For example, I sometimes see
geom.DisableAllValidations
being passed intoNewMultiLineString
, even though the presence or absence of this option has no impact on behaviour.Drawback 3: Confusing and inconsistent geometry validation
Some aspects of the current behaviour surrounding geometry validation in direct constructors can be confusing. I especially notice this when trying to document or explain the behaviour to others verbally.
In particular, it's confusing for collection geometry types (
MultiPoint
,MultiLineString
,MultiPolygon
, andGeometryCollection
). The direct constructors (New*
) all accept other geometries as inputs (Point
,LineString
,Polygon
, andGeometry
). However, they assume rather than check that the inputs are valid when the result is validated.This is confusing, because each of these collection geometry types has a validation rule that their child geometries must be valid. The justification for not validating the inputs is that they should have already been validated when they were constructed. It would be wasteful to validate a
Polygon
twice, so therefore, they are not validated again when constructed into collections.The situation is confusing because this only really applies to
MultiPolygons
. The other collection geometry types (MultiPoint
,MultiLineString
, andGeometryColelction
) don't introduce any constraints between their child geometries.This all has some potentially surprising consequences. For example, if you take two
Polygon
values, one valid and one invalid, and pass them intoNewMultiPolygon
(with validation enabled), then the resultantMultiPolygon
will be invalid. No error will be returned, so long as the interaction between the valid and invalid Polygons doesn't violate any of theMultiPolygon
constraints.Drawback 4: In recent versions of GEOS, invalid results are rare
Recent versions of GEOS use OverlayNG (NG stands for New Generation) for set operations between geometries (
Intersection
,Union
, etc.). With the introduction of OverlayNG, invalid geometries are no longer frequently produced by GEOS. They are either non-existent, or very rare.By default the
simplefeatuers
GEOS wrapper validates results from GEOS.This feels like a misalignment:
Validation isn't free, so validating GEOS results seems like the wrong default.
The GEOS wrapper is doing more than strictly wrapping the GEOS library (it is also validating the result). This blurs the lines between what
simplefeatures
is doing and what GEOS is doing.Drawback 5: Poorly linked documentation in
pkg.go.dev
.In the simplefeatures documentation at pkg.go.dev, there is poor linkage between the types and functions relating to disabling validation.
For example, say a user looks at the documentation for
NewPolygon
. They see that one of the arguments to that function is aConstructorOption
, which helpfully includes a link to its documentation. They're now at a dead-end. There is no way for the user to find all of the instances ofConstructorOption
, other than searching the entire page for something matching that type. The distance on the page betweenConstructorOption
andDisableAllValidations
is significant.As far as I can tell, this is part of the "price of admission" for using the functional options pattern in Go.
Drawback 6: There is only a single option
When
simplefeatures
was initially developed, it was predicted that there would be many different ways that the construction of geometries might need to be configured. This prediction was the reason that the functional options pattern was used. The idea was that new ways to configure geometry construction could be added in an easy-to-use and backwards-compatible way.The prediction that there would need to be more than one option turned out to be incorrect.
As it turned out, there is only a single boolean option: whether or not to validate the geometry after construction.
This isn't strictly speaking a drawback, as much as it is an indication that the functional options pattern is more powerful required.
What is the proposed new way?
Three general approaches solve the drawbacks of the current approach to various degrees:
Don't validate the result as the final step of operations that could result in invalid geometries. Introduce a
Validate() error
method, which users can call after if they wish. This is actually already implemented but not released in a tagged version yet.Create separate validating and non-validating variants of each operation. Because they are separate, the non-validating variants don't need to include an error in their return signature.
Continue using "constructor options", but tweak the mechanics of the set-up such that the drawbacks of functional options aren't applicable anymore.
Option 1 is the approach that GEOS takes, and introduces a new drawback. It suddenly becomes very easy for users to forget to validate geometries. Users new to GIS may not even be aware that geometry validation is a thing they need to think about.
One of the core design principles I had in mind when initially developing
simplefeatures
is that geometries should be validated by default. Disabling geometry validation should be opt-out instead of enabling geometry validation being opt-in. That's still a good design principle, but perhaps applied too universally.The following proposal fixes the five drawbacks outlined earlier by using a mix of the three approaches outlined above:
Introduce a
Validate() error
function. This will allow users to check the validity of a geometry after it has been constructed in a non-validating way.Modify direct constructors (
NewPoint
etc.) to no longer perform validation at all. The direct constructors are for more advanced use cases where geometries are being constructed from other geometries or raw sequences of floats. Things like custom geometric algorithms come to mind. Because these are for advanced use cases, making users validate manually is reasonable.Modifying direct constructors largely solves drawback 1 (error handling mismatch), drawback 2 (constructor options sometimes ignored), and drawback 3 (confusing/inconsistent geometry validations)
Tweak the functional options pattern for parsers. Modify each parse function (
UnmarshalWKT
, etc.) to accept a variadic list ofNoValidation
values instead of a variadic list of functional options.NoValidation
would be defined as simplystruct{}
. This solves drawback 5 by causing documentation to link together much more nicely. It will make it easier for users to understand how to disable validation.Tweak the functional options pattern for
Simplify
. Simplifying Polygons and MultiPolygons can sometimes introduce ring self-intersections and invalid interactions between child polygons. This fact is not immediately obvious, so checking validity by default is important. It's sometimes desirable to disable validation, though. For example, the validity could be checked manually afterwards, and if found to be invalid,MakeValid
used to "fix" the result. Using the sameNoValidation
option approach as parsers solves drawback 5 by causing documentation to link together more nicely. It doesn't solve drawback 1 (error handling mismatch), but I think that's okay given the rarity of wanting to simplify without validating the result. The alternatives (e.g. introducing validating and non-validating variants of the methods) introduce other problems, such as inconsistencies between geometry types (simplification can cause validation errors on some geometry types but not others).Remove validation from
TransformXY
.TransformXY
transforms the coordinates of geometries in arbitrary ways via a user-supplied mapping. It should be obvious that this potentially could cause invalid results. For example, the mapping could trivially map every coordinate pair of a polygon to the same point. In practice, transformations are either affine (translation, rotation, scaling), or, at the very least, topologically preserving (e.g. WGS84 to a Mercator or Conic projection). Removing validation, adding a warning in documentation, and allowing a user to validate manually afterwards help solve drawback 1 (error handling mismatch).Don't validate at all in the GEOS wrapper. Users can still validate manually. The new OverlayNG in GEOS is very robust, and all things being equal, it's better for a C wrapper to be doing as little as possible (i.e. just wrapping) so that it's clearer to users what is actually happening. Users can still validate the result manually with the
Validate() error
method if they wish to. This addresses drawback 4 (invalid GEOS results are rare).I'm looking for feedback on the proposal, and will likely start implementing these changes in the next few weeks if there aren't any major objections.
Beta Was this translation helpful? Give feedback.
All reactions