perf: Optimize st_has(z/m) using WKBBytesExecutor + Implement new WKBHeader #171

petern48 · 2025-10-01T01:50:47Z

This PR leverages the new WKBBytesExecutor for dimension calculation, so we can implement functions like st_hasz and st_hasm without parsing the entire geometry. The logic turns out to be more complicated than I originally expected (due to edge cases relating to inferring the dimensionality).

To properly get the dimensionality, we need to OR all of the following (short-circuiting permitted, of course):

- the dimensionality of the geometry type (the obvious one)
  - e.g. POINT Z EMPTY -> xyz
- the dimensionality of the first nested geometry (if it's some sort of collection)
  - e.g. GEOMETRYCOLLECTION (POINT Z (0 0 0)) -> xyz
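
The OR described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in this PR: `Dims`, `read_u32`, `read_header`, and `wkb_dims` are hypothetical names, only ISO WKB type codes are decoded (no EWKB flags or SRID), and only the first nested geometry one level down is inspected.

```rust
/// Z/M flags of a geometry (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Dims {
    z: bool,
    m: bool,
}

/// Read a u32 at `off`. WKB byte order: 0 = big endian, 1 = little endian.
fn read_u32(buf: &[u8], off: usize, byte_order: u8) -> Option<u32> {
    let b = buf.get(off..off + 4)?;
    let raw = [b[0], b[1], b[2], b[3]];
    match byte_order {
        0 => Some(u32::from_be_bytes(raw)),
        1 => Some(u32::from_le_bytes(raw)),
        _ => None,
    }
}

/// Read the header at `off`: (byte order, base geometry type, dims from the ISO code).
fn read_header(buf: &[u8], off: usize) -> Option<(u8, u32, Dims)> {
    let byte_order = *buf.get(off)?;
    let code = read_u32(buf, off + 1, byte_order)?;
    let dims = match code / 1000 {
        0 => Dims { z: false, m: false },
        1 => Dims { z: true, m: false },
        2 => Dims { z: false, m: true },
        3 => Dims { z: true, m: true },
        _ => return None,
    };
    Some((byte_order, code % 1000, dims))
}

/// OR the dimensions of the outer type code with those of the first nested geometry.
fn wkb_dims(buf: &[u8]) -> Option<Dims> {
    let (byte_order, geom_type, outer) = read_header(buf, 0)?;
    // Short-circuit: nothing left to learn, or not a collection.
    // Base types 4..=7 are MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION.
    if (outer.z && outer.m) || !(4..=7).contains(&geom_type) {
        return Some(outer);
    }
    // Collections store a u32 child count right after the 5-byte header.
    let count = read_u32(buf, 5, byte_order)?;
    if count == 0 {
        return Some(outer);
    }
    // The first child starts at byte 9 and carries its own byte order and type code.
    let (_, _, inner) = read_header(buf, 9)?;
    Some(Dims {
        z: outer.z || inner.z,
        m: outer.m || inner.m,
    })
}
```

Only the first 14 bytes are touched in the worst case here, which is the point of parsing just the header instead of the whole geometry.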

closes issue #170

The benchmark results were measured before the full WKBHeader was implemented, so they are likely faster than the version that was merged.

rust/sedona-functions/src/st_haszm.rs

petern48 · 2025-10-01T02:05:01Z

I'd also want to convert that function into one that returns the dimensionality (e.g. xy, xyz, etc.) and then use that to implement st_haszm, in case that logic can be reused elsewhere.
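
A minimal sketch of that idea: once a function returns the dimensionality, the Z and M checks become one-line lookups. The local `Dimensions` enum below is a stand-in defined for this example, and `has_z` / `has_m` are hypothetical helper names rather than functions from the PR.

```rust
/// Stand-in for a dimensionality enum (defined locally for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dimensions {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

/// True when the geometry carries a Z ordinate.
fn has_z(dims: Dimensions) -> bool {
    matches!(dims, Dimensions::Xyz | Dimensions::Xyzm)
}

/// True when the geometry carries an M ordinate.
fn has_m(dims: Dimensions) -> bool {
    matches!(dims, Dimensions::Xym | Dimensions::Xyzm)
}
```

Keeping the dimensionality as the single source of truth means other header-only functions can reuse the same parsing path.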

paleolimbot

Cool! In general I think this is a great idea (lazy parsing just the header when that's all we need).

I left a suggestion about consolidating some of the first-few-bytes parsing we're doing so that we have a place to test it better.

python/sedonadb/tests/functions/test_functions.py

rust/sedona-functions/src/st_haszm.rs

petern48 · 2025-10-04T06:12:40Z

Added perf benchmarks to the PR description 🤠

petern48 · 2025-10-04T16:20:00Z

I can't import sedona-testing into sedona-geometry due to a circular dependency, and hence can't import the fixture into wkb_header.rs to test it there. It is at least being tested in st_has_(z/m), so I'm confident the logic is right. Let me know if you'd rather move something around or copy-paste the test over; otherwise, this is how I'll leave it.

The unparseable WKT strings are still left in the code as comments at the moment, though I did also mention them in #162 as a separate reminder if / whenever that's fixed. Personally, I prefer to leave the comments in the code as an additional reminder, but if you'd rather have me delete them, let me know.

paleolimbot

This is going to be so cool! I left some suggestions about reorganizing the WkbHeader to support a few of the other things I'd like to do with it 🙂

python/sedonadb/tests/functions/test_functions.py

rust/sedona-testing/src/fixtures.rs

rust/sedona-geometry/src/wkb_header.rs

paleolimbot · 2025-10-05T03:02:39Z

rust/sedona-geometry/src/wkb_header.rs

+    match code / 1000 {
+        // If xy, it's possible we need to infer the dimension
+        0 => {}
+        1 => return Ok(Dimensions::Xyz),
+        2 => return Ok(Dimensions::Xym),
+        3 => return Ok(Dimensions::Xyzm),
+        _ => return sedona_internal_err!("Unexpected code: {code}"),
+    };


This should also handle EWKB high bit flags. Most of the time this will be ISO WKB from GeoParquet but not all tools have control over the type of WKB they generate and we're better for dealing with it (unless you can demonstrate measurable performance overhead, which I doubt is the case here). One notable data point is that WKB coming from Sedona Spark's dataframe_to_arrow() is EWKB.
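
As a rough sketch of what handling both encodings could look like: ISO WKB encodes dimensions in the thousands digit of the type code, while EWKB (the PostGIS convention) sets high bits for Z (0x80000000), M (0x40000000), and SRID (0x20000000). The function name `decode_type_code` and the local `Dimensions` enum are assumptions made for this example, not the API added in this PR.

```rust
/// Stand-in dimensionality enum (defined locally for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dimensions {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

// EWKB high-bit flags on the u32 geometry type.
const EWKB_Z: u32 = 0x8000_0000;
const EWKB_M: u32 = 0x4000_0000;
const EWKB_SRID: u32 = 0x2000_0000;

/// Decode a geometry type code that may be ISO WKB or EWKB.
/// Returns (base geometry type, dimensions, has_srid).
fn decode_type_code(code: u32) -> Option<(u32, Dimensions, bool)> {
    let has_srid = (code & EWKB_SRID) != 0;
    let ewkb_z = (code & EWKB_Z) != 0;
    let ewkb_m = (code & EWKB_M) != 0;

    // Strip the EWKB flags; what remains is interpreted as an ISO code.
    let iso = code & !(EWKB_Z | EWKB_M | EWKB_SRID);
    let (iso_z, iso_m) = match iso / 1000 {
        0 => (false, false),
        1 => (true, false),
        2 => (false, true),
        3 => (true, true),
        _ => return None,
    };

    let dims = match (ewkb_z || iso_z, ewkb_m || iso_m) {
        (false, false) => Dimensions::Xy,
        (true, false) => Dimensions::Xyz,
        (false, true) => Dimensions::Xym,
        (true, true) => Dimensions::Xyzm,
    };
    Some((iso % 1000, dims, has_srid))
}
```

The extra work is a few bit masks per header, which is consistent with the expectation that supporting EWKB adds no measurable overhead. When the SRID flag is set, a 4-byte SRID follows the type code, so any offsets past the header shift by four bytes.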

rust/sedona-geometry/src/wkb_header.rs

rust/sedona-functions/src/st_haszm.rs

petern48 · 2025-10-25T05:49:58Z

@paleolimbot WDYT about adding geos as a test dependency to avoid having to hard-code so many fixtures? Reasonable or nah? Thinking about usefulness for future PRs too. We could write EWKB similar to how this PR in WKB used it for testing https://github.com/georust/wkb/pull/46/files#diff-d5cbc1df3ceaaa6b6d928a7d04e566e05caaa85fa2eb79665fcf3d43d01b7a19

If not, then I'll proceed with hard-coding fixtures.

paleolimbot

Thank you for bearing with me on this! This will be useful for a lot of cheap/structural inspections of geometries. All the comments except the commented-out test are optional 🙂

> WDYT about adding geos as a test dependency to avoid having to hard-code so many fixtures?

I'd like to avoid geos as a test dependency for now (we can revisit if our fixture list gets out of control). In general being able to run tests without any system dependencies is helpful for contributors.

paleolimbot · 2025-10-28T14:08:34Z

rust/sedona-geometry/src/wkb_header.rs

+    // #[test]
+    // fn geometrycollection_with_srid() {
+    //     use sedona_testing::fixtures::*;


Can this be uncommented?

petern48 · 2025-10-29

I actually meant to delete this, which I've now done. It's redundant: there are already GeometryCollection with SRID test cases elsewhere.

rust/sedona-geometry/src/wkb_header.rs

paleolimbot · 2025-10-28T14:37:18Z

rust/sedona-geometry/src/wkb_header.rs

+        let buf = &self.buf;
+        let off = self.offset;
+        let coord: f64 = match self.last_endian {
+            0 => f64::from_be_bytes([


This is great for this PR (where we only ever read two ordinates)...if we were to expand this we'd want to move this match outside the loop (i.e., so we only check the endian and buffer size once per coordinate sequence)

petern48 · 2025-10-29

I see what you mean, but I think it would look / feel weird to pull it out prematurely. The loop doesn't exist yet (I assume you're talking about a loop iterating over the coords, because I'm not seeing any loop in the existing code). If I'm understanding you right, it wouldn't make a difference performance-wise right now, since we're only reading one xy coord. I'd rather leave it like this for now and pull it out if / when we read more coords.
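
For reference, a sketch of what the hoisted version might look like if more ordinates were ever read. The endian flag and the buffer size are each checked once for the whole run instead of once per ordinate. `read_ordinates` and `to_array` are hypothetical names for this example and are not part of the PR.

```rust
/// Copy an 8-byte chunk into a fixed-size array.
/// Callers must pass exactly 8 bytes (guaranteed by `chunks_exact(8)` below).
fn to_array(chunk: &[u8]) -> [u8; 8] {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(chunk);
    raw
}

/// Read `n` consecutive f64 ordinates starting at `off`.
/// WKB byte order: 0 = big endian, 1 = little endian.
fn read_ordinates(buf: &[u8], off: usize, n: usize, byte_order: u8) -> Option<Vec<f64>> {
    // One bounds check for the whole coordinate run.
    let end = off.checked_add(n.checked_mul(8)?)?;
    let bytes = buf.get(off..end)?;

    let mut out = Vec::with_capacity(n);
    // One endian check, outside the per-ordinate loop.
    match byte_order {
        0 => {
            for chunk in bytes.chunks_exact(8) {
                out.push(f64::from_be_bytes(to_array(chunk)));
            }
        }
        1 => {
            for chunk in bytes.chunks_exact(8) {
                out.push(f64::from_le_bytes(to_array(chunk)));
            }
        }
        _ => return None,
    }
    Some(out)
}
```

With only two ordinates read per header, as in this PR, the difference is negligible, which matches the decision to leave the match where it is for now.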

rust/sedona-geometry/src/wkb_header.rs

paleolimbot

Thank you! Excited to see more optimized kernels!

rust/sedona-functions/src/st_haszm.rs

petern48 · 2025-10-30T05:21:41Z

FYI, our lead over DuckDB is not as large as the original benchmark in the PR description showed. Here's the updated benchmark.

Main differences are:

- This updated benchmark uses single-threaded Sedona. I'm guessing this is the major cause of the perf drop.
- We've implemented the full WkbHeader since the original benchmark. It does more than the original implementation did, so it should be slower. Mainly, we support EWKB and SRID checks, and we also grab first_xy. first_xy shouldn't be much more effort since we were already getting to first_geom for the first_geom_dimensions field. I've addressed all the main slowdowns I was aware of (e.g. creating nested buffers), but I wonder if there are more.

petern48 added 4 commits September 30, 2025 18:33

Implement st_haszm using WKBBytesExecutor instead (missing one last edge case)

ccae8ff

Add note about handling last edge case

07aa206

Minor fix to the comments

cf031fb

Fix pre-commit

526fc3b

petern48 commented Oct 1, 2025

rust/sedona-functions/src/st_haszm.rs

paleolimbot reviewed Oct 1, 2025

petern48 added 9 commits October 2, 2025 08:55

Fix cargo clippy

a491d91

Save progress

cf90bcf

Pull out dimension calculation logic into new wkb_header.rs

22a6087

Add MULTIPOINT_WITH_INFERRED_Z_DIMENSION_WKB fixture

5c616af

Fix dimension calculation to support all collection types and add fixture as test

207ecb1

Fix clippy and clean up

43009f8

Remove public byte_order method since it's not needed atm

1078bdd

Perform all wkb_header operations lazily and cache the values as Option fields

4d4e7e0

Add python integration test benches

dfd6c1a

Add tests for wkb_header

0ef812d

petern48 marked this pull request as ready for review October 4, 2025 16:33

petern48 requested a review from paleolimbot October 4, 2025 16:33

paleolimbot reviewed Oct 5, 2025

petern48 and others added 5 commits October 4, 2025 21:04

Apply suggestion from @paleolimbot

075d6e6

Co-authored-by: Dewey Dunnington <[email protected]>

Remaining clean up

491b3c7

Update to method to dimensions plural

7efccc0

Rename method to try_new

06501e5

Update fixture to be multipoint ((1 2 3)) instead of all zeros

1b397fd

petern48 marked this pull request as draft October 5, 2025 22:51

jiayuasu mentioned this pull request Oct 6, 2025

feat(sql): Implement ST_Azimuth() #183

Merged

Implement refactor

9ce9f08

petern48 added 6 commits October 22, 2025 20:31

Fix write_geometry arg after merge

597c22e

Create and use read_u32() helper function

55b94ef

Move all functions into WKbHeader as methods also rename helper to first_xy_coord

b77a762

Catch the error instead of hiding it in first_geom_idx

a2218a9

Use new WkbBuffer that calculates values by consuming bytes and keeping state

b8f1cc3

Fix st_haszm to map sedona errors

4e3e525

petern48 added 5 commits October 27, 2025 23:59

Fix bug, so dimensions supports EWKB and ISO WKB

92dd670

Clean up

e4d269a

Remove parse_dimensions function and rename to read_coord

c25ac66

Create set_offset() function to avoid creating new WkbBuffers

86aeb3c

Add EWKB GEOMETRYCOLLECTION w/ Z, M tests

92759f2

petern48 marked this pull request as ready for review October 28, 2025 07:36

petern48 requested a review from paleolimbot October 28, 2025 07:37

paleolimbot reviewed Oct 28, 2025

petern48 and others added 7 commits October 28, 2025 08:01

Delete unnecessary commented test + small match refactor

389c34f

Apply suggestion from @paleolimbot

426a17a

Co-authored-by: Dewey Dunnington <[email protected]>

Apply suggestion from @paleolimbot

a6902f8

Co-authored-by: Dewey Dunnington <[email protected]>

Make incomplete_buffer tests more concise and comprehensive

4a8c070

Merge branch 'st_haszm_wkb_bytes' of github.com:petern48/sedona-db into st_haszm_wkb_bytes

99aeb4b

Avoid final nested WkbBuffer::try_new() call to improve performance

5d413de

Fix bug with geom collection nested inside geom collection and add rust test

2d2c8e5

petern48 requested a review from paleolimbot October 29, 2025 06:42

paleolimbot approved these changes Oct 29, 2025

rust/sedona-functions/src/st_haszm.rs

cleanup rust/sedona-functions/src/st_haszm.rs

9e10d6b

Co-authored-by: Dewey Dunnington <[email protected]>

petern48 requested a review from paleolimbot October 29, 2025 15:20

paleolimbot merged commit 4523cd4 into apache:main Oct 29, 2025
12 checks passed

petern48 deleted the st_haszm_wkb_bytes branch October 30, 2025 05:06

petern48 mentioned this pull request Oct 30, 2025

feat(rust/sedona-functions): Implement native ST_ZMFlag using WKBHeader #260

Merged


2 participants
