vectorize bounding box query #699

giovp · 2024-09-02T20:39:57Z

While trying to vectorize the tiling in the dataloader PR, I realized that some improvements should be added separately.

This PR vectorizes bounding_box_query for all elements.

This means that it is now possible to pass an array of bounding boxes (and not just one).
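As a rough illustration of what "vectorized" means here, the sketch below tests a set of points against several axis-aligned boxes in one broadcasted NumPy comparison instead of looping over boxes. The function name `points_in_boxes` and the array layout (one box per row of `min_coordinate` / `max_coordinate`) are assumptions made for this example and are not taken from the spatialdata code.

```python
import numpy as np


def points_in_boxes(points, min_coordinate, max_coordinate):
    """Return a boolean mask of shape (n_boxes, n_points).

    Hypothetical helper, for illustration only.
    points: (n_points, n_dims)
    min_coordinate / max_coordinate: (n_boxes, n_dims), or (n_dims,) for one box.
    """
    points = np.asarray(points, dtype=float)
    # Promote a single box of shape (n_dims,) to shape (1, n_dims).
    mins = np.atleast_2d(np.asarray(min_coordinate, dtype=float))
    maxs = np.atleast_2d(np.asarray(max_coordinate, dtype=float))
    if mins.shape != maxs.shape or mins.shape[1] != points.shape[1]:
        raise ValueError("min/max must share a shape that matches the point dimensionality")
    # Broadcast (1, n_points, n_dims) against (n_boxes, 1, n_dims).
    inside = (points[None, :, :] >= mins[:, None, :]) & (points[None, :, :] <= maxs[:, None, :])
    # A point is inside a box only if it is within bounds on every axis.
    return inside.all(axis=-1)


points = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 2.0]])
mask = points_in_boxes(points, [[0, 0], [4, 0]], [[2, 2], [10, 6]])
```

Each row of `mask` corresponds to one box, so `points[mask[i]]` selects the points that fall inside box `i`.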

TODO:

tests for multiple bounding box queries for raster data
tests for multiple bounding box queries for shapes
tests for multiple bounding box queries for points
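One way to approach these tests is to check that a multi-box query gives the same result as running the single-box query once per box. The sketch below does that for a toy raster crop; `crop_raster` is a hypothetical stand-in written for this example, not the spatialdata function under test.

```python
import numpy as np


def crop_raster(image, min_coordinate, max_coordinate):
    """Hypothetical stand-in: crop a (y, x) image with one or many boxes.

    Returns a single array for one box, or a list of arrays for many boxes.
    """
    mins = np.asarray(min_coordinate)
    maxs = np.asarray(max_coordinate)
    single = mins.ndim == 1
    mins, maxs = np.atleast_2d(mins), np.atleast_2d(maxs)
    # Round outward to whole pixels and clip to the image extent.
    starts = np.clip(np.floor(mins).astype(int), 0, image.shape)
    stops = np.clip(np.ceil(maxs).astype(int), 0, image.shape)
    crops = [image[y0:y1, x0:x1] for (y0, x0), (y1, x1) in zip(starts, stops)]
    return crops[0] if single else crops


def test_multiple_boxes_match_single_box_queries():
    image = np.arange(100).reshape(10, 10)
    mins = np.array([[0, 0], [2.5, 3], [8, 8]])
    maxs = np.array([[4, 4], [6, 7.2], [20, 20]])
    batched = crop_raster(image, mins, maxs)
    assert len(batched) == len(mins)
    # Every batched crop must equal the crop from the equivalent single-box call.
    for crop, lo, hi in zip(batched, mins, maxs):
        np.testing.assert_array_equal(crop, crop_raster(image, lo, hi))


test_multiple_boxes_match_single_box_queries()
```

The same pattern (batched result compared element by element with a loop of single queries) should carry over to shapes and points.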

@LucaMarconato @melonora do you have any suggestions on where this could replace current implementations across the projects? A clear use case (and the motivation for this contribution) is the dataloader, see #687 for the speedup, but I wonder if there are other places where this is useful.

This also lays the groundwork for an eventual update to the dataloader, so that it can return batches of all elements.

Considering that this PR is already getting too large, I would postpone the vectorization of polygon query.

codecov · 2024-09-02T21:03:50Z

Codecov Report

Attention: Patch coverage is 91.75258% with 16 lines in your changes missing coverage. Please review.

Project coverage is 91.76%. Comparing base (774b492) to head (b8b6331).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/spatialdata/_core/query/_utils.py	82.05%	14 Missing ⚠️
src/spatialdata/_core/query/spatial_query.py	98.14%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
- Coverage   91.83%   91.76%   -0.07%     
==========================================
  Files          44       45       +1     
  Lines        6781     6887     +106     
==========================================
+ Hits         6227     6320      +93     
- Misses        554      567      +13

Files with missing lines	Coverage Δ
src/spatialdata/_docs.py	`100.00% <100.00%> (ø)`
src/spatialdata/_core/query/spatial_query.py	`95.21% <98.14%> (+0.47%)`	⬆️
src/spatialdata/_core/query/_utils.py	`84.37% <82.05%> (-11.28%)`	⬇️

... and 1 file with indirect coverage changes

giovp · 2024-09-03T00:25:26Z

ready for a first pass of review

giovp · 2024-09-04T19:54:48Z

ready for review

LucaMarconato · 2024-09-15T15:23:25Z

Thank you @giovp for the PR, I will review now.

> A clear use case (and the motivation for this contribution) is the dataloader, see #687 for the speedup, but I wonder if there are other places where this is useful.

I can't think of other places right now, so I think we are good with the current improvement.

> Considering that this PR is already getting too large, I would postpone the vectorization of polygon query.

I agree.

LucaMarconato · 2024-09-15T17:10:32Z

src/spatialdata/_core/query/_utils.py

+
+@nb.njit(parallel=False, nopython=True)
+def _create_slices_and_translation(
+    min_values: nb.types.Array[nb.float64, nb.float64],


LucaMarconato · 2024-09-15

I think nb.types.Array[nb.float64, nb.float64] may be incorrect and that the correct version is nb.types.Array(nb.float64, 2, 'C'). But I am not sure, because pre-commit doesn't complain, so maybe both syntaxes are allowed.

LucaMarconato · 2024-09-15

Mine is wrong. Are you sure about your typing? What looks strange to me is that types like nb.types.Array[nb.float64, nb.int64] would not have a meaning. I tried using just nb.types.Array and pre-commit passes, so maybe this is the way to go.

giovp · 2024-09-16

I will look into it!

LucaMarconato · 2024-09-26

I removed the dtype; I'd merge.

LucaMarconato · 2024-09-26T16:13:42Z

@giovp I finished reviewing; I applied some minor changes, such as simplifying the docs and adding an extra test.

giovp added 3 commits September 2, 2024 11:58

vectorize adjust_bounding_box_to_real_axes

92d578f

update

2bb5c35

replace append with insert

c89dcdf

giovp added 5 commits September 2, 2024 14:10

add comment

5bf0b43

vectorize

a60bf6f

update to handle multiple boxes

017967b

vectorize with numba

ab774b7

fix corner len

38dba25

giovp mentioned this pull request Sep 3, 2024

improves dataloader performance #687

Open

fix validation

a934e21

giovp marked this pull request as ready for review September 3, 2024 00:25

giovp added 7 commits September 3, 2024 14:26

refactor

77f73f4

refactor

3adfea8

add test for query with multiple bounding boxes

dfdfdbf

fix typing

5c5560d

vectorize bounding box query on polygons

dd2c573

add test to cover no polygon overlap (None)

be95358

vectorize bounding box query on points and tests

fad9b1a

giovp changed the title ~~few improvements to transformations~~ vectorize bounding box query Sep 4, 2024

fix type

9b977d6

giovp requested review from LucaMarconato and melonora September 4, 2024 19:54

LucaMarconato reviewed Sep 15, 2024

LucaMarconato added 2 commits September 24, 2024 19:35

wip fixes code review

20fe261

added extra test; finished applying code review changes

b8b6331

LucaMarconato enabled auto-merge (squash) September 26, 2024 16:13

LucaMarconato merged commit 8239455 into main Sep 26, 2024
8 checks passed

LucaMarconato deleted the giovp/parallel-transform branch September 26, 2024 16:30

LucaMarconato mentioned this pull request Oct 3, 2024

Export the real global coordinates of cells (Xenium) scverse/spatialdata-io#205

Open