
mattjala
Contributor

@mattjala mattjala commented Sep 10, 2025

Implement an r-tree data structure in a new module. It has three exposed methods: creation, destruction, and search.
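For context, an R-tree search generally comes down to testing whether a query hyper-rectangle overlaps a node's bounding box in every dimension. The sketch below illustrates that test only. `rects_overlap` and its `double` coordinates are hypothetical and are not taken from `H5RT.c`; closed (inclusive) bounds are an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of H5RT: return true if two axis-aligned
 * hyper-rectangles of the given rank overlap. Bounds are treated as
 * inclusive, so rectangles that merely touch count as overlapping. */
bool
rects_overlap(const double *a_min, const double *a_max, const double *b_min, const double *b_max,
              int rank)
{
    assert(rank > 0);

    for (int d = 0; d < rank; d++) {
        /* A gap along any single dimension means there is no overlap */
        if (a_max[d] < b_min[d] || b_max[d] < a_min[d])
            return false;
    }

    return true;
}
```

A search would apply this test at each node and descend only into children whose bounding boxes pass it.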

The STR algorithm used during creation is based on the one described here.
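For readers unfamiliar with STR (Sort-Tile-Recursive) packing, here is a minimal 2-D sketch of the general idea: sort by x midpoint, cut into vertical slices, then sort each slice by y midpoint so that consecutive runs form leaves. This is not the code in this PR. `rect_t`, `str_pack_2d`, and the `double` coordinates are illustrative assumptions, and the real implementation handles arbitrary rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative 2-D rectangle; the PR's actual types are not shown here */
typedef struct {
    double min[2];
    double max[2];
} rect_t;

/* Compare by x midpoint (min + max orders the same as the midpoint) */
static int
cmp_mid_x(const void *a, const void *b)
{
    const rect_t *ra = a, *rb = b;
    double        ma = ra->min[0] + ra->max[0];
    double        mb = rb->min[0] + rb->max[0];
    return (ma > mb) - (ma < mb);
}

/* Compare by y midpoint */
static int
cmp_mid_y(const void *a, const void *b)
{
    const rect_t *ra = a, *rb = b;
    double        ma = ra->min[1] + ra->max[1];
    double        mb = rb->min[1] + rb->max[1];
    return (ma > mb) - (ma < mb);
}

/* Reorder rects in place so that each consecutive run of node_cap entries
 * forms one leaf. Returns the number of leaves. */
size_t
str_pack_2d(rect_t *rects, size_t n, size_t node_cap)
{
    assert(node_cap > 0);
    if (n == 0)
        return 0;

    /* P = ceil(n / node_cap) leaves, S = ceil(sqrt(P)) vertical slices */
    size_t n_leaves = (n + node_cap - 1) / node_cap;
    size_t n_slices = 1;
    while (n_slices * n_slices < n_leaves)
        n_slices++;

    /* Each slice holds up to S * node_cap rectangles */
    size_t slice_len = n_slices * node_cap;

    /* Step 1: sort everything by x midpoint */
    qsort(rects, n, sizeof(rect_t), cmp_mid_x);

    /* Step 2: sort each vertical slice by y midpoint */
    for (size_t start = 0; start < n; start += slice_len) {
        size_t len = (n - start < slice_len) ? n - start : slice_len;
        qsort(rects + start, len, sizeof(rect_t), cmp_mid_y);
    }

    return n_leaves;
}
```

Building the upper levels would repeat the same procedure on the bounding boxes of the leaves produced here.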

This will be used to optimize VDS operations in an upcoming PR.


Important

Introduces an R-tree data structure with creation, search, and destruction methods, integrated into the build system and validated with new tests.

  • R-tree Implementation:
    • Adds new R-tree data structure in H5RT.c with methods H5RT_create(), H5RT_search(), and H5RT_free().
    • Implements STR algorithm for efficient R-tree packing.
  • CMake Integration:
    • Updates CMakeLists.txt to include H5RT.c and related headers.
  • Testing:
    • Adds rtree.c test file to validate R-tree creation and search functionalities.
    • Updates test/CMakeLists.txt to include rtree in test suite.

This description was created by Ellipsis for 1b8ba58.

byrnHDF
byrnHDF previously approved these changes Sep 22, 2025
Contributor

@byrnHDF byrnHDF left a comment

CMake changes look good.

/* Sort hyper-rectangles in this region by the first unsorted coordinate of their midpoints */
if (!this_rank_sorted) {
    assert(prev_sort_dim < rank - 1);
    sort_dim = prev_sort_dim + 1;
Member

@fortnern fortnern Sep 22, 2025

So it just sorts dimensions in the order 0,1,2, 3,3,3,3 (for rank 4)?

Member

Would it be hard to use something simple like picking the largest dimension? I'm wondering how much this would help performance.

Contributor Author

Your understanding is correct - it just iterates through the dimensions in order. We could try picking the largest dimension, though this would require manually keeping track of which dimensions have and haven't been sorted. I'll try it and see if it has an impact.

Contributor Author

What exactly do you mean by largest dimension? I tried an implementation that prioritized the bounding-box dimension with the largest span (max - min), but it did not improve performance.
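To make the span-based interpretation concrete, the selection step could look roughly like the sketch below. `widest_dim` is a hypothetical helper written for this discussion, not code from the PR, and it uses `double` coordinates for simplicity.

```c
#include <assert.h>

/* Hypothetical helper, not part of H5RT: return the index of the dimension
 * with the largest span (max - min) in a bounding box of the given rank.
 * Ties go to the lowest dimension index. */
int
widest_dim(const double *min, const double *max, int rank)
{
    int best = 0;

    assert(rank > 0);

    for (int d = 1; d < rank; d++) {
        if (max[d] - min[d] > max[best] - min[best])
            best = d;
    }

    return best;
}
```

Since each region can pick a different dimension, a caller would also need to record which dimensions it has already sorted, for example with a per-region array of flags.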

Labels
Component - C Library Core C library issues (usually in the src directory)
Projects
Status: Scheduled/On-Deck
4 participants