Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measure and improve the constant numbers used when building the tree #35

Open
Kerollmops opened this issue Dec 4, 2023 · 0 comments
Open
Labels
enhancement New feature or request

Comments

@Kerollmops
Copy link
Member

Kerollmops commented Dec 4, 2023

We must take three parameters into account:

  1. Time to build the tree
  2. Relevancy of the searches
  3. Time to search in the tree

Fun fact: the lowest in the tree you are, the less impact a dummy plane has on the search cost.


arroy/src/writer.rs

Lines 248 to 259 in 7fc6031

if split_imbalance(children_left.len(), children_right.len()) < 0.95
|| remaining_attempts == 0
{
break normal;
}
remaining_attempts -= 1;
};
// If we didn't find a hyperplane, just randomize sides as a last option
// and set the split plane to zero as a dummy plane.
while split_imbalance(children_left.len(), children_right.len()) > 0.99 {

fn split_imbalance(left_indices_len: usize, right_indices_len: usize) -> f64 {
    let ls = left_indices_len as f64;
    let rs = right_indices_len as f64;
    let f = ls / (ls + rs + f64::EPSILON); // Avoid 0/0
    f.max(1.0 - f)
}

fn main() {
    dbg!(split_imbalance(29464, 18394));
    dbg!(split_imbalance(30000, 30000));
    dbg!(split_imbalance(30000, 1580));
}
@Kerollmops Kerollmops added the enhancement New feature or request label Dec 4, 2023
@Kerollmops Kerollmops changed the title Measure and improve the constant numbers using when building the tree Measure and improve the constant numbers used when building the tree Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant