Skip to content

Add method to take clauses out of boolean query #2601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

umweltinfo
Copy link

This helps with rewriting, for example to replace the default score combiner used by boolean queries, insofer it avoids the need to box_clone the sub queries.

For us, this replaces #2600 insofar we will keep carrying a CustomBooleanQuery to overwrite the default score combiner.

@@ -257,6 +259,11 @@ impl BooleanQuery {
pub fn clauses(&self) -> &[(Occur, Box<dyn Query>)] {
&self.subqueries[..]
}

/// Take the clauses making up this query for reuse.
pub fn take_clauses(&mut self) -> Vec<(Occur, Box<dyn Query>)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make that an Into<Vec<(Occur, Box<dyn Query>)>> implementation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. impl Iterator<Item=(Occur, Box<dyn Query>) might be even more future-proof though?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add such a method and also generalized the return type of the existing method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the formatting SNAFU. It said something about nightly and then reformatted everything when I tried cargo fmt, so I tried to adjust things manually to match the CI job.

Copy link
Collaborator

@fulmicoton fulmicoton Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add such a method and also generalized the return type of the existing method.

This is not doing what I asked for though.
Why

pub fn take_clauses(&mut self) -> impl Iterator<Item = (Occur, Box<dyn Query>)> {

And not

impl Into<Vec<..>> 

if the argument is about not making it Vec<> why fn take_clauses(&mut self)?
Why not fn into_clauses(self)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not doing what I asked for though.

So should I revert this part then?

And not impl Into<Vec<..>>

I understood that you wanted to isolate the signature from the internal representation as a Vec. Into<Vec> is a limited form of isolated as it requires a materialized data structure to exist, hence the choice of an iterator which could be built on-the-fly if it ever becomes necessary.

For me personally, -> impl Into<Vec<..>> does certainly suffice and I can change if you prefer that?

Why not fn into_clauses(self)?

We use to fix up queries after the query parser has build them up, i.e. we have boxed trait objects and need to downcast to access the clauses. Hence, I don't think I can move out of this trait objects. Even if I can, I would have allocate a placeholder query object to take its place while I create a new query. With the take(&mut self) approach, the emptied query basically is that placeholder and is already in place.

For completeness, the code I would like to replace using this is:

if let Some(boolean_query) = query.downcast_ref::<BooleanQuery>() {
    let clauses = boolean_query
        .clauses()
        .iter()
        .map(|(occur, clause)| (*occur, clause.box_clone()))
        .collect::<Vec<_>>();

    query = Box::new(CustomBooleanQuery::new(clauses));
}

which should become something like

if let Some(boolean_query) = query.downcast_mut::<BooleanQuery>() {
    let clauses = boolean_query
        .take_clauses()
        .collect::<Vec<_>>();

    query = Box::new(CustomBooleanQuery::new(clauses));
}

(AFAIK, some_vec.into_iter().collect::<Vec<_>>() will not actually allocate as the Vec is reused, so impl Iterator does cost more than impl Into.)

This helps with rewriting, for example to replace the default
score combiner used by boolean queries, insofer it avoids
the need to `box_clone` the sub queries.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants