Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split the bindings for TSQuery and TSQueryCursor into multiple classes #66

Open
Emmeral opened this issue Dec 20, 2024 · 4 comments · May be fixed by #79
Open

Split the bindings for TSQuery and TSQueryCursor into multiple classes #66

Emmeral opened this issue Dec 20, 2024 · 4 comments · May be fixed by #79
Labels
enhancement New feature or request

Comments

@Emmeral
Copy link
Contributor

Emmeral commented Dec 20, 2024

Currently the Query class maintains the state of both a TSQuery and TSQueryCursor (c.f. here). It has methods that work with both of them (i.e. here).

Splitting it into two separate classes has the following advantages:

  1. Split the logic of compiling the query and executing the query. So query compilation must be done less often.
  2. Currently the query class uses Arena.ofShared() but the tree-sitter documentation mentions that a TSQueryCursor shall only be accessed by a single thread. Allocating the cursor by an arena that can be accessed by multiple threads makes it more error prone.
  3. Consumers of the library can have access to the low level TSQueryCursor methods (e.g. ts_query_cursor_next_match). They must not use the Query::findMatches logic if they do not desire it.
  4. The default lifetime of the native memory of the query matches can be bound to the new QueryCursor class instead of the Query class. This way and application can keep the same Query instance for its entire runtime without leaking memory.

However the change would imply breaking API changes:

  • The methods of the Query class that currently set properties of a TSQueryCursor would need to be moved to the new QueryCursor class
  • The Query::findMatches methods would need an QueryCursor as an additional argument.
@ObserverOfTime
Copy link
Member

Merging them is required to support predicates. QueryCursor is not needed on its own in high-level bindings.

@Marcono1234
Copy link
Contributor

I also think that splitting the current Query class could be beneficial (if possible). Not necessary in classes whose API matches the C Query and QueryCursor, but conceptionally in two classes:

  • stateless Query1
  • stateful query result / result iterator

This is similar to how Java's Regex classes in the java.util.regex package work: A stateless Pattern and for each match operation a stateful Matcher.

Though I am not sure if that would be fully compatible with @Emmeral's use case.


So query compilation must be done less often.

This is probably not only a performance advantage, but also a correctness one. Users can create the Query once in advance and afterwards know that it is valid, instead of having to call Language#query(String) multiple times and every time having to account for the possibility that a QueryError might be thrown.


Merging them is required to support predicates.

Does that problem also apply when not strictly splitting into Query and QueryCursor but instead in the stateless Query and stateful query result iterator proposed above?

Footnotes

  1. It could probably still have state for configuration, e.g. disabling patterns, and even methods which in the C API belong to the QueryCursor, e.g. setMatchLimit? For those it could just store their values in fields in Query, and apply them when a new QueryCursor is created.
    But Query itself should not have any state which is related to any query execution / QueryCursor.

@Emmeral
Copy link
Contributor Author

Emmeral commented Jan 2, 2025

Merging them is required to support predicates.

This is not true if you expose the predicates of the query with a getter (either public or package private) and have a Query instance as a field of the QueryCursor class (similar to what is proposed by @Marcono1234 )

@Emmeral
Copy link
Contributor Author

Emmeral commented Jan 2, 2025

I did a quick implementation of the proposed ideas: master...Emmeral:java-tree-sitter:SplitQueryAndQueryCursor

Nothing production ready, but maybe helpful to understand the idea better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants