
perf: optimize sql for or queries #948

Open
wants to merge 1 commit into
base: main

TheBobBobs

Summary

Currently, each term in an OrList gets its own check. This change extracts the tag terms from the OrList and puts them into a single check, which is then combined with the remaining terms of the OrList.

Parent tags that expand to many child tags also benefit from this optimization.
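
As a rough illustration of the idea in the summary, here is a minimal sketch using Python's built-in `sqlite3` module rather than the project's actual query builder. The schema, the sample data, and the function names `per_term_query` and `combined_query` are hypothetical; only the table name `tag_entries` appears in this conversation.

```python
import sqlite3

# Hypothetical minimal schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE tag_entries (entry_id INTEGER, tag_id INTEGER);
    INSERT INTO entries VALUES (1, 'a.png'), (2, 'b.png'), (3, 'c.txt'), (4, 'd.txt');
    INSERT INTO tag_entries VALUES (1, 10), (2, 20), (3, 30), (4, 40);
""")


def per_term_query(tag_ids):
    """One subquery check per tag, joined with OR (the shape before the change)."""
    checks = " OR ".join(
        "id IN (SELECT entry_id FROM tag_entries WHERE tag_id = ?)" for _ in tag_ids
    )
    sql = f"SELECT id FROM entries WHERE {checks} ORDER BY id"
    return [row[0] for row in conn.execute(sql, tag_ids)]


def combined_query(tag_ids):
    """A single subquery check that covers every tag in the OR list."""
    placeholders = ", ".join("?" for _ in tag_ids)
    sql = (
        "SELECT id FROM entries WHERE id IN "
        f"(SELECT entry_id FROM tag_entries WHERE tag_id IN ({placeholders})) "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, tag_ids)]


result_before = per_term_query([10, 20, 30])
result_after = combined_query([10, 20, 30])
print(result_before, result_after)
```

Both functions select the same entries; the second issues one subquery no matter how many tags the OR list contains. Any non-tag terms left in the OR list would still be OR-ed onto that single check.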

Tasks Completed

  • Platforms Tested:
    • Windows x86
    • Windows ARM
    • macOS x86
    • macOS ARM
    • Linux x86
    • Linux ARM
  • Tested For:
    • Basic functionality
    • PyInstaller executable

CyanVoxel added the "TagStudio: Search" and "Type: Performance" labels on Jun 5, 2025
CyanVoxel added the "Status: Review Needed" label on Jun 5, 2025
CyanVoxel moved this to 🏓 Ready for Review in TagStudio Development on Jun 5, 2025
Collaborator

Computerdores left a comment

Works for me and runs on this library in ~13s instead of ~27s.

Comment on lines -91 to -96
    # If there is just one tag id, check the normal way
    elif len(tag_ids) == 1:
        bool_expressions.append(
            self.__entry_satisfies_expression(TagEntry.tag_id == tag_ids[0])
        )

Collaborator

Why did you remove this?

Author

In my testing, using __entry_has_all_tags was slightly faster with a single tag id. SQLite optimizes out the GROUP BY when only one tag_id is present. This can be checked with the query plan:

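from sqlalchemy import text

# Assumes `session` is an open SQLAlchemy session and `query_full` is the SQL string of the query.
rows = session.execute(text(f"EXPLAIN QUERY PLAN {query_full}")).fetchall()
for row in rows:
    print(row)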
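
For readers who want to try the same check outside the application, the following is a self-contained sketch using the standard-library `sqlite3` module. It assumes an "entry has all tags" check takes the common `GROUP BY ... HAVING COUNT(DISTINCT ...)` form; the actual SQL generated by `__entry_has_all_tags` may differ, and the schema, data, and helper names are hypothetical.

```python
import sqlite3

# Hypothetical data: entry 1 has tags 10 and 20, entry 2 has 10, entry 3 has 20.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_entries (entry_id INTEGER, tag_id INTEGER);
    INSERT INTO tag_entries VALUES (1, 10), (1, 20), (2, 10), (3, 20);
""")


def has_all_tags_sql(tag_ids):
    """SQL selecting entries that carry every tag in tag_ids (assumed query shape)."""
    placeholders = ", ".join("?" for _ in tag_ids)
    return (
        "SELECT entry_id FROM tag_entries "
        f"WHERE tag_id IN ({placeholders}) "
        "GROUP BY entry_id "
        "HAVING COUNT(DISTINCT tag_id) = ?"
    )


def entries_with_all_tags(tag_ids):
    """Run the query; the last bound parameter is the number of required tags."""
    params = [*tag_ids, len(tag_ids)]
    return sorted(row[0] for row in conn.execute(has_all_tags_sql(tag_ids), params))


def query_plan(tag_ids):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    params = [*tag_ids, len(tag_ids)]
    sql = "EXPLAIN QUERY PLAN " + has_all_tags_sql(tag_ids)
    return [row[-1] for row in conn.execute(sql, params)]


print(entries_with_all_tags([10, 20]))
for step in query_plan([10]):
    print(step)
```

Comparing the output of `query_plan([10])` with `query_plan([10, 20])` on a real library database is one way to see which steps SQLite actually performs in each case; the plan depends on the SQLite version and on the indexes present.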

Computerdores removed the "Status: Review Needed" label on Jun 5, 2025
Computerdores moved this from 🏓 Ready for Review to 🍃 Pending Merge in TagStudio Development on Jun 5, 2025
TheBobBobs
Author

> Works for me and runs on this library in ~13s instead of ~27s.

This query takes 3 minutes for me on main.
Have you by chance created a custom index on tag_entries with columns (entry_id, tag_id)?
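
One way to answer the index question is to list the indexes SQLite knows about for the table. The sketch below uses the `PRAGMA index_list` and `PRAGMA index_info` statements through the standard-library `sqlite3` module; the in-memory schema and the index name `idx_tag_entries_entry_tag` are hypothetical, and a real check would open the library's database file instead.

```python
import sqlite3


def indexes_on(conn, table):
    """Return {index_name: [column, ...]} for every index on `table`."""
    result = {}
    # index_list rows are (seq, name, unique, origin, partial).
    for row in conn.execute(f"PRAGMA index_list({table})").fetchall():
        name = row[1]
        # index_info rows are (seqno, cid, column_name), in index column order.
        info = conn.execute(f"PRAGMA index_info({name})").fetchall()
        result[name] = [col[2] for col in info]
    return result


# Hypothetical stand-in for a library database with a custom composite index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_entries (entry_id INTEGER, tag_id INTEGER);
    CREATE INDEX idx_tag_entries_entry_tag ON tag_entries (entry_id, tag_id);
""")

found = indexes_on(conn, "tag_entries")
print(found)
```

An index covering `(entry_id, tag_id)` showing up in this listing could explain a large timing difference between two machines running the same query.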

Labels
TagStudio: Search, Type: Performance
Projects
Status: 🍃 Pending Merge
3 participants