Do not compute global text&tags twice on reindex #2761

Open
wants to merge 2 commits into
base: main
13 changes: 9 additions & 4 deletions nucliadb/src/nucliadb/ingest/orm/processor/processor.py
@@ -313,14 +313,19 @@ async def txn(
 await self.apply_resource(message, resource, update=(not created))

 # index message
-
 if resource:
-    await resource.compute_global_text()
-    await resource.compute_global_tags(resource.indexer)
-    await resource.compute_security(resource.indexer)
     if message.reindex:
         # when reindexing, let's just generate full new index message
+        # TODO - This should be improved in the future as it's not optimal for very large resources:
+        # As of now, there are some API operations that require fully reindexing all the fields of a resource.
+        # An example of this is classification label changes - we need to reindex all the fields of a resource to
+        # propagate the label changes to the index.
         resource.replace_indexer(await resource.generate_index_message(reindex=True))
+    else:
+        # TODO - Ideally we should only update the fields that have been changed in the current transaction.
+        await resource.compute_global_text()
+        await resource.compute_global_tags(resource.indexer)
+        await resource.compute_security(resource.indexer)

 if resource and resource.modified:
     await pgcatalog_update(txn, kbid, resource)
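
The effect of the change above can be illustrated with a small, self-contained sketch. It is not the nucliadb implementation: `FakeResource`, `index_before`, and `index_after` are hypothetical names, and it assumes (as the PR title suggests) that generating a full index message already recomputes the global text and tags. `compute_security` is left out for brevity.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResource:
    """Hypothetical stand-in for the ORM resource; it only counts calls."""

    calls: dict = field(default_factory=lambda: {"text": 0, "tags": 0})

    async def compute_global_text(self) -> None:
        self.calls["text"] += 1

    async def compute_global_tags(self) -> None:
        self.calls["tags"] += 1

    async def generate_index_message(self, reindex: bool = False) -> str:
        # Assumption: building a full index message recomputes text and tags.
        await self.compute_global_text()
        await self.compute_global_tags()
        return "full-index-message"


async def index_before(resource: FakeResource, reindex: bool) -> None:
    # Old flow: always compute, then regenerate everything on reindex.
    await resource.compute_global_text()
    await resource.compute_global_tags()
    if reindex:
        await resource.generate_index_message(reindex=True)


async def index_after(resource: FakeResource, reindex: bool) -> None:
    # New flow: compute only when no full index message is generated.
    if reindex:
        await resource.generate_index_message(reindex=True)
    else:
        await resource.compute_global_text()
        await resource.compute_global_tags()


def count_calls(flow, reindex: bool) -> dict:
    """Run one flow on a fresh fake resource and return its call counters."""
    resource = FakeResource()
    asyncio.run(flow(resource, reindex))
    return resource.calls


if __name__ == "__main__":
    print("before, reindex:", count_calls(index_before, True))
    print("after, reindex: ", count_calls(index_after, True))
```

Under these assumptions, the old flow computes text and tags twice on reindex while the new flow computes them once; the non-reindex path is unchanged.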