No "Zap" action available, neither a "iterateDocumentIds()" - how to identify dead documents for removal? #87

discordier · 2024-09-25T10:46:18Z


I wanted to implement a "reindex" operation but am unsure how to do this (aside from deleting the data directory and re-instantiating Loupe).

IMO we could have two useful methods to implement this:

a new Loupe::zap(): void which removes all documents from the Loupe index and therefore makes it possible to "start over" (essentially the same as removing the data directory, but without having to re-instantiate Loupe).
a new Loupe::iterateDocumentIds(): iterable<string> - this could be a generator iterating over all document ids, which could then be used to check whether each id is a "dead" one.
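
To make the two proposed operations concrete, here is a minimal sketch using a toy in-memory index. It is written in Python purely for illustration. `ToyIndex` and all of its method names are hypothetical and are not Loupe's API; the sketch only shows the intended semantics of "zap" (empty the index but keep it usable) and of iterating over ids.

```python
from typing import Dict, Iterator


class ToyIndex:
    """Hypothetical in-memory stand-in for a search index; not Loupe's API."""

    def __init__(self) -> None:
        self._documents: Dict[str, dict] = {}

    def add_document(self, document: dict) -> None:
        # Adding a document with an existing id replaces the stored one.
        self._documents[str(document["id"])] = document

    def delete_document(self, document_id: str) -> None:
        # Deleting an unknown id is a no-op.
        self._documents.pop(document_id, None)

    def zap(self) -> None:
        # Drop every document but keep the index object itself usable.
        self._documents.clear()

    def iterate_document_ids(self) -> Iterator[str]:
        # Iterate over a snapshot of the ids so callers may delete while iterating.
        yield from list(self._documents)


index = ToyIndex()
index.add_document({"id": "a.txt", "content": "first"})
index.add_document({"id": "b.txt", "content": "second"})
ids_before = list(index.iterate_document_ids())
index.zap()
ids_after = list(index.iterate_document_ids())
```

With a generator like this, a caller never has to hold all documents in memory, only one id at a time.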

Background is: we have external changes to the contents of the document directory and files will "disappear" - we now have no way to identify the differences.
Iterating all files will allow to add/update but we can never remove a document from the index at all unless we know that it existed.

Ugly workaround would be to keep track in a separate file but that feels like duplicating the storage.
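
For reference, the separate-file workaround could look roughly like the sketch below. It is Python for illustration only; the manifest file name, its JSON format, and the function names are all assumptions made for this example. It stores the ids seen in the previous run and reports which ones have disappeared since.

```python
import json
from pathlib import Path
from typing import Set


def load_manifest(manifest_path: Path) -> Set[str]:
    # A missing manifest means nothing has been indexed yet.
    if not manifest_path.exists():
        return set()
    return set(json.loads(manifest_path.read_text(encoding="utf-8")))


def save_manifest(manifest_path: Path, ids: Set[str]) -> None:
    # Sort the ids so the file content is stable between runs.
    manifest_path.write_text(json.dumps(sorted(ids)), encoding="utf-8")


def removed_since_last_run(manifest_path: Path, current_ids: Set[str]) -> Set[str]:
    # Ids recorded last time but absent now are the "dead" documents.
    previous = load_manifest(manifest_path)
    save_manifest(manifest_path, current_ids)
    return previous - current_ids
```

The drawback mentioned above is visible here: the manifest duplicates information the index already holds, and the two can drift apart if a run fails between indexing and saving.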

What do you think, would something like this be useful in Loupe?

Toflar · 2024-09-25T11:18:05Z

Maintainer

Let me try to understand: You have IDs that are removed from an external service and those IDs/documents need to be removed from Loupe now?

As Loupe is just a search engine, maybe you could illustrate what feature you're missing by referencing how other search engines do it? I'd be especially interested in Meilisearch because Loupe is working similarly from an API standpoint :)


discordier · 2024-09-25T13:54:08Z

Author

For the sake of simplicity, let's assume we have a directory containing text files:

a.txt
b.txt
delete_me.txt

We iterate over all of them and index them (adding them as documents).

Then, some person updates the documents and adds a file add.txt but deletes delete_me.txt via file manager in that folder.

A cronjob now iterates over the directory again and adds/updates all documents (we can probably fine-tune this by checking that Loupe::getDocument() returns non-null and that the timestamp matches, or the like).

The problem is that we cannot detect that delete_me.txt has in fact been deleted, as we can't retrieve a list of the ids or iterate over all documents in the index (or am I mistaken on that one?).
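
If the index could list its ids, the cronjob's decision would reduce to a set difference. A minimal sketch in Python (for illustration only), assuming the ids are simply the file names; `plan_sync` is a hypothetical helper and the list of indexed ids is hard-coded here because that is exactly the piece that is currently missing:

```python
from typing import Iterable, List, Set, Tuple


def plan_sync(
    indexed_ids: Iterable[str], files_on_disk: Iterable[str]
) -> Tuple[List[str], List[str]]:
    indexed: Set[str] = set(indexed_ids)
    present: Set[str] = set(files_on_disk)
    # Everything still on disk gets added or updated.
    to_upsert = sorted(present)
    # Everything in the index that is no longer on disk is a dead document.
    to_delete = sorted(indexed - present)
    return to_upsert, to_delete


to_upsert, to_delete = plan_sync(
    indexed_ids=["a.txt", "b.txt", "delete_me.txt"],
    files_on_disk=["a.txt", "b.txt", "add.txt"],
)
```

For the example directory, `to_delete` contains only `delete_me.txt`, while `add.txt` shows up in `to_upsert` alongside the existing files.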

I'm searching for something like this: https://www.meilisearch.com/docs/reference/api/documents#delete-all-documents (called zap in DBase, truncate in MySQL) to be able to start clean.

For iterating the documents (or at least their ids), I only found https://www.meilisearch.com/docs/reference/api/documents#get-documents-with-post which might provide the desired result when filter is omitted and fields contains only the id - I lack a Meilisearch instance to validate this assumption though. Maybe this also works in Loupe with an empty search term - yet this feels like an ugly hack.
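
A paginated id listing along those lines could be consumed as sketched below (Python, for illustration only). The sketch assumes, without having verified it against a running instance, that each page is a mapping with a `results` list and a `total` count, and that only the `id` field was requested. `fetch_page` is a hypothetical callable standing in for the actual HTTP request, and `fake_fetch` replaces the server so that the example runs on its own.

```python
from typing import Callable, Iterator


def iterate_ids(
    fetch_page: Callable[[int, int], dict], page_size: int = 2
) -> Iterator[str]:
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results = page["results"]
        for document in results:
            yield str(document["id"])
        offset += len(results)
        # Stop on an empty page or once every document has been seen.
        if not results or offset >= page["total"]:
            break


def fake_fetch(offset: int, limit: int) -> dict:
    # Stand-in for the real request; returns one slice of a fixed document list.
    docs = [{"id": "a.txt"}, {"id": "b.txt"}, {"id": "delete_me.txt"}]
    return {
        "results": docs[offset:offset + limit],
        "offset": offset,
        "limit": limit,
        "total": len(docs),
    }


all_ids = list(iterate_ids(fake_fetch))
```

Offset-based paging like this can skip or repeat entries if documents are added or removed while iterating, so it is best run while no indexing is in progress.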

1 reply

Toflar Sep 25, 2024
Maintainer

I see. I think what you're looking for is the ability to delete the entire index: https://www.meilisearch.com/docs/reference/api/indexes#delete-an-index

Which is probably indeed achieved most easily by just wiping the entire directory and re-creating it. So sure, feel free to contribute such a method! Not sure about zap() because that's not very common in the search engine world, but I also have no better idea really :D
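
The wipe-and-recreate approach can be sketched in a few lines (Python, for illustration only). `reset_data_directory` is a hypothetical helper name, and the sketch assumes that the data directory is used by nothing else and that the search engine instance is created anew afterwards.

```python
import shutil
from pathlib import Path


def reset_data_directory(data_dir: Path) -> None:
    # Remove the directory and everything in it; ignore it if it does not exist.
    shutil.rmtree(data_dir, ignore_errors=True)
    # Re-create it empty so a fresh index can be set up in the same place.
    data_dir.mkdir(parents=True, exist_ok=True)
```

Any object that still holds open handles to files in the old directory should be discarded before calling this.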
