Full recrawl and reindex in Solr #18
Comments
To perform a "clean" crawl (one that resends everything rather than only modifications, deletions, etc.), simply delete your "workdir", or more precisely, the crawlstore. To hold off committing until the crawl is done, set the "solrCommitDisabled" option to "true" in your Solr committer section. The committer will then never send a Solr "commit" request, leaving it to your Solr configuration, or a manual commit from you, to decide when to commit.
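For the manual commit, here is a minimal Python sketch that builds the request for Solr's JSON update handler. The base URL and collection name are placeholders I made up, not something from this project, and the function only builds the request so you can decide when to send it.

```python
import json
import urllib.request


def build_commit_request(solr_base_url: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Solr for a hard commit.

    solr_base_url and collection are placeholders for your own setup.
    """
    url = f"{solr_base_url.rstrip('/')}/{collection}/update"
    # The JSON update handler accepts a bare "commit" command in the body.
    body = json.dumps({"commit": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the crawl has finished, something like `urllib.request.urlopen(build_commit_request("http://localhost:8983/solr", "mycollection"))` would send it.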
Thank you!
How should deleted docs be dealt with? A manual delete-all? Or maybe a parameter sent with the Solr committer?
A simple approach would be to add two fields to your collection: one that identifies the source crawler, and a second that identifies the crawl date. In your config, you can use the With this, you can use the "delete by query" approach on Solr. You would issue a query that deletes anything older than the date of your full recrawl, for the given crawler.
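A rough Python sketch of what that delete-by-query could look like. The field names "crawler" and "crawl_date" are hypothetical, so substitute whatever you actually add to your collection. It assumes the date field is a Solr date field and that you pass a timezone-aware datetime.

```python
import json
from datetime import datetime, timezone


def build_delete_query(crawler_id: str, recrawl_start: datetime,
                       crawler_field: str = "crawler",
                       date_field: str = "crawl_date") -> str:
    """Build a Solr query matching docs from one crawler older than the recrawl.

    Field names are hypothetical defaults; recrawl_start must be timezone-aware.
    """
    # Solr date fields expect UTC timestamps in this ISO-8601 form.
    stamp = recrawl_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Escape backslashes and quotes so the crawler id is a safe quoted phrase.
    safe_id = crawler_id.replace("\\", "\\\\").replace('"', '\\"')
    # '[* TO x}' has an exclusive upper bound: docs stamped exactly at x are kept.
    return f'{crawler_field}:"{safe_id}" AND {date_field}:[* TO {stamp}}}'


def build_delete_payload(crawler_id: str, recrawl_start: datetime) -> str:
    """Wrap the query in the JSON body accepted by Solr's update handler."""
    return json.dumps({"delete": {"query": build_delete_query(crawler_id, recrawl_start)}})
```

The payload would be POSTed to the collection's update handler with a JSON content type, followed by a commit. Running the query part as a normal search first is a cheap way to check what it would delete.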
How would one go about doing a full recrawl of content (filesystemcrawler) and then only doing a (hard) commit after all content has been indexed?
So basically do a fresh update on a live system.