This repository has been archived by the owner on Jun 22, 2020. It is now read-only.
In some cases it is unclear what to treat as a "document", i.e. the fundamental unit of search indexing in ElasticSearch.
Two prototypical cases:
Really long documents, like reports of a hundred-some pages. These are hard because they often cover multiple topics, and it's hard to get ElasticSearch to tell us where in that sort of document a hit occurs. The temptation is to split them into chapters or individual pages for indexing. But then you may want to continue reading the whole document.
Smushed-together documents. Sometimes FOIAs show up as one (or a few) PDFs with multiple responsive documents all squished together in one PDF. These responsive documents are sometimes multiple pages long. Indexing the result as, say, 5 very long documents is not a good idea, since the documents inside each PDF don't have anything in common. But splitting on pages, again, separates the pages of multi-page documents.
Possible solutions:
add an additional field in ElasticSearch and a button in the interface to go to the next/prev page in a multi-page document (regardless of type).
continue to tweak ElasticSearch to store locations so we can scroll you to the location of your hits in the detail view.
other ideas???
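A rough sketch of how the first two ideas might look on the indexing side, in plain Python with no ElasticSearch client involved. All field names (`parent_id`, `page`, `prev_id`, `next_id`) and both helper functions are hypothetical, made up for illustration. The hit-location helper is a naive substring search standing in for whatever ElasticSearch would actually report.

```python
from typing import Dict, List


def build_page_records(doc_id: str, pages: List[str]) -> List[Dict[str, object]]:
    """Turn one multi-page document into one indexable record per page.

    Each record keeps a pointer to its parent document and to its
    neighbouring pages, so the interface can offer next/prev buttons.
    Field names are assumptions, not an existing mapping.
    """
    records = []
    last = len(pages)
    for number, text in enumerate(pages, start=1):
        records.append({
            "id": f"{doc_id}-p{number}",
            "parent_id": doc_id,
            "page": number,
            "page_count": last,
            "prev_id": f"{doc_id}-p{number - 1}" if number > 1 else None,
            "next_id": f"{doc_id}-p{number + 1}" if number < last else None,
            "text": text,
        })
    return records


def hit_offsets(text: str, term: str) -> List[int]:
    """Return the character offset of each case-insensitive match of term.

    A stand-in for stored hit locations: the detail view could scroll
    to the first offset.
    """
    offsets: List[int] = []
    haystack, needle = text.lower(), term.lower()
    if not needle:
        return offsets
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


records = build_page_records("foia-1", ["alpha", "beta budget", "gamma"])
middle = records[1]
print(middle["prev_id"], middle["next_id"], hit_offsets(middle["text"], "budget"))
```

With records like these, a hit on one page can still lead back to the whole document through `parent_id`, which addresses the "continue reading" concern for both prototypical cases.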