Question on alternative alphabet searches in Lunr.js #58

geniza · 2022-09-13T19:33:34Z


Dear CollectionBuilder Team Members, I hope you all had a great summer and that the semester is off to a good start! I spent a lot of the summer polishing/expanding the metadata for the site we had corresponded about previously in the Q&A discussion board in June. The late stages of the development process are going well but I recently noticed an issue where there aren't any search results appearing for alternative alphabet queries. I'm finding that the Fuzzy Search via Lunr.js currently only supports Latin alphabet searches, however, my metadata has entries that contain words in Hebrew, Arabic, and Greek. I found the Lunr page on "Language support" mods but of those three, only Arabic script is supported. How would you recommend I go about altering the Lunr code to support all three alternative alphabets? [https://lunr.readthedocs.io/en/v0.5.0/languages/]

Answered by evanwill


evanwill · 2022-09-13T20:34:25Z

Maintainer

Good question, @geniza!
I want to clarify that the link you have above is to Lunr.py, a Python version of Lunr that can do some more complicated things. CollectionBuilder uses Lunr.js, so be sure to look for docs specifically related to it.
There are a couple of things you can do.

First, a quick fix to get it working (based on a solution in a Lunr.js issue):

In your project, look for the file "_includes/js/lunr-js.html".
At line 14 you will see a commented-out option that looks like //this.pipeline.remove(lunr.trimmer). On a new line below it, add the code this.pipeline.reset() and save the file. This removes the English-specific processing so that all the characters are indexed correctly. It will slightly alter results (basically making them less fuzzy), but it seems to work okay.
Be sure to check your "_data/config-search.csv" to ensure the fields you want are being indexed (for example, in your project you probably want to add the transliteration field).
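Why the reset helps, as a rough sketch: assuming the bundled Lunr.js is a 2.x release, its default indexing pipeline begins with lunr.trimmer, which strips leading and trailing \W characters from each token. In JavaScript, \W matches everything outside A-Z, a-z, 0-9 and the underscore, so an entire Hebrew, Greek, or Arabic word is trimmed down to an empty string and nothing is left to match. The standalone snippet below re-creates that regex (it does not load Lunr.js; the sample tokens and variable names are illustrative) and compares it with an empty pipeline, which is what this.pipeline.reset() leaves behind.

```javascript
// Mirrors the regexes lunr.trimmer uses in Lunr.js 2.x (assumption: the
// bundled Lunr.js is a 2.x release). In JavaScript, \W matches anything
// outside A-Z, a-z, 0-9 and _, so Hebrew, Greek and Arabic letters all
// count as "non-word" characters and get stripped.
function lunrStyleTrim(s) {
  return s.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Illustrative tokens, including one English word with punctuation attached.
const samples = ['"manuscript",', 'שלום', 'λόγος', 'كتاب'];

// Default indexing pipeline, reduced here to its first step: the trimmer.
const withTrimmer = samples.map(lunrStyleTrim);

// An emptied pipeline (what this.pipeline.reset() leaves): each token is
// passed through unchanged.
const emptyPipeline = [];
const afterReset = samples.map(s => emptyPipeline.reduce((tok, fn) => fn(tok), s));

console.log(withTrimmer); // [ 'manuscript', '', '', '' ]
console.log(afterReset);  // the same four tokens as samples, punctuation included
```

The second line of output also shows the trade-off mentioned above: with the pipeline emptied, punctuation stays attached to the first sample token, and English words are no longer stemmed at indexing time, which is presumably part of why results become less fuzzy.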

Second, if you want to go further and add real language support to the processing pipeline, you can use the "lunr-languages" plugin. The Lunr docs only cover doing it if you are using npm to set up your JavaScript, and they use a different style of writing JS functions, so there are some additional steps to adapt it to CollectionBuilder's setup. I will give it a try myself and post an example soon!

2 replies

evanwill Sep 13, 2022
Maintainer

Forgot to say: as you mentioned, the disadvantage of the lunr-languages plugin approach is that pipelines exist for only about 19 additional languages so far; see https://github.com/MihaiValentin/lunr-languages
So for your case, unless you can write your own stemmer plugin following their example, I guess just turning off the pipelines seems best?
It might be possible to use the multiLanguage support to set up a pipeline that uses the English stemmer but turns it off for non-English text. I'm not sure...
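The multiLanguage idea can perhaps be approximated more simply: keep the English-specific steps but skip them for tokens that contain no Latin letters. The sketch below is a standalone illustration, not the lunr-languages multiLanguage API. MockToken, latinOnly, asciiTrimmer, and runPipeline are hypothetical names; the mock only imitates the toString() and update() methods that Lunr.js 2.x tokens provide; asciiTrimmer copies the lunr.trimmer regexes; and the script check relies on the \p{Script=Latin} Unicode property escape (ES2018, Node 10+).

```javascript
// Hypothetical stand-in for lunr.Token: Lunr.js 2.x pipeline functions use
// a token's toString() and update(fn) methods, so the mock provides both.
class MockToken {
  constructor(str) {
    this.str = str;
  }
  toString() {
    return this.str;
  }
  update(fn) {
    this.str = fn(this.str);
    return this;
  }
}

// Hypothetical helper (not part of Lunr.js or lunr-languages): wraps an
// English-oriented pipeline function so it only runs on tokens containing
// Latin letters; Hebrew, Greek, Arabic, etc. tokens pass through unchanged.
function latinOnly(fn) {
  return function (token, ...rest) {
    return /\p{Script=Latin}/u.test(token.toString()) ? fn(token, ...rest) : token;
  };
}

// Same ASCII-only regexes as lunr.trimmer in Lunr.js 2.x.
function asciiTrimmer(token) {
  return token.update(s => s.replace(/^\W+/, '').replace(/\W+$/, ''));
}

// Simplified pipeline runner: applies each function to each token in order.
function runPipeline(fns, strings) {
  return strings.map(s => {
    let token = new MockToken(s);
    for (const fn of fns) token = fn(token);
    return token.toString();
  });
}

const raw = ['"manuscript",', 'שלום', 'λόγος', 'كتاب'];
const plain = runPipeline([asciiTrimmer], raw);
const guarded = runPipeline([latinOnly(asciiTrimmer)], raw);

console.log(plain);   // [ 'manuscript', '', '', '' ]
console.log(guarded); // [ 'manuscript', 'שלום', 'λόγος', 'كتاب' ]
```

Wiring this into "_includes/js/lunr-js.html" would presumably mean swapping lunr.trimmer and lunr.stemmer in this.pipeline (and lunr.stemmer in this.searchPipeline) for wrapped versions; that step is an untested assumption. One caveat for a collection with transliterations: a word such as ḥesed contains Latin letters, so it still goes through the ASCII trimmer, which strips a precomposed leading ḥ and leaves esed.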

geniza Sep 15, 2022
Author

Thanks so much! That resolved the issue and now I can trace everything regardless of script or special characters :D

