fix: add tokenizer for the query string #3

BlessSan · 2024-11-16T18:53:19Z

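Problem: lunr splits the search query into tokens on spaces only, whereas the lunr.ja plugin tokenizes the fields (documents) using a Japanese segmenter. As a result, a Japanese query typed without spaces is treated as a single term and may not match the segmented tokens in the index.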

For example, the field 標準製品仕様は is tokenized into 標準, 製品, 仕様, and は (as demonstrated at http://chasen.org/~taku/software/TinySegmenter/).

Solution: to search for 標準製品 (which is two words), tokenize the input query with the same tokenizer, then join the resulting tokens back into a space-separated lunr query string.
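A minimal sketch of the idea, not the exact code in this PR. The helper name `toLunrQuery` and the `segment` parameter are hypothetical; `segment` stands for whatever function produces the same tokens as the index (for instance the tokenizer used by lunr.ja, assuming it is reachable from the calling code).

```javascript
// Hypothetical helper: re-tokenize a raw query with the same segmenter
// used at index time, then join the tokens with spaces so that lunr
// sees each segment as a separate search term.
function toLunrQuery(input, segment) {
  return input
    .split(/\s+/)                      // respect spaces the user already typed
    .filter(Boolean)
    .flatMap((term) => segment(term))  // segment each chunk into tokens
    .map((token) => String(token).trim()) // tokens may be objects with toString()
    .filter(Boolean)
    .join(" ");
}

// Stand-in segmenter for illustration only: it splits every two characters.
// A real setup would pass the Japanese segmenter instead.
const fakeSegment = (text) => text.match(/.{1,2}/gu) || [];

const query = toLunrQuery("標準製品", fakeSegment);
// `query` can then be passed to the index's search method.
```

Passing the segmenter in as an argument keeps the helper independent of how the plugin is loaded and makes it easy to test with a stub.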

Limitation: this method essentially adds spaces to Japanese words (at the boundaries dictated by the segmenter used by lunr.ja), so using wildcards, fuzzy matching, or boosts (https://lunrjs.com/guides/searching.html) in the search query would produce incorrect results.
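One possible way to work around this limitation, which is not part of this PR, is to skip re-tokenization whenever the query appears to use lunr's query syntax. The sketch below is an assumption-laden illustration: `hasLunrOperators` is a hypothetical helper, and its pattern only approximates the operators described in the lunr searching guide (wildcard `*`, fuzzy `~`, boost `^`, field `:`, and leading `+`/`-`).

```javascript
// Hypothetical guard: returns true when the query seems to contain lunr
// query operators, in which case the caller could pass it through unchanged
// instead of inserting spaces with the segmenter.
function hasLunrOperators(query) {
  // `*`, `~`, `^`, `:` anywhere, or `+` / `-` at the start of a term.
  return /[*~^:]/.test(query) || /(^|\s)[+-]/.test(query);
}

const plain = hasLunrOperators("標準製品");   // no operators
const wildcard = hasLunrOperators("標準*");   // trailing wildcard
```

With such a guard, operator queries keep lunr's normal behavior, at the cost of not being segmented.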

blesssan added 2 commits November 17, 2024 02:38

fix: add tokenizer for the query string

41cda8b

refactor: moved logic inside existing memo

83c7253
