search stemming is overzealous #1563
Comments
Some initial responses:
For reference in this conversation, Blacklight core ships with a single boosted unstemmed field in the default search. Details below. I think the inclusion of this in Blacklight core makes a strong case that it would not be heavy-handed to also include it in Arclight by default.
In the solrconfig.xml:
<str name="pf">
all_text_timv^10
</str>
In the schema:
<field name="all_text_timv" type="text" stored="false" indexed="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/> <!-- NFKC, case folding, diacritics removed -->
<filter class="solr.TrimFilterFactory"/>
</analyzer>
</fieldType>
<copyField source="*_tsim" dest="all_text_timv" maxChars="3000"/>
<copyField source="*_tesim" dest="all_text_timv" maxChars="3000"/>
<copyField source="*_ssim" dest="all_text_timv" maxChars="3000"/>
<copyField source="*_si" dest="all_text_timv" maxChars="3000"/>
Following up from the November 4 community call: there was discussion of search term stemming in Solr and how it is currently too aggressive. Users have found that results for the following terms get buried because of stemming:
eugenics matches eugene
organs matches organization
There's no way for the user to search exact terms without stemming, since quotation marks only group phrases and won't bypass stemming.
Duke's approach to this was to include an unstemmed index field (https://gitlab.oit.duke.edu/dul-its/dul-arclight/-/blob/main/solr/arclight/conf/solrconfig.xml#L133), which is weighted above the stemmed ones.
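As a rough illustration of that approach (not Duke's actual configuration, which is at the link above), a minimal sketch of an edismax `qf` setting that weights an unstemmed field above a stemmed one could look like the following. The field names `text_unstemmed_timv` and `text_stemmed_tesim` and the boost value are hypothetical, and the sketch assumes the unstemmed field uses an analyzer with no stemming filter.

```xml
<!-- Sketch only: goes inside the edismax request handler defaults in solrconfig.xml. -->
<!-- text_unstemmed_timv (hypothetical): analyzer has no stemming filter, boosted. -->
<!-- text_stemmed_tesim (hypothetical): analyzer includes a stemmer, default weight. -->
<str name="qf">
  text_unstemmed_timv^10
  text_stemmed_tesim
</str>
```

With a setup along these lines, a query for organs would still match documents containing only organization through the stemmed field, but documents containing the exact term should score higher because they also match the boosted unstemmed field.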
Questions to discuss: