Components for eliminating duplicate words in Apache Lucene/Solr

Use one of the following field type definitions in your Solr schema.

Remove duplicate words

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="org.apache.lucene.EliminateDuplicateFilterFactory"/>
  </analyzer>
</fieldType>

Result

Input: text word word text word word
Output: text word
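
The effect shown in the result above can be sketched in plain Java. This is not the filter's implementation: `EliminateDuplicateSketch` and `eliminateDuplicates` are hypothetical names, and the sketch assumes that the first occurrence of each word is kept and that tokens are split on whitespace.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class EliminateDuplicateSketch {

    // Hypothetical helper, not part of this project.
    // Assumption: the first occurrence of each word is kept, in input order.
    static String eliminateDuplicates(String input) {
        Set<String> seen = new LinkedHashSet<>();
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                seen.add(token); // a LinkedHashSet ignores repeats and preserves insertion order
            }
        }
        return String.join(" ", seen);
    }

    public static void main(String[] args) {
        // prints "text word"
        System.out.println(eliminateDuplicates("text word word text word word"));
    }
}
```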

Custom PositionFilterFactory

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="org.apache.lucene.PositionFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Result

Input: text word word text word word
Output: text word
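
Why the two filters are chained can be sketched in plain Java, without any Lucene dependency. The sketch assumes the custom PositionFilterFactory behaves like the legacy Lucene PositionFilter with its default setting: the first token keeps its position increment and every later token gets an increment of 0, so all tokens share one position. `solr.RemoveDuplicatesTokenFilterFactory` drops a token only when the same term has already been seen at the same position. `PositionDedupSketch`, `Token`, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PositionDedupSketch {

    // Minimal token model: term text plus position increment relative to the previous token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Assumed PositionFilter behavior: first token unchanged, all later tokens get increment 0.
    static List<Token> flattenPositions(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            out.add(i == 0 ? t : new Token(t.term, 0));
        }
        return out;
    }

    // Models duplicate removal: drop a token whose term was already seen at the current position.
    static List<Token> removeDuplicates(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : tokens) {
            if (t.positionIncrement > 0) {
                seenAtPosition.clear(); // a new position starts, so forget earlier terms
            }
            if (seenAtPosition.add(t.term)) {
                out.add(t);
            }
        }
        return out;
    }

    // Whitespace-tokenizes the input, then applies both steps in the order used in the field type.
    static String analyze(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String term : input.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                tokens.add(new Token(term, 1));
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Token t : removeDuplicates(flattenPositions(tokens))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "text word"
        System.out.println(analyze("text word word text word word"));
    }
}
```

In this model, skipping `flattenPositions` leaves every token at its own position, so `removeDuplicates` removes nothing. That is the reason the position filter comes first in the chain.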