Home
The main goal of this project is to speed up the Hunspell analyzer, whose performance is terrible, especially when a large amount of data has to be indexed. To solve that, the project provides the standard Hunspell analyzer enhanced with a cache: a term that has already been analyzed is not analyzed again but is taken from the cache.
Note: The caches are simple hash maps, and I don't put much care into cleaning them. We assume this library will be used only in special situations, e.g. when migrating indexes.
Because the library uses internal structures, the jar file must be copied directly to <solr_home>/server/solr-webapp/WEB-INF/. After that, find all definitions of the Hunspell filter in your managed_schema:
<filter class="solr.HunspellStemFilterFactory" dictionary="cs_CZ_ascii.dic" affix="cs_CZ_ascii.aff"
ignoreCase="true"/>
and replace each of them with the following snippet:
<filter class="org.apache.lucene.analysis.hunspell.HunspellCachedStemFilterFactory" dictionary="cs_CZ_ascii.dic" affix="cs_CZ_ascii.aff"
ignoreCase="true"/>
and then start your indexing process.
Once the process is finished, change your schema back to the previous version.
There are two types of caches. The first one is called L1 and is dedicated to infrequently used terms. It is cleaned on a time interval: once its time slot runs out, the cache is deleted completely. The deletion is triggered when a new item is pushed into the cache.
If an item is used more often than a threshold value, it is moved to the second cache, called L2, which is dedicated to frequently used terms. L2 is also cleaned on a time interval, but its interval is much longer than that of L1.
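The two-level scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class `TwoLevelCacheSketch`, its methods, and the injected clock are hypothetical names, the time slots are assumed to be in milliseconds, and L2 is assumed to be cleared by the same push trigger as L1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Illustrative two-level term cache; hypothetical, not the project's actual class. */
class TwoLevelCacheSketch<V> {

    /** L1 entry: the cached value plus how many times it was served from L1. */
    private static final class Entry<V> {
        final V value;
        int hits;
        Entry(V value) { this.value = value; }
    }

    private final Map<String, Entry<V>> l1 = new HashMap<>();
    private final Map<String, V> l2 = new HashMap<>();
    private final long surviveL1;     // L1 time slot (assumed milliseconds)
    private final long surviveL2;     // L2 time slot (assumed milliseconds)
    private final int threshold;      // hits needed before promotion to L2
    private final LongSupplier clock; // injected so the sketch can be tested
    private long l1Start;
    private long l2Start;

    TwoLevelCacheSketch(long surviveL1, long surviveL2, int threshold, LongSupplier clock) {
        this.surviveL1 = surviveL1;
        this.surviveL2 = surviveL2;
        this.threshold = threshold;
        this.clock = clock;
        this.l1Start = clock.getAsLong();
        this.l2Start = this.l1Start;
    }

    /** Returns the cached result for the term; runs the analyzer only on a miss. */
    V get(String term, Function<String, V> analyzer) {
        if (l2.containsKey(term)) {
            return l2.get(term);
        }
        Entry<V> entry = l1.get(term);
        if (entry != null) {
            entry.hits++;
            if (entry.hits > threshold) {
                // Used more often than the threshold: move the item from L1 to L2.
                l1.remove(term);
                pushToL2(term, entry.value);
            }
            return entry.value;
        }
        V value = analyzer.apply(term);
        pushToL1(term, new Entry<>(value));
        return value;
    }

    /** Pushing a new item is the only trigger that checks the L1 time slot. */
    private void pushToL1(String term, Entry<V> entry) {
        long now = clock.getAsLong();
        if (now - l1Start >= surviveL1) {
            l1.clear();
            l1Start = now;
        }
        l1.put(term, entry);
    }

    /** Assumption: L2 is cleared by the same push trigger, with a longer slot. */
    private void pushToL2(String term, V value) {
        long now = clock.getAsLong();
        if (now - l2Start >= surviveL2) {
            l2.clear();
            l2Start = now;
        }
        l2.put(term, value);
    }

    int l1Size() { return l1.size(); }
    int l2Size() { return l2.size(); }
}
```

The sketch is not thread-safe. Code running inside Solr would need synchronization or concurrent maps, which are left out here to keep the promotion and expiry logic visible.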
The intervals and the threshold can be passed to Solr as system properties:
- surviveL1 - time slot for cache L1, default value is 1 hour
- surviveL2 - time slot for cache L2, default value is 8 hours
- l1L2 - threshold value for promoting items from L1 to L2, default value is 40
Example of passing the arguments:
solr start -f -a "-DsurviveL1=30000 -DsurviveL2=3600000 -Dl1L2=2"
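For illustration, properties passed this way could be read on the Java side roughly as shown below. This is only a sketch under assumptions: `CacheSettingsSketch` is a hypothetical class, not part of the project, and the values are assumed to be milliseconds, which the example values above suggest but the page does not state.

```java
/** Illustrative holder for the three cache settings; hypothetical, not the project's class. */
class CacheSettingsSketch {
    final long surviveL1; // L1 time slot (assumed milliseconds)
    final long surviveL2; // L2 time slot (assumed milliseconds)
    final int l1L2;       // threshold for promoting items from L1 to L2

    CacheSettingsSketch(long surviveL1, long surviveL2, int l1L2) {
        this.surviveL1 = surviveL1;
        this.surviveL2 = surviveL2;
        this.l1L2 = l1L2;
    }

    /** Reads -DsurviveL1, -DsurviveL2 and -Dl1L2, using the documented defaults when unset. */
    static CacheSettingsSketch fromSystemProperties() {
        long hour = 60L * 60L * 1000L;
        return new CacheSettingsSketch(
                Long.getLong("surviveL1", 1 * hour),  // default: 1 hour
                Long.getLong("surviveL2", 8 * hour),  // default: 8 hours
                Integer.getInteger("l1L2", 40));      // default: 40
    }
}
```

`Long.getLong` and `Integer.getInteger` return the supplied default when the property is missing or cannot be parsed as a number, so a mistyped value silently falls back to the default.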