bnewport edited this page Sep 13, 2010 · 28 revisions

Client side jar list (lucene side list)

Lucene and all its dependencies

  • wxslucene.jar
  • wxsutils.jar
  • ogclient.jar or objectgrid.jar

Grid side jar list

  • wxsutils.jar
  • wxslucene.jar
  • objectgrid.jar and all its dependencies

WXS configuration xml files

Use the objectgrid.xml and deployment.xml files in test/resources as a starting point. Only deployment.xml needs to be modified, and the only settings to change there are the partition count and the replication strategy.

Each directory and its file contents are stored in a separate map with a prefix of ChunkMap.

Configuration through property files

The Lucene plugin is configured using two property files that should be on the classpath. The wxsutils.properties file is used to connect to a grid and the wxslucene.properties file configures the directory plugin.

wxsutils.properties

See here

Here is an example wxsutils.properties

wxslucene.properties

The wxslucene.properties file must be in a folder on the classpath. All directories within the client share the same configuration.

Here is an example wxslucene.properties

async_put property (boolean)

When true, this batches writes to a file together so that they are essentially bulk copied to the grid. This improves performance significantly because multiple blocks are written to each JVM in a single RPC, versus one RPC per block when this property is false.

compression property (boolean)

The wxslucene.properties file in test/resources allows the directory to be customized. The main things to modify, if required, are the block size and the compression. We recommend always enabling async_put: it is MUCH faster when copying disk based directories into a grid based directory, possibly 6-10x faster. Compression can significantly reduce the memory required for the index and the network bandwidth required between the Lucene JVMs and the grid. There is a CPU cost, and it needs to be weighed against these advantages.

If a local block cache is enabled then the cost of compression is significantly reduced, as the block cache holds uncompressed blocks. Hence, enabling compression together with the block cache is a recommended configuration, as it's a very good compromise between performance and memory usage.

block_size property (int)

Index files are broken into chunks, or blocks, of a fixed size when stored in the grid. The block_size property specifies how big those blocks are. 4k seems a common value for these kinds of systems.
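The chunking arithmetic can be sketched as follows. This is only an illustration: the class and method names are hypothetical and not part of wxslucene, and it assumes the last block of a file may be partially filled.

```java
// Hypothetical helper, not part of wxslucene: fixed-size chunking arithmetic only.
final class BlockMath {

    // Number of blocks needed to hold fileLength bytes,
    // assuming the final block may be only partially filled.
    static long blockCount(long fileLength, int blockSize) {
        if (blockSize <= 0 || fileLength < 0) {
            throw new IllegalArgumentException("blockSize must be > 0 and fileLength >= 0");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Zero-based index of the block that contains the byte at the given offset.
    static long blockIndex(long offset, int blockSize) {
        return offset / blockSize;
    }

    public static void main(String[] args) {
        int blockSize = 4096; // the 4k value mentioned in the text
        System.out.println(blockCount(10_000, blockSize)); // prints 3
        System.out.println(blockIndex(8_192, blockSize));  // prints 2
    }
}
```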

block_cache_size property (int)

This is the maximum number of blocks to cache in an LRU cache for each directory. The blocks are cached in uncompressed form. This means the maximum memory consumption will be block_cache_size * block_size * the number of directories in use in a JVM. The block cache is only enabled if this property is specified. The cache is not synchronized across the cluster, so it should only be used with read only indexes.
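As a quick check of the memory formula above, here is a minimal sketch. The class name and the example values (1024 blocks, 4k block size, 3 directories) are assumptions chosen for illustration, and the result ignores any per-entry overhead of the cache itself.

```java
// Hypothetical helper, not part of wxslucene: upper bound on block cache memory.
final class BlockCacheSizing {

    // block_cache_size * block_size * number of directories in use in the JVM.
    static long maxCacheBytes(int blockCacheSize, int blockSize, int directories) {
        return (long) blockCacheSize * blockSize * directories;
    }

    public static void main(String[] args) {
        long bytes = maxCacheBytes(1024, 4096, 3);
        System.out.println(bytes);                   // prints 12582912
        System.out.println(bytes / (1024 * 1024));   // prints 12 (MiB)
    }
}
```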

The block cache typically makes an enormous performance difference when enabled. It should be enabled if at all possible. It can also mask compression cost as every cache hit returns an uncompressed block.

How big this value should be depends on the indexes, but a good starting value is 1024.

Recommended JVM settings and tuning

Generational garbage collection is recommended for read only indexes. We recommend larger 64 bit JVMs if required, for example -Xmx8G. You should specify “-server” on client and container JVMs. When sizing a grid, use a worst case high heap threshold of around 80%. So, a JVM can be expected to hold 80% of 8GB, or 6.4GB, of blocks.

Take a 30GB index as an example: that's 60GB with replication. This means a minimum configuration of 60/6.4 JVMs, which is ~9.4 or, rounded up, 10 × 8GB heap JVMs. A 64GB box should be able to host 7 × 8GB JVMs, so we’d need 2 boxes at a minimum to hold this grid. We’d add one more box with an additional 7 × 8GB JVMs to provide redundancy in case a box fails. Remember, you need enough capacity to handle a single box failure as a minimum. If we’re running heaps at 80% and using just 2 boxes, then losing one box would likely result in out of memory exceptions on the survivors as the grid tries to recover. So, I’d run 3 boxes so that even if we lose a box, the system is left running at a healthy heap level.
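The sizing arithmetic above can be reproduced with a small sketch. The class and method names are hypothetical, and the inputs (two copies of the data, meaning a primary plus one replica, 7 JVMs per box, one spare box) are taken from the worked example rather than from any WXS API.

```java
// Hypothetical helper, not part of WXS: reproduces the sizing arithmetic from the text.
final class GridSizing {

    // JVMs needed to hold `copies` copies of the index when each JVM
    // is only filled up to heapThreshold (for example 0.8) of its heap.
    static int jvmsNeeded(double indexGb, int copies, double heapGb, double heapThreshold) {
        double usablePerJvmGb = heapGb * heapThreshold;
        return (int) Math.ceil(indexGb * copies / usablePerJvmGb);
    }

    // Boxes needed to host the JVMs, plus spare boxes to survive box failures.
    static int boxesNeeded(int jvms, int jvmsPerBox, int spareBoxes) {
        return (jvms + jvmsPerBox - 1) / jvmsPerBox + spareBoxes;
    }

    public static void main(String[] args) {
        int jvms = jvmsNeeded(30, 2, 8, 0.8);      // 60 / 6.4 = 9.375, rounded up
        System.out.println(jvms);                  // prints 10
        System.out.println(boxesNeeded(jvms, 7, 1)); // prints 3
    }
}
```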

WXS is cheaper to license in these scenarios on boxes with more memory, as it’s the total number of CPU cores that determines the price, not memory. So buy boxes with more memory.

The partition count is usually 5x the number of JVMs used in a ‘normal’ system, rounded up to the next prime number.
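One reading of this rule of thumb is sketched below. It assumes that the product (5 × JVM count) is what gets rounded up to a prime, and that a product which is already prime is kept as-is; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of WXS: one interpretation of the partition count rule.
final class PartitionCount {

    // Simple trial-division primality test; fine for the small values used here.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    // 5x the JVM count, rounded up to the next prime (kept unchanged if already prime).
    static int recommended(int jvmCount) {
        if (jvmCount < 1) {
            throw new IllegalArgumentException("jvmCount must be at least 1");
        }
        int candidate = 5 * jvmCount;
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(recommended(10)); // 50 rounds up to 53
    }
}
```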

Command line to start WXS grid container

An example command line to start a WXS grid container:

startOgServer.sh server01 -objectgridFile xml/objectgrid.xml -deploymentPolicyFile xml/deployment.xml -jvmArgs -d64 -Xmx9G -Dcom.sun.management.jmxremote -XX:+AggressiveOpts -verbose:gc -Xloggc:logs/server01.gc.log -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -cp jars/wxsutils-1.3-SNAPSHOT.jar:jars/wxslucene-1.0-SNAPSHOT.jar

Command line for the client JVMs

The wxslucene plugin uses JMX to expose statistics. This means it expects an MBeanServer to be started. The command line argument “-Dcom.sun.management.jmxremote” should start a JMX MBeanServer in a JVM. It must be specified on every JVM using the wxslucene plugin.

You can attach to a client JVM using jconsole to easily see the MBeans for metrics.
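As an alternative to jconsole, the MBeans registered in a JVM can also be listed from code. The sketch below uses only the standard java.lang.management and javax.management APIs; it assumes the plugin registers its MBeans with the platform MBeanServer, which this page does not confirm, and it makes no assumption about what the wxslucene MBeans are named.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: lists every MBean registered with the platform MBeanServer.
final class ListMBeans {

    // Returns the canonical names of all registered MBeans, sorted alphabetically.
    static Set<String> registeredNames() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> names = new TreeSet<>();
        // Passing null for both the name pattern and the query matches every MBean.
        for (ObjectName name : server.queryNames(null, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) {
        registeredNames().forEach(System.out::println);
    }
}
```

Running this inside a client JVM after a directory has been opened should show any plugin MBeans alongside the standard java.lang ones, provided the assumption above holds.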
