This lab will take you through the process of creating both replicated regions and partitioned regions in GemFire. An XML file will be used for the configuration.
Concepts you will gain experience with:
- How to create a replicated region across multiple servers
- Testing failover and the refresh of data on recovered nodes
- Configuring partitioned regions in GemFire
Tip: TODO comments have been embedded within the lab materials as quick instructions. Use the Tasks View in STS to display them (Window → Show view → Tasks).
Note: If servers and the locator are still running from the previous lab, be sure to stop them at this point. Remember, in order to shut the services down, you'll need to re-connect to the locator using the connect command.
It is useful to consult the GemFire cache.xml reference when working through the lab instructions below.
In this first section, you will work with the Book region. This region holds a relatively small amount of slow-changing reference data, which makes it a good candidate for replication. We will walk through defining the region as replicated, shutting down nodes, and observing how replication keeps the data available.
- cd to the server-regions project folder on the command line. This will be your starting point for this lab.
- Use your IDE to open the cluster/cluster.xml file. Add a region attribute to define the Book region as a replicated region (a sample snippet is shown after this list).
- If the locator is not running, start it using gfsh with a command similar to the one used in the prior lab (start locator..). Be sure to start gfsh from within the cluster/ folder.
- Start server1 using a command similar to the one used in the prior lab. Do not start the second server yet.
- In your IDE, locate and run the BookLoader class to load 3 books into the Book region.
- Open the test class ReplicationTest to understand how it looks for values in the Book region.
- Run ReplicationTest to verify that the books were found.
- Start the second server using the name server2. Note: the --server-port=0 option is handy for auto-assigning a port to the server.
- Examine the region details by executing the gfsh command show metrics --region=/Book. Note the member count and the number of entries for the cluster.
- Stop server1 using the gfsh stop server command.
- Re-run ReplicationTest and verify that the data can still be found in the remaining server (server2).
- Stop all the servers in preparation for the next section, but keep the locator running.
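For reference, one minimal way to declare a replicated region in cluster.xml is with a region shortcut. The snippet below is a sketch based on the GemFire cache.xml schema; adapt it to the existing structure of your file. The REPLICATE shortcut gives every server that defines the region a full copy of the data.

<region name="Book" refid="REPLICATE"/>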
In this section, you will use the Customer region and try out different partitioning scenarios. You will be using three server instances this time so you can see the benefits of partitioning with redundancy.
- Return to the cluster/cluster.xml file and modify the Customer region: set the region type to partitioned (a sample snippet is shown after this list).
- Start three servers, calling them server1, server2, and server3.
- Run the CustomerLoader class to load 3 customers into the distributed system.
- You can observe the partitioning by issuing the following command:
gfsh> show metrics --region=/Customer --categories=partition --member=server1
Note the reported values for bucketCount, primaryBucketCount, and configuredRedundancy. Try this for all three servers.
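One possible way to declare the Customer region as partitioned is again via a region shortcut. This is a sketch; adjust it to match the surrounding elements in your cluster.xml:

<region name="Customer" refid="PARTITION"/>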
For a partitioned region, gfsh has a convenient command named locate entry, which provides a simple way to find out which cache servers host a given entry.
- Use the command help locate entry to discover how to properly invoke this command and what arguments to supply. Tip: you'll need to specify the key-class for gfsh to properly look up the entry.
- Use the locate entry command to locate one of the Customer records that you just loaded into GemFire. Note: you should get an error message similar to this:
Message : A ClassNotFoundException was thrown while trying to deserialize cached value. Result : false
Certain gfsh commands require the server to deserialize entries. For now, we're using basic Java serialization, which requires that our domain classes (Customer, Book, etc.) be on the server process's classpath.
There are two ways to achieve this:
- start the server processes with the start server command and a --classpath argument
- hot deploy the domain jar file using the gfsh deploy command
Let’s use the second option:
- Exit gfsh.
- Navigate to the domain directory.
- Generate the domain jar file:
$ gradle assemble
This will compile the code, construct the jar file, and place it into the build/libs folder.
- Finally, cd to the build/libs folder, launch gfsh, connect to the cluster, and deploy the jar file:
$ cd build/libs
$ gfsh
gfsh> connect
gfsh> deploy --jar=domain.jar
We can now get back to locating entries:
- Re-issue the locate entry command for each of the three Customer entries, supplying the region, key-class, and key (a sketch of the command follows this list).
- The output should display the server hosting the primary copy of the entry and the server that hosts the redundant copy.
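The command might look something like the following. The --key and --key-class values here are placeholders: substitute the actual key type and key values that CustomerLoader stored.

gfsh> locate entry --region=/Customer --key=1 --key-class=java.lang.Integer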
Stop all the servers (but not the locator).
Tip: the shutdown command is a simple way to quickly shut down all running servers without stopping the locator.
Where do the compiled domain class files end up? It depends:
- Maven usually places .class files in the target/classes folder
- Before version 4.0, Gradle put class files in build/classes/main
- As of version 4.0, Gradle uses build/classes/java/main, to account for the fact that projects may be written in more than one language
- Eclipse and STS typically auto-compile Java source code to a folder named bin
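If you ever choose the --classpath approach instead of deploy, you would point the server at one of those locations (or at the assembled domain jar). The path below is a hypothetical example assuming a Gradle 4+ build of the domain project; adjust it to wherever your classes or jar actually live, and include whatever other options you used to start servers in the prior lab.

gfsh> start server --name=server1 --classpath=../domain/build/classes/java/main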
In the prior partitioned region configuration, if one of the servers stops for some reason, all the data stored in that partition is lost. In this section, we’ll address that by adding a redundancy factor.
- Go back to the cluster.xml file and modify the Customer region attributes to add a redundancy of 1 (meaning there will be one primary and one redundant copy of every entry). Tip: you can do this either by modifying the region shortcut or by inserting a partition-attributes element. However, in a later step you'll add a recovery delay value, so you may want to take the extra time to type in the partition-attributes element now (a sample snippet is shown after this list).
- Save the file and re-start the servers. Re-run the CustomerLoader class to re-load the customers.
- Repeat the show metrics command to see what has changed with the updated partitioned region configuration.
- Now, stop server3 and repeat the show metrics command for the remaining two servers. You'll notice that the primaryBucketCount value for one of the surviving servers will have increased from 1 to 2, indicating that one of the redundant copies was promoted to primary. Notice also that numBucketsWithoutRedundancy is not 0. This indicates that when the server was lost and its redundant bucket was promoted, redundancy was not re-established for that bucket or for any other buckets that were hosted on the lost server.
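One way the partition-attributes element might look with a redundancy of 1 is sketched below (alternatively, the PARTITION_REDUNDANT region shortcut applies the same redundancy setting); adjust it to the structure of your cluster.xml:

<region name="Customer" refid="PARTITION">
  <region-attributes>
    <partition-attributes redundant-copies="1"/>
  </region-attributes>
</region>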
You can obtain even more detail by installing and then calling a GemFire function named PRBFunction. The code for this function is in the module named functions.
Let’s build and deploy this function to our cluster:
- In a terminal, change directories to the functions module:
$ cd functions
- Next, build the module:
$ gradle assemble
This should produce a jar file in the build/libs subfolder.
- Navigate to the build/libs subfolder:
$ cd build/libs
- Launch gfsh and connect to your cluster:
$ gfsh
gfsh> connect
- Invoke these commands to ensure that you're connected and to verify that no functions are currently registered with the distributed system members:
gfsh> list members
gfsh> list functions
The output should say No Functions Found.
- Now, deploy the jar file:
gfsh> deploy --jar=functions.jar
- Finally, invoke list functions once more to validate that the PRBFunction is now installed:
gfsh> list functions
We're now ready to execute this function. Back in the server-regions module, under the io.pivotal.training.prb package, you'll find a class named PRBFunctionExecutor. This program invokes the PRBFunction we just installed. Run it.
You'll see extensive output listing every primary bucket and every redundant bucket on each server. Look for buckets with Size > 0 to identify which ones contain entries. You should see output similar to the following for every server.
Member: HostMachine(server2:77234)<v2>:58224
Primary buckets:
Row=1, BucketId=2, Bytes=0, Size=0
Row=2, BucketId=4, Bytes=0, Size=0
Row=3, BucketId=9, Bytes=0, Size=0
Row=4, BucketId=12, Bytes=0, Size=0
Row=5, BucketId=13, Bytes=0, Size=0
....
Row=20, BucketId=60, Bytes=0, Size=0
Row=21, BucketId=61, Bytes=676, Size=1
Stop the servers once more.
This time, you will add a recovery delay so that after a period of time, redundancy will be re-established. This will address the issue identified in the prior section.
- Go back to the cluster/cluster.xml file and modify the partition-attributes element to define a recovery delay of 5 seconds. Tip: if you used a region shortcut in the prior section, you'll need to add a partition-attributes element inside the region-attributes element for the Customer region. Consult the cache.xml reference if necessary. (A sample snippet is shown after this list.)
- Save the file and re-start all the servers. Re-run the CustomerLoader class to re-load the customers.
- Now, stop server3 and repeat the show metrics command for the remaining two servers. If you run this command within 5 seconds of stopping server3, you'll likely see that numBucketsWithoutRedundancy is still not 0. Wait a few more seconds and repeat the command. You should see the value return to 0, indicating that redundancy has been re-established within the remaining servers.
- Alternatively, you can re-run PRBFunctionExecutor to print the more detailed bucket listing outlined in the prior section (you'll have to redeploy the jar file).
- Stop the servers for the final time. Also stop the locator.
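For reference, a partition-attributes element combining a redundancy of 1 with a 5-second recovery delay might look like the following sketch (recovery-delay is expressed in milliseconds; adapt the element to fit your cluster.xml):

<region name="Customer" refid="PARTITION">
  <region-attributes>
    <partition-attributes redundant-copies="1" recovery-delay="5000"/>
  </region-attributes>
</region>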
Congratulations!! You have completed this lab.