RedStorm provides the JRuby integration for the Storm distributed realtime computation system.
This has been tested on OSX 10.6.8 and Linux 10.04 using Storm 0.6.2 and JRuby 1.6.6
$ gem install redstorm
- create a new empty project directory.
- install the RedStorm gem.
- create a subdirectory which will contain your sources.
- perform the initial setup as described below to install the dependencies in the
target/
subdir of your project directory. - run your topology in local mode and/or on a production cluster as described below.
- install RedStom dependencies; from your project root directory execute:
$ redstorm install
The install
command will install all Java jars dependencies using ruby-maven in target/dependency
and generate & compile the Java bindings in target/classes
DON'T PANIC it's Maven. The first time you run $ redstorm install
Maven will take a few minutes resolving dependencies and in the end will download and install the dependency jar files.
- create a topology class. The underscore topology_class_file_name.rb MUST correspond to its CamelCase class name.
Until this is better integrated, you can use gems in local mode and on a production cluster:
-
local mode: simply install your gems the usual way, they will be picked up when run in local mode.
-
production cluster: install your gem in the
target/gems
folder using:
gem install <the gem> --install-dir target/gems/ --no-ri --no-rdoc
$ redstorm local <path/to/topology_class_file_name.rb>
See examples below to run examples in local mode or on a production cluster.
- generate
target/cluster-topology.jar
. This jar file will include your sources directory plus the required dependencies from thetarget/
directory:
$ redstorm jar <sources_directory>
- submit the cluster topology jar file to the cluster. Assuming you have the Storm distribution installed and the Storm
bin/
directory in your path:
storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster <path/to/topology_class_file_name.rb>
Basically you must follow the Storm instructions to setup a production cluster and submit your topology to the cluster.
Install the example files in your project. The examples/
dir will be created in your project root dir.
$ redstorm examples
All examples using the simple DSL are located in examples/simple
. Examples using the standard Java interface are in examples/native
.
$ redstorm local examples/simple/exclamation_topology.rb
$ redstorm local examples/simple/exclamation_topology2.rb
$ redstorm local examples/simple/word_count_topology.rb
This next example requires the use of the Redis Gem and a Redis server running on localhost:6379
$ redstorm local examples/simple/redis_word_count_topology.rb
Using redis-cli
, push words into the test
list and watch Storm pick them up
All examples using the simple DSL can also run on a productions cluster. The only native example compatible with a production cluster is the ClusterWordCountTopology
- genererate the
target/cluster-topology.jar
and include theexamples/
directory.
$ redstorm jar examples
- submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm
bin/
directory in your path:
$ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster examples/simple/word_count_topology.rb
- to run
examples/simple/redis_word_count_topology.rb
you need a Redis server running onlocalhost:6379
and the Redis gem intarget/gems
using:
gem install redis --install-dir target/gems/ --no-ri --no-rdoc
- generate jar and submit:
$ redstorm jar examples
$ storm jar ./target/cluster-topology.jar redstorm.TopologyLauncher cluster examples/simple/redis_word_count_topology.rb
- using
redis-cli
, push words into thetest
list and watch Storm pick them up
Basically you must follow the Storm instructions to setup a production cluster and submit your topology to the cluster.
Your project can be created in a single file containing all spouts, bolts and topology classes or each classes can be in its own file, your choice. There are many examples for the simple DSL.
The DSL uses a callback metaphor to attach code to the topology/spout/bolt execution contexts using on_*
DSL constructs (ex.: on_submit, on_send, ...). When using on_*
you can attach you code in 3 different ways:
- using a code block
on_receive (:ack => true, :anchor => true) {|tuple| do_something_with(tuple)}
on_receive :ack => true, :anchor => true do |tuple|
do_something_with(tuple)
end
- defining the corresponding method
on_receive :ack => true, :anchor => true
def on_receive(tuple)
do_something_with(tuple)
end
- defining an arbitrary method
on_receive :my_method, :ack => true, :anchor => true
def my_method(tuple)
do_something_with(tuple)
end
The example SplitSentenceBolt shows the 3 different coding style.
Normally Storm topology components are assigned and referenced using numeric ids. In the SimpleTopology DSL ids are optional. By default the DSL will use the component class name as an implicit symbolic id and bolt source ids can use these implicit ids. The DSL will automatically resolve and assign numeric ids upon topology submission. If two components are of the same class, creating a conflict, then the id can be explicitly defined using either a numeric value, a symbol or a string. Numeric values will be used as-is at topology submission while symbols and strings will be resolved and assigned a numeric id.
require 'red_storm'
class MyTopology < RedStorm::SimpleTopology
spout spout_class, options
bolt bolt_class, options do
source source_id, grouping
...
end
configure topology_name do |env|
config_attribute value
...
end
on_submit do |env|
...
end
end
spout spout_class, options
spout_class
— spout Ruby classoptions
:id
— spout explicit id (default is spout class name):parallelism
— spout parallelism (default is 1)
bolt bolt_class, options do
source source_id, grouping
...
end
bolt_class
— bolt Ruby classoptions
:id
— bolt explicit id (default is bolt class name):parallelism
— bolt parallelism (default is 1)
source_id
— source id reference. can be the source class name if unique or the explicit id if definedgrouping
:fields => ["field", ...]
— fieldsGrouping using fields on the source_id:shuffle
— shuffleGrouping on the source_id:global
— globalGrouping on the source_id:none
— noneGrouping on the source_id:all
— allGrouping on the source_id:direct
— directGrouping on the source_id
configure topology_name do |env|
configuration_field value
...
end
The configure
statement is required.
topology_name
— alternate topology name (default is topology class name)env
— is set to:local
or:cluster
for you to set enviroment specific configurationsconfig_attribute
— the Storm Config attribute name. See Storm for complete list. The attribute name correspond to the Java setter method, without the "set" prefix and the suffix converted from CamelCase to underscore. Ex.:setMaxTaskParallelism
is:max_task_parallelism
.:debug
:max_task_parallelism
:num_workers
:max_spout_pending
- ...
on_submit do |env|
...
end
The on_submit
statement is optional. Use it to execute code after the topology submission.
env
— is set to:local
or:cluster
For example, you can use on_submit
to shutdown the LocalCluster after some time. The LocalCluster instance is available usign the cluster
method.
on_submit do |env|
if env == :local
sleep(5)
cluster.shutdown
end
end
require 'red_storm'
class MySpout < RedStorm::SimpleSpout
set spout_attribute => value
...
output_fields :field, ...
on_send options do
...
end
on_init do
...
end
on_close do
...
end
on_ack do |msg_id|
...
end
on_fail do |msg_id|
...
end
end
set spout_attribute => value
The set
statement is optional. Use it to set spout specific attributes.
spout_attributes
:is_distributed
— set totrue
for a distributed spout (default isfalse
)
output_fields :field, ...
Define the output fields for this spout.
:field
— the field name, can be symbol or string.
on_send options do
...
end
on_send
relates to the Java spout nextTuple
method and is called periodically by storm to allow the spout to output a tuple. When using auto-emit (default), the block return value will be auto emited. A single value return will be emited as a single-field tuple. An array of values [a, b]
will be emited as a multiple-fields tuple. Normally a spout should only output a single tuple per on_send invocation.
:options
:emit
— set tofalse
to disable auto-emit (default istrue
)
on_init do
...
end
on_init
relates to the Java spout open
method. When on_init
is called, the config
, context
and collector
are set to return the Java spout config Map
, TopologyContext
and SpoutOutputCollector
.
on_close do
...
end
on_close
relates to the Java spout close
method.
on_ack do |msg_id|
...
end
on_ack
relates to the Java spout ack
method.
on_fail do |msg_id|
...
end
on_fail
relates to the Java spout fail
method.
require 'red_storm'
class MyBolt < RedStorm::SimpleBolt
output_fields :field, ...
on_receive options do
...
end
on_init do
...
end
on_close do
...
end
end
on_receive options do
...
end
on_receive
relates to the Java bolt execute
method and is called upon tuple reception by Storm. When using auto-emit, the block return value will be auto emited. A single value return will be emited as a single-field tuple. An array of values [a, b]
will be emited as a multiple-fields tuple. An array of arrays [[a, b], [c, d]]
will be emited as multiple-fields multiple tuples. When not using auto-emit, the unanchored_emit(value, ...)
and anchored_emit(tuple, value, ...)
method can be used to emit a single tuple. When using auto-anchor (disabled by default) the sent tuples will be anchored to the received tuple. When using auto-ack (disabled by default) the received tuple will be ack'ed after emitting the return value. When not using auto-ack, the ack(tuple)
method can be used to ack the tuple.
Note that setting auto-ack and auto-anchor is possible only when auto-emit is enabled.
:options
:emit
— set tofalse
to disable auto-emit (default istrue
):ack
— set totrue
to enable auto-ack (default isfalse
):anchor
— set totrue
to enable auto-anchor (default isfalse
)
on_init do
...
end
on_init
relates to the Java bolt prepare
method. When on_init
is called, the config
, context
and collector
are set to return the Java spout config Map
, TopologyContext
and SpoutOutputCollector
.
on_close do
...
end
on_close
relates to the Java bolt cleanup
method.
- JRuby 1.6.6
- rake gem ~> 0.9.2.2
- ruby-maven gem ~> 3.0.3.0.28.5
- rspec gem ~> 2.8.0
Fork the project, create a branch and submit a pull request.
Some ways you can contribute:
- by reporting bugs using the issue tracker
- by suggesting new features using the issue tracker
- by writing or editing documentation
- by writing specs
- by writing code
- by refactoring code
- ...
- fork project
- create branch
- install dependencies in
target/dependencies
$ rake deps
- generate and build Java source into
target/classes
$ rake build
- run topology in local dev cluster
$ bin/redstorm local path/to/topology_class.rb
- generate remote cluster topology jar into
target/cluster-topology.jar
, including theexamples/
directory.
$ rake jar['examples']
Colin Surprenant, @colinsurprenant, [email protected], [email protected], http://github.com/colinsurprenant
Apache License, Version 2.0. See the LICENSE.md file.