This repository has been archived by the owner on Jun 16, 2023. It is now read-only.

Basic concepts

heipacker edited this page Feb 12, 2014 · 4 revisions

Stream

JStorm provides the stream abstraction: a continuous, unbounded sequence of tuples. When JStorm models an event stream, each event in the stream is represented as a tuple. How tuples are used is explained later, as the rest of JStorm is introduced.

[Figure: stream]

Spout/Bolt

JStorm assumes that every stream has a source, and abstracts that source as a spout. A spout may connect to messaging middleware (MetaQ, Kafka, TBNotify, etc.) and continuously emit the messages it receives, or it may continuously read from a queue and emit tuples assembled from the queue's elements.

Once a spout produces a stream, how are the tuples in it processed? Following the same idea, each intermediate processing step is abstracted as a bolt. A bolt can consume any number of input streams, as long as those streams are directed to it, and it can also emit new streams for other bolts to consume. As a result, you simply open a specific spout (nozzle), direct the tuples flowing out of it to a particular bolt, and that bolt processes them and directs the results to other bolts.

Think of a spout as a faucet, where each faucet delivers a different kind of water. To get the kind of water we want, we turn on the corresponding faucet and pipe its water to a water processor (a bolt). The processor treats the water and then pipes it on to another processor or stores it in a container.

[Figure: spout and bolt]
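The faucet-and-processor picture can be sketched in plain Java. This is only an illustrative model, not the JStorm API: the names `SpoutBoltSketch`, `QueueSpout`, `UpperCaseBolt`, and `run` are hypothetical, and a real spout or bolt would implement the JStorm interfaces instead. The sketch assumes a spout that reads from an in-memory queue and a bolt that upper-cases each word before passing it on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Queue;
import java.util.function.Consumer;

// Plain-Java model of the spout/bolt idea; these are NOT JStorm classes.
class SpoutBoltSketch {

    /** Hypothetical source: drains a queue and emits one tuple per element. */
    static final class QueueSpout {
        private final Queue<String> source;

        QueueSpout(Queue<String> source) {
            this.source = source;
        }

        /** Emits the next element as a one-value tuple; returns false once the queue is empty. */
        boolean nextTuple(Consumer<List<Object>> downstream) {
            String element = source.poll();
            if (element == null) {
                return false;
            }
            downstream.accept(List.of(element));
            return true;
        }
    }

    /** Hypothetical processor: consumes a tuple and emits a new tuple downstream. */
    static final class UpperCaseBolt implements Consumer<List<Object>> {
        private final Consumer<List<Object>> downstream;

        UpperCaseBolt(Consumer<List<Object>> downstream) {
            this.downstream = downstream;
        }

        @Override
        public void accept(List<Object> tuple) {
            String word = (String) tuple.get(0);
            downstream.accept(List.of(word.toUpperCase(Locale.ROOT)));
        }
    }

    /** Wires spout -> bolt -> collecting sink and returns every tuple that reached the sink. */
    static List<List<Object>> run(List<String> input) {
        List<List<Object>> sink = new ArrayList<>();
        UpperCaseBolt bolt = new UpperCaseBolt(sink::add);
        QueueSpout spout = new QueueSpout(new ArrayDeque<>(input));
        while (spout.nextTuple(bolt)) {
            // keep pulling until the source is drained
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("storm", "jstorm")));
    }
}
```

In this model the "pipe" is just a `Consumer` passed to the upstream component; in JStorm the wiring is declared in the topology rather than in the components themselves.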

Topology

[Figure: topology]

With the concepts introduced above, the picture is easy to understand. It is a DAG, which JStorm abstracts as a topology (a topology is indeed a directed acyclic graph). The topology is the highest-level abstraction in JStorm: it can be submitted to a JStorm cluster and executed. A topology is a graph of data stream transformations. Each node in the graph represents a spout or a bolt, and each edge indicates that a bolt subscribes to a stream. When a spout or bolt emits a tuple to a stream, the tuple is sent to every bolt subscribed to that stream. In other words, there is no need to lay the pipes by hand: as long as the subscriptions are declared in advance, a spout's stream is delivered to the appropriate bolts.

A note on how topologies are implemented in JStorm: to perform a real-time computation, we design the topology graph and implement the processing logic of each bolt. A topology itself is just a set of Thrift structure definitions, so topologies can also be created or submitted from other languages.
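The subscription rule (a tuple emitted to a stream goes to every bolt subscribed to it) can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not JStorm's implementation: `TopologySketch` and its methods are hypothetical names, every bolt simply forwards what it receives, and the graph is assumed to be acyclic as the text describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a topology; none of these names are JStorm classes.
class TopologySketch {
    // For each component id, the ids of the bolts subscribed to its stream (the graph's edges).
    private final Map<String, List<String>> subscribers = new LinkedHashMap<>();
    // For each component id, the values delivered to it so far.
    private final Map<String, List<String>> received = new LinkedHashMap<>();

    /** Registers a spout or bolt as a node of the graph. */
    void addComponent(String id) {
        subscribers.putIfAbsent(id, new ArrayList<>());
        received.putIfAbsent(id, new ArrayList<>());
    }

    /** Declares an edge: boltId subscribes to the stream emitted by sourceId. */
    void subscribe(String boltId, String sourceId) {
        addComponent(sourceId);
        addComponent(boltId);
        subscribers.get(sourceId).add(boltId);
    }

    /**
     * Delivers a value emitted by sourceId to every subscribed bolt. In this sketch each
     * bolt forwards the value unchanged; the recursion ends because the graph is assumed acyclic.
     */
    void emit(String sourceId, String value) {
        for (String boltId : subscribers.getOrDefault(sourceId, Collections.emptyList())) {
            received.get(boltId).add(value);
            emit(boltId, value);
        }
    }

    /** Returns the values delivered to the given component, or an empty list if unknown. */
    List<String> receivedBy(String id) {
        return received.getOrDefault(id, Collections.emptyList());
    }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();
        topology.subscribe("split", "words"); // words -> split
        topology.subscribe("count", "words"); // words -> count
        topology.subscribe("store", "split"); // split -> store
        topology.emit("words", "hello");
        System.out.println(topology.receivedBy("store"));
    }
}
```

Note that the emitting component never names its consumers; delivery is driven entirely by the subscriptions declared up front, which is the point the paragraph above makes about not laying pipes by hand.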

Tuple

In JStorm, the data in a stream is abstracted as tuples. A tuple is a list of values, and each value in the list has a name. A value can be a basic type, a character type, a byte array, or any other serializable type. Every node in a topology must declare the field names of the tuples it emits; other nodes then only need to subscribe to those names.
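As a minimal sketch of "a list of values, each with a name", the following plain-Java class pairs declared field names with values and lets a consumer look a value up by name. `NamedTupleSketch`, `NamedTuple`, and `getByField` are hypothetical names chosen for illustration; they are not the JStorm tuple classes, and serialization is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java model of a named tuple; NOT the JStorm tuple implementation.
class NamedTupleSketch {

    /** Hypothetical tuple: an ordered list of values, each addressable by a declared field name. */
    static final class NamedTuple {
        private final List<String> fieldNames;
        private final List<Object> values;

        NamedTuple(List<String> fieldNames, List<Object> values) {
            // The emitter declares its field names once; every tuple must match that declaration.
            if (fieldNames.size() != values.size()) {
                throw new IllegalArgumentException(
                        "expected " + fieldNames.size() + " values, got " + values.size());
            }
            this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
            this.values = Collections.unmodifiableList(new ArrayList<>(values));
        }

        /** Looks a value up by its declared name, as a downstream node would. */
        Object getByField(String name) {
            int index = fieldNames.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            return values.get(index);
        }

        /** Returns the number of values in the tuple. */
        int size() {
            return values.size();
        }
    }

    public static void main(String[] args) {
        NamedTuple tuple = new NamedTuple(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("jstorm", 3));
        System.out.println(tuple.getByField("word") + " " + tuple.getByField("count"));
    }
}
```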

Worker/Task

Workers and tasks are the execution units in JStorm: a worker is a process, a task is a thread, and a worker can run more than one task.

Resource Slot

In JStorm, resources are divided into four dimensions: CPU, Memory, Disk, and Port. Unlike Storm, resources are no longer limited to ports. A supervisor therefore advertises how many CPU Slots, Memory Slots, Disk Slots, and Port Slots it can provide.

  • A worker consumes one Port Slot. By default, a task consumes one CPU Slot and one Memory Slot.
  • When a task is computation-heavy, it can apply for more CPU Slots.
  • When a task needs more memory, it can apply for more Memory Slots.
  • When a task needs more disk access, it can apply for a Disk Slot, which is then used exclusively by that task.
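The slot rules above amount to simple bookkeeping, which the following plain-Java sketch makes concrete. It is an assumption-laden illustration rather than JStorm's scheduler: `SlotSketch`, `Supervisor`, and the `assign...` methods are hypothetical names, the slot counts are made up, and a request is granted only if every dimension can be satisfied at once.

```java
// Plain-Java model of per-supervisor slot accounting; NOT the JStorm scheduler.
class SlotSketch {

    /** Hypothetical supervisor that tracks its free slots in the four dimensions. */
    static final class Supervisor {
        private int cpuSlots;
        private int memorySlots;
        private int diskSlots;
        private int portSlots;

        Supervisor(int cpuSlots, int memorySlots, int diskSlots, int portSlots) {
            this.cpuSlots = cpuSlots;
            this.memorySlots = memorySlots;
            this.diskSlots = diskSlots;
            this.portSlots = portSlots;
        }

        /** A worker consumes one Port Slot; returns false if none is free. */
        boolean assignWorker() {
            if (portSlots < 1) {
                return false;
            }
            portSlots--;
            return true;
        }

        /** Reserves slots for one task, all or nothing; returns false if any dimension falls short. */
        boolean assignTask(int cpu, int memory, int disk) {
            if (cpu < 0 || memory < 0 || disk < 0) {
                throw new IllegalArgumentException("slot requests must not be negative");
            }
            if (cpu > cpuSlots || memory > memorySlots || disk > diskSlots) {
                return false;
            }
            cpuSlots -= cpu;
            memorySlots -= memory;
            diskSlots -= disk;
            return true;
        }

        /** Default request from the text: one CPU Slot and one Memory Slot, no Disk Slot. */
        boolean assignDefaultTask() {
            return assignTask(1, 1, 0);
        }

        int freeCpuSlots() { return cpuSlots; }
        int freeMemorySlots() { return memorySlots; }
        int freeDiskSlots() { return diskSlots; }
        int freePortSlots() { return portSlots; }
    }

    public static void main(String[] args) {
        Supervisor supervisor = new Supervisor(4, 4, 1, 1); // made-up capacities
        supervisor.assignWorker();        // takes the only Port Slot
        supervisor.assignDefaultTask();   // 1 CPU Slot + 1 Memory Slot
        supervisor.assignTask(2, 1, 0);   // a computation-heavy task asks for 2 CPU Slots
        System.out.println("free CPU slots: " + supervisor.freeCpuSlots());
    }
}
```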