Basic concepts
JStorm provides the abstraction of a stream, which is a continuous, unbounded sequence of tuples. Note that when JStorm models an event stream, each event in the stream is represented as a tuple; how tuples are used is explained later in this document.
With the introduction above, this picture is easy to understand: it is a DAG. In JStorm the picture is abstracted as a topology (a topological structure is indeed a directed acyclic graph). The topology is the highest-level abstraction in JStorm; it can be submitted to a JStorm cluster and executed. A topology is a graph of data stream transformations: each node in the graph above represents a spout or a bolt, and each edge represents a bolt subscribing to a stream. When a spout or bolt sends a tuple to a stream, the tuple is sent to every bolt that subscribed to that stream. (In other words, there is no need to wire the pipeline by hand: once the subscriptions are declared, a spout's stream is delivered to the appropriate bolts.)
A note on how a topology is implemented in JStorm: to perform a real-time computation, we design the topology graph and implement the processing logic of each bolt. A topology is just a set of predefined Thrift structures, so other languages can also be used to create or submit a topology.
The basic primitives Storm provides for doing stream transformations are “spouts” and “bolts”. Spouts and bolts have interfaces that you implement to run your application-specific logic.
JStorm considers that every stream has a source, so sources are abstracted as spouts. A spout may connect to a messaging middleware component (MetaQ, Kafka, ActiveMQ, TBNotify, etc.) and continuously send out messages, or it may continuously read from a queue and emit tuples assembled from the elements of the queue.
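The queue-reading case can be sketched in plain Java. This is a toy model under stated assumptions, not the JStorm spout interface: the class `QueueSpout`, its `nextTuple` method, and the choice to wrap each queue element in a one-value tuple are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Toy spout that reads from an in-memory queue. NOT the JStorm API;
// every name here is hypothetical.
class QueueSpout {
    private final Queue<String> source;

    QueueSpout(Queue<String> source) {
        this.source = source;
    }

    // Take one element from the queue and assemble a one-value tuple from it.
    // Returns an empty Optional when the queue currently has nothing to read.
    Optional<List<Object>> nextTuple() {
        String element = source.poll();
        if (element == null) {
            return Optional.empty();
        }
        return Optional.of(Collections.<Object>singletonList(element));
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("order-1");
        queue.add("order-2");
        QueueSpout spout = new QueueSpout(queue);
        Optional<List<Object>> tuple;
        // Drain the queue, printing each assembled tuple.
        while ((tuple = spout.nextTuple()).isPresent()) {
            System.out.println(tuple.get());
        }
    }
}
```

A real spout would keep polling for as long as the topology runs; the loop here stops at the first empty read only to keep the sketch finite.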
A bolt consumes any number of input streams, does some processing, and possibly emits new streams. Complex stream transformations, like computing a stream of trending topics from a stream of tweets, require multiple steps and thus multiple bolts. Bolts can do anything: run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.
Networks of spouts and bolts are packaged into a “topology” which is the top-level abstraction that you submit to Storm clusters for execution. A topology is a graph of stream transformations where each node is a spout or bolt. Edges in the graph indicate which bolts are subscribing to which streams. When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.
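The subscription rule can be illustrated with a small in-memory model. It is only a sketch, assuming an acyclic graph and string values; `ToyTopology`, `addBolt`, `subscribe`, and `emit` are hypothetical names and not part of JStorm or Storm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy in-memory model of a topology. NOT the JStorm API;
// every class and method name here is hypothetical.
class ToyTopology {
    // Bolt name -> processing logic: one input value in, zero or more values out.
    private final Map<String, Function<String, List<String>>> bolts = new HashMap<>();
    // Node name (spout or bolt) -> bolts subscribed to that node's stream.
    private final Map<String, List<String>> subscribers = new HashMap<>();
    // Bolt name -> every value it has received, kept for inspection.
    final Map<String, List<String>> received = new HashMap<>();

    void addBolt(String name, Function<String, List<String>> logic) {
        bolts.put(name, logic);
        received.put(name, new ArrayList<>());
    }

    // Draw an edge: `bolt` subscribes to the stream emitted by `source`.
    void subscribe(String bolt, String source) {
        if (!bolts.containsKey(bolt)) {
            throw new IllegalArgumentException("unknown bolt: " + bolt);
        }
        subscribers.computeIfAbsent(source, k -> new ArrayList<>()).add(bolt);
    }

    // Emit a value on the stream of `node`. Every subscribed bolt receives it,
    // and whatever a bolt produces is emitted on that bolt's own stream.
    // Assumes the graph is acyclic; a cycle would recurse forever.
    void emit(String node, String value) {
        for (String bolt : subscribers.getOrDefault(node, Collections.emptyList())) {
            received.get(bolt).add(value);
            for (String out : bolts.get(bolt).apply(value)) {
                emit(bolt, out);
            }
        }
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology();
        topology.addBolt("split", s -> Arrays.asList(s.split(" ")));
        topology.addBolt("archive", s -> Collections.emptyList());
        topology.addBolt("count", s -> Collections.emptyList());
        topology.subscribe("split", "sentences");   // "sentences" plays the spout
        topology.subscribe("archive", "sentences"); // second subscriber, same stream
        topology.subscribe("count", "split");
        topology.emit("sentences", "hello jstorm");
        System.out.println(topology.received);
    }
}
```

Both `split` and `archive` receive the sentence because both subscribed to the same stream, while `count` only sees the words that `split` emits.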
In JStorm, the data in a stream is abstracted as tuples. A tuple is a list of values in which each value has a name; a value can be a primitive type, a string, a byte array, or any other serializable type. Each node in the topology must declare the field names of the tuples it emits; other nodes only need to subscribe to those names.
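The idea of a list of values addressed by declared names can be sketched as follows. This is a toy class, not the JStorm tuple type; `NamedTuple` and its `get` method are hypothetical, and the field names `word` and `count` are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a tuple with named fields. NOT the JStorm tuple class;
// all names here are hypothetical.
class NamedTuple {
    private final List<String> fields; // names declared by the emitting node
    private final List<Object> values; // one value per declared field

    NamedTuple(List<String> fields, Object... values) {
        if (fields.size() != values.length) {
            throw new IllegalArgumentException(
                "expected " + fields.size() + " values, got " + values.length);
        }
        this.fields = fields;
        this.values = Arrays.asList(values);
    }

    // Look a value up by the name the emitting node declared for it.
    Object get(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("no such field: " + field);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("word", "count");
        NamedTuple tuple = new NamedTuple(declared, "jstorm", 3);
        System.out.println(tuple.get("word") + " -> " + tuple.get("count")); // prints "jstorm -> 3"
    }
}
```

The consumer only needs the field names, not the positions, which is why the emitting node has to declare them up front.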
Workers and tasks are the execution units in JStorm. A worker is a process and a task is a thread; tasks run inside workers, and one worker can run more than one task.
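The process/thread relationship can be pictured with a short sketch in which the running JVM stands in for one worker and each task is a thread inside it. This is an illustration only, not how JStorm launches workers; `ToyWorker` and `runTasks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy illustration: the running JVM stands in for one worker process,
// and each task is a thread inside it. NOT how JStorm launches workers.
class ToyWorker {
    // Start one thread per task, wait for all of them, return the thread names.
    static List<String> runTasks(List<Runnable> tasks) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i++) {
            Thread thread = new Thread(tasks.get(i), "task-" + i);
            threads.add(thread);
            thread.start();
        }
        List<String> names = new ArrayList<>();
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while waiting for tasks", e);
            }
            names.add(thread.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        AtomicInteger processed = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            tasks.add(processed::incrementAndGet); // each task does one unit of work
        }
        System.out.println(runTasks(tasks) + " processed=" + processed.get());
    }
}
```

Real tasks run for the lifetime of the topology; the tasks here finish immediately so the example terminates.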
In JStorm, resources are divided into three dimensions: CPU, memory, and port, rather than being confined to ports as in Storm. That is, a supervisor states how many CPU slots, memory slots, and port slots it can provide; please refer to User-Define-Scheduler for details.
- A worker consumes a Port Slot.
- A topology can set how many CPU slots each worker consumes; set this if some tasks are CPU-intensive.
- A topology can set how much memory each worker takes; the default is 2GB, so increase it if that is not enough.
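The three rules above imply a simple capacity calculation, sketched here under stated assumptions: memory is expressed in megabytes rather than in memory slots, every worker has the same demands, and the class `SlotMath`, the method `maxWorkers`, and the supervisor sizes are hypothetical. The actual JStorm scheduler is not reproduced here.

```java
// Toy slot arithmetic under stated assumptions. NOT the JStorm scheduler.
class SlotMath {
    // Default worker memory from the text above (2GB), expressed in megabytes.
    static final long DEFAULT_WORKER_MEMORY_MB = 2048;

    // How many workers fit on one supervisor when each worker takes exactly
    // one port slot, `cpuPerWorker` CPU slots and `memoryPerWorkerMb` of memory.
    static int maxWorkers(int cpuSlots, long memoryMb, int portSlots,
                          int cpuPerWorker, long memoryPerWorkerMb) {
        if (cpuPerWorker <= 0 || memoryPerWorkerMb <= 0) {
            throw new IllegalArgumentException("per-worker demands must be positive");
        }
        long byCpu = cpuSlots / cpuPerWorker;
        long byMemory = memoryMb / memoryPerWorkerMb;
        long byPort = portSlots; // one port slot per worker
        // The scarcest dimension decides; never report a negative count.
        return (int) Math.max(0, Math.min(byPort, Math.min(byCpu, byMemory)));
    }

    public static void main(String[] args) {
        // Hypothetical supervisor: 8 CPU slots, 16GB of memory, 4 port slots.
        System.out.println(maxWorkers(8, 16384, 4, 1, DEFAULT_WORKER_MEMORY_MB)); // prints 4
        System.out.println(maxWorkers(8, 16384, 4, 3, DEFAULT_WORKER_MEMORY_MB)); // prints 2
    }
}
```

In the first call ports are the limiting dimension; raising the CPU demand to 3 slots per worker makes CPU the limit instead.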