Event Streaming Platform

Today, data is in motion, and engineering teams need to model applications to process business requirements as streams of events, not as data at rest, sitting idly in a traditional data store.

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform.

IMG_9151

Producers and Consumers act as clients; Brokers act as servers.

A Kafka cluster consists of multiple Brokers.

A Topic is split into partitions, and each Broker hosts a subset of those partitions.

image

Each partition is replicated across Brokers; one replica acts as the leader and the others are followers.

image
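
As a rough illustration of partitions being hosted and replicated across Brokers, the sketch below places each partition's replicas on consecutive brokers and treats the first replica as the leader. The function name `assign_replicas` and the round-robin placement are assumptions made for this example; Kafka's real placement logic is more involved.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (illustrative only).

    The first replica in each list is treated as the leader; the rest are followers.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        assignment[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


layout = assign_replicas(3, ["broker-0", "broker-1", "broker-2"], replication_factor=2)
for partition, replicas in layout.items():
    print(partition, replicas)
```

With three brokers and a replication factor of 2, every broker leads one partition and follows another, so losing a single broker still leaves one copy of each partition.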

Each message in a partition is assigned an offset, a sequential number that identifies its position within that partition.

image
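
To make the offset idea concrete, here is a minimal in-memory model of a single partition as an append-only log. The class `PartitionLog` and its methods are hypothetical names for this sketch, not part of any Kafka client API.

```python
class PartitionLog:
    """A toy append-only log standing in for one partition."""

    def __init__(self):
        self._messages = []

    def append(self, value):
        """Store a message and return the offset it was written at."""
        offset = len(self._messages)
        self._messages.append(value)
        return offset

    def read(self, start_offset, max_messages=10):
        """Return (offset, value) pairs beginning at start_offset."""
        end = start_offset + max_messages
        return list(enumerate(self._messages))[start_offset:end]


log = PartitionLog()
offsets = [log.append(v) for v in ["created", "paid", "shipped"]]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [(1, 'paid'), (2, 'shipped')]
```

Because offsets only grow, a reader can resume from the last offset it processed without the log having to track per-reader state.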

A request from Producer looks like "Write data to topic A partition 2".

image
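
One common way a Producer arrives at a target partition is to hash the message key and take the result modulo the partition count, so that messages with the same key land in the same partition. The sketch below assumes that approach and uses `zlib.crc32` purely as a stand-in hash (Kafka's default partitioner uses murmur2); `choose_partition` and `build_request` are hypothetical helpers.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; crc32 is a stand-in hash for illustration."""
    return zlib.crc32(key) % num_partitions


def build_request(topic: str, key: bytes, value: bytes, num_partitions: int) -> dict:
    """Build a dict describing a 'write data to topic X partition N' request."""
    return {
        "topic": topic,
        "partition": choose_partition(key, num_partitions),
        "key": key,
        "value": value,
    }


first = build_request("A", b"customer-42", b"order created", num_partitions=3)
second = build_request("A", b"customer-42", b"order paid", num_partitions=3)
# Same key, same partition, so the two events keep their relative order.
print(first["partition"] == second["partition"])  # True
```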

Kafka consumers are typically part of a consumer group. When multiple consumers subscribe to a topic and belong to the same consumer group, each consumer in the group receives messages from a different subset of the topic's partitions.

A consumer coordinator manages the assignment of partitions to the consumers in a group.
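
The effect of partition assignment within a consumer group can be sketched with a simple round-robin strategy. This is a simplified model under the assumption that every consumer subscribes to the same topic; `assign_partitions` is a hypothetical function, and real assignment strategies handle more cases.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions over group members in round-robin order."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


two_members = assign_partitions(range(6), ["consumer-1", "consumer-2"])
print(two_members)    # {'consumer-1': [0, 2, 4], 'consumer-2': [1, 3, 5]}

# When a third consumer joins, the partitions are redistributed.
three_members = assign_partitions(range(6), ["consumer-1", "consumer-2", "consumer-3"])
print(three_members)  # {'consumer-1': [0, 3], 'consumer-2': [1, 4], 'consumer-3': [2, 5]}
```

Each partition is owned by exactly one member of the group, which is why adding consumers beyond the number of partitions leaves the extra consumers idle in this model.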

Common Architecture

An example model is shown below.

image

We can design business processes and applications around Event Streams. Everything, from sales, orders, trades, and customer experiences to sensor readings and database updates, is modeled as an Event. Events are written to the Event Streaming Platform once, allowing distributed functions within the business to react in real time. Systems external to the Event Streaming Platform are integrated using Event Sources and Event Sinks. Business logic is built within Event Processing Applications, which are composed of Event Processors that read events from and write events to Event Streams.
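
The roles named above (Event Source, Event Processor, Event Sink) can be sketched as a small generator pipeline. All names and the sample order events here are hypothetical, and in-memory lists stand in for the external systems and the Event Stream.

```python
def event_source(records):
    """Event Source: turns records from an external system into events."""
    for record in records:
        yield record


def large_order_processor(events, threshold):
    """Event Processor: reads order events and emits derived events."""
    for event in events:
        if event["type"] == "order" and event["amount"] >= threshold:
            yield {
                "type": "large_order",
                "order_id": event["order_id"],
                "amount": event["amount"],
            }


def event_sink(events, store):
    """Event Sink: writes events out to an external system (a list here)."""
    for event in events:
        store.append(event)


incoming = [
    {"type": "order", "order_id": 1, "amount": 40},
    {"type": "order", "order_id": 2, "amount": 900},
    {"type": "sensor", "reading": 21.5},
]
alerts = []
event_sink(large_order_processor(event_source(incoming), threshold=500), alerts)
print(alerts)  # [{'type': 'large_order', 'order_id': 2, 'amount': 900}]
```

Because each stage only reads from and writes to a stream of events, stages can be added or replaced without changing the others.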

Table

Projection Table

How can a stream of change events be efficiently summarized to give the current state of the world?

image

We can maintain a projection table that behaves just like a materialized view in a traditional database.

As new events come in, the table is automatically updated, constantly giving us a live picture of the system.

Events with the same key are considered related; newer events are interpreted, depending on their contents, as updates to or deletions of older events.

As with a materialized view, projection tables are read-only. To change a projection table, we change the underlying data by recording new events to the table's underlying stream.

ksqlDB supports easy creation of summary tables and materialized views. We declare them once, and the server will maintain their data as new events stream in.
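
Independently of ksqlDB, the projection itself can be sketched as a fold over keyed change events: the latest value per key wins, and a deletion marker removes the key. Using `None` as the deletion marker is an assumption of this sketch, and `project` is a hypothetical function name.

```python
def project(events):
    """Fold a stream of (key, value) change events into a current-state table.

    A value of None is treated as a deletion of that key (assumption for this sketch).
    """
    table = {}
    for key, value in events:
        if value is None:
            table.pop(key, None)  # delete; ignore keys that are already absent
        else:
            table[key] = value    # insert or overwrite with the newer event
    return table


changes = [
    ("alice", "Berlin"),
    ("bob", "Lima"),
    ("alice", "Rome"),  # update: replaces alice's earlier value
    ("bob", None),      # deletion: removes bob from the table
]
print(project(changes))  # {'alice': 'Rome'}
```

The table is never written to directly; changing it means appending another event to `changes` and letting the projection catch up.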

State Table

How can an Event Processor manage mutable state, similar to how a table does in a relational database?

image

Event Processors often need to perform stateful operations, such as an aggregation (for example, counting the number of events). The state is similar to a table in a relational database, and is mutable: it allows for read and write operations.

It is essential that the event processor has an efficient and fault-tolerant mechanism for state management (recording and updating the state while processing input events) to ensure correct computations and to prevent data loss and data duplication.

We need to implement a mutable state table that allows the Event Processor to record and update state. For example, to count the number of payments per customer, a state table provides a mapping between the customer (for example, a customer ID) and the current count of payments.
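
The payments-per-customer example can be sketched with an in-memory mapping as the state table. The class `PaymentCounter` and the `customer_id` field are hypothetical names; a real Event Processor would keep this state in a fault-tolerant store rather than a plain Python object.

```python
from collections import Counter


class PaymentCounter:
    """Event Processor that keeps a mutable state table: customer ID -> payment count."""

    def __init__(self):
        self.state = Counter()

    def process(self, event):
        """Update the state for one payment event and return the new count."""
        customer_id = event["customer_id"]
        self.state[customer_id] += 1  # read-modify-write on the state table
        return customer_id, self.state[customer_id]


processor = PaymentCounter()
payments = [
    {"customer_id": "c-1", "amount": 10},
    {"customer_id": "c-2", "amount": 25},
    {"customer_id": "c-1", "amount": 5},
]
results = [processor.process(p) for p in payments]
print(results)                # [('c-1', 1), ('c-2', 1), ('c-1', 2)]
print(dict(processor.state))  # {'c-1': 2, 'c-2': 1}
```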

The state's storage backend can vary by implementation: options include local state stores (such as RocksDB), remote state stores (such as Amazon DynamoDB or a NoSQL database), and in-memory caches. Local state stores are usually recommended, as they do not incur additional latency for network round trips, and this improves the end-to-end performance of the Event Processor.

The streaming database ksqlDB provides state tables out of the box with its TABLE data collection. The implementation uses local, fault-tolerant state stores that are continuously backed up into ksqlDB's distributed storage layer, Kafka, so that the data is durable.
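
The general idea of a local store that is backed up to a durable log can be sketched as follows. This is a conceptual model only, not ksqlDB's implementation: a Python list stands in for the durable log, and `ChangelogBackedStore` is a hypothetical class.

```python
class ChangelogBackedStore:
    """Local key-value state whose every write is also appended to a changelog."""

    def __init__(self, changelog):
        self.changelog = changelog  # stand-in for a durable, replicated log
        self.local = {}
        # On startup, rebuild local state by replaying the changelog.
        for key, value in self.changelog:
            self.local[key] = value

    def put(self, key, value):
        self.changelog.append((key, value))  # back up the change first
        self.local[key] = value              # then update the fast local copy

    def get(self, key, default=None):
        return self.local.get(key, default)  # reads never leave the local store


changelog = []
store = ChangelogBackedStore(changelog)
store.put("c-1", 1)
store.put("c-1", 2)
store.put("c-2", 1)

# Simulate losing the local state: a fresh instance recovers it from the changelog.
recovered = ChangelogBackedStore(changelog)
print(recovered.get("c-1"), recovered.get("c-2"))  # 2 1
```

Reads stay local, so they avoid network round trips, while the changelog provides the durability needed to rebuild state after a failure.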

References

https://developer.confluent.io/patterns/event-stream/event-streaming-platform/

https://www.instaclustr.com/blog/apache-kafka-architecture/

https://ksqldb.io/

https://kafka.apache.org/intro