Releases: quixio/quix-streams

v2.7.0

04 Jul 16:58
892535d

Release 2.7.0

What's changed

[New] Support for Exactly-once processing guarantees using Kafka transactions.

With exactly-once processing guarantees enabled, each Kafka message is processed exactly once, and no duplicated outputs are produced.

It is especially helpful when consistency of data in the output topics is crucial, and the downstream consumers don't handle duplicated data gracefully.

To learn more about the exactly-once processing and configuration, see the "Processing Guarantees" section here.
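As an illustration, here is a minimal sketch of enabling it when creating the Application (this assumes the guarantee is selected via a processing_guarantee option accepting "at-least-once" or "exactly-once"; check the docs for the exact parameter name):

from quixstreams import Application

# Hypothetical sketch: opt into exactly-once delivery for the whole application
app = Application(
    broker_address="localhost:9092",
    consumer_group="my-consumer-group",
    processing_guarantee="exactly-once",  # assumed default is "at-least-once"
)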

Other Improvements

  • Removed column_name parameter from the Deserializer class by @tim-quix in #392
  • Update quickstart code and copy to match by @stereosky in #389

v2.6.0

19 Jun 11:43
7327cb1

Release 2.6.0

In this release we introduce new features as well as several breaking changes.
Please read the notes carefully before upgrading to 2.6.0.


What's changed

[BREAKING] The original Timestamps and Headers are now passed to the output when using StreamingDataFrame.to_topic()

Previously, StreamingDataFrame.to_topic() used the current epoch time as the timestamp when producing output messages, and headers were omitted.

Since version 2.6.0, Quix Streams passes the original timestamps and headers of the messages to the output topics for more consistent data processing.

This change affects the data in the output topics, so it is marked as a breaking change.

If you want to keep the previous behavior, you may set the timestamp to the current epoch and drop message headers before producing the output message:

import time

output_topic = app.topic(...)

sdf = app.dataframe(...)
# Do some processing here ...

# Set the timestamp to the current epoch
sdf = sdf.set_timestamp(lambda value, key, timestamp, headers: int(time.time() * 1000))

# Set empty message headers
sdf = sdf.set_headers(lambda value, key, timestamp, headers: [])

# Producing message to the output topic
sdf = sdf.to_topic(output_topic)

[BREAKING] Window results timestamps are set to the window start by default

Since 2.6.0, the results of the windowed aggregations use the window start timestamp as a message timestamp.

You may adjust the timestamps using the new StreamingDataFrame.set_timestamp() API.
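For example, to stamp the results with the window end instead, a sketch (assuming the windowed aggregation result is a dictionary containing "start" and "end" keys):

sdf = app.dataframe(...)

# Hourly tumbling window; since 2.6.0 its results carry the window start as the message timestamp
sdf = sdf.tumbling_window(duration_ms=3600 * 1000).sum().final()

# Re-stamp the messages with the window end instead of the window start
sdf = sdf.set_timestamp(lambda value, key, timestamp, headers: value["end"])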

[BREAKING] Removed key and timestamp attributes from the MessageContext class

To access the message keys and timestamps, please use the new API described below.

[BREAKING] final() and current() methods of Windows don't have the expand parameter anymore

[NEW] New APIs to access and update message metadata in StreamingDataFrame

Accessing message metadata during processing

Docs:

Previously, the Kafka message metadata resided in a separate MessageContext instance.
For example, to access a message key or a timestamp, users needed to import the quixstreams.message_context function, which is not straightforward:

from quixstreams import message_context

sdf = app.dataframe(...)
# Previous way of getting a message key in versions < 2.6.0
sdf['message_key'] = sdf.apply(lambda value: message_context().key)

Now, the .apply(), .filter(), and .update() methods of StreamingDataFrame accept a new parameter - metadata=True.
Passing metadata=True to any of the functions above will inform StreamingDataFrame to provide additional positional arguments with the message metadata to the callback.

Example:

from quixstreams import Application

app = Application(...)  # a configured Application instance
sdf = app.dataframe(...)  # a StreamingDataFrame instance

# Using a message key to filter incoming messages
# Note that the callback now must accept four positional arguments instead of one.
sdf = sdf.filter(lambda value, key, timestamp, headers: key != b'BAD_KEY', metadata=True)

This way, you may access metadata without additional imports.

Updating timestamps and headers with StreamingDataFrame.set_timestamp() and StreamingDataFrame.set_headers()

Docs:

Since version 2.6.0, you can update timestamps and message headers during processing using the StreamingDataFrame.set_timestamp() and StreamingDataFrame.set_headers() methods.
These methods accept callbacks similar to other operations, so you can use the current value, key, timestamp, and headers to compute the new values.

The new timestamps will be used in windowed aggregations and when producing messages to the output topics using the StreamingDataFrame.to_topic() method.

The new headers will be set for the output messages as well.

Examples:

import time

sdf = app.dataframe(...)

# Update the timestamp to be the current epoch using the "set_timestamp" API.
# "set_timestamp()" requires the callback accepting four positional arguments: value, key, current timestamp, and headers. 
# The callback must return a new timestamp as integer in milliseconds.
sdf = sdf.set_timestamp(lambda value, key, timestamp, headers: int(time.time() * 1000))


# Add the value of APP_VERSION to the message headers for debugging purposes using the "set_headers()" API.  
# "set_headers()" also requires the callback accepting four positional arguments: value, key, timestamp, and current headers. 
# It must return a new set of headers as a list of (header, value) tuples.
# If incoming message doesn't have headers attached, the "headers" parameter will be None.
APP_VERSION = "v0.1.1"
sdf = sdf.set_headers(
    lambda value, key, timestamp, headers: [('APP_VERSION', APP_VERSION.encode())]
)

[NEW] New API to configure Kafka broker authentication

Docs:

In version 2.6.0, we introduced the new API to specify the Kafka broker credentials via the ConnectionConfig object.

Previously, when connecting to brokers with any kind of authentication, users needed to provide the same connection settings as dictionaries, separately for both the consumer and the producer.

Now, the Kafka connection can be configured once using the ConnectionConfig object.
With ConnectionConfig, you may specify additional authentication settings such as SASL parameters, and pass it as the broker_address parameter to the Application.

Example:

Configure the application to connect to the Kafka broker with SASL authentication.

from quixstreams import Application
from quixstreams.kafka.configuration import ConnectionConfig

connection = ConnectionConfig(
    bootstrap_servers="my_url",
    security_protocol="sasl_plaintext",
    sasl_mechanism="PLAIN",
    sasl_username="my_user",
    sasl_password="my_pass"
)

app = Application(broker_address=connection)

Other Improvements

New Contributors

Full Changelog: v2.5.1...v2.6.0

v2.5.1

27 May 09:06
cf569eb

What's Changed

Fixes:

Docs updates

  • fix tutorial bullets for correct rendering in website version by @tim-quix in #362
  • Update README Community section by @stereosky in #368
  • update docs to include a local pattern, adjust SDK token language by @tim-quix in #369

New Contributors

Full Changelog: v2.5.0...v2.5.1

v2.5.0

16 May 16:31
29eb848

What's Changed

Features

Checkpointing

Checkpointing is an overhaul of the previous commit structure. It is meant to better synchronize processing progress (i.e. committing topic offsets) and state updates to ensure consistency of the state.

It should also increase processing speed anywhere from 1.3x to 2.5x due to its new batched commit approach.

To adjust this new commit frequency, users can set the new commit_interval parameter (default: 5 seconds):

app = Application(commit_interval=5)

For more details, see the Checkpoint docs.

GroupBy

GroupBy enables users to "group" or "re-key" their messages based on the message value, typically to perform (stateful) aggregations on them (much like SQL).

With the new StreamingDataFrame.group_by(), you can do this while including other StreamingDataFrame operations before or after (so only one Application is needed):

# data: {"user_id": "abc", "int_field": 5}
app = Application()
sdf = app.dataframe()
sdf["new_col"] = sdf["int_field"] + 1
sdf = sdf.group_by("user_id")
sdf = sdf.apply(lambda r: r["new_col"])
sdf = sdf.tumbling_window(duration_ms=3600).sum().final()
# ...etc...

Users can group by a column name, or provide a custom grouping function.
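For instance, a sketch of grouping by a derived key with a custom function (this assumes the callable variant of group_by requires a name for the underlying repartition topic; check the GroupBy docs for the exact signature):

# Re-key messages using a value computed from the message body
sdf = sdf.group_by(lambda value: value["user_id"].upper(), name="user_id_upper")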

For more details, see the GroupBy docs.

Enhancements

Full Changelog: v2.4.2...v2.5.0

v2.4.2

29 Apr 08:42
fb4b48f

What's Changed

Full Changelog: v2.4.1...v2.4.2

v2.4.1

04 Apr 17:48
0a17425

What's Changed

Full Changelog: v2.4.0...v2.4.1

v2.4.0

04 Apr 14:33
201b6e0

What's Changed

Features

Unified Application and Application.Quix() by @tim-quix in #313

In previous versions, to connect to Quix Cloud brokers you needed to use a separate factory method to create an Application - Application.Quix().
Because of that, developing apps against local Kafka and deploying them to Quix Cloud required additional code changes.
In this release, we improved that: you may now use the single Application() class with different settings to connect to both Quix Cloud and standalone Kafka brokers.

See more about working with Quix Cloud in the docs - Connecting to Quix Cloud
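For illustration, a sketch of the two setups (the quix_sdk_token parameter for the Quix Cloud connection is an assumption; see the docs linked above for the exact options):

from quixstreams import Application

# Local development against a standalone Kafka broker
app = Application(broker_address="localhost:9092")

# Deployment against Quix Cloud - hypothetical sketch using an SDK token instead of a broker address
app = Application(quix_sdk_token="<SDK_TOKEN>")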

Enhancements

Full Changelog: v2.3.3...v2.4.0

v2.3.3

25 Mar 19:17
75fba0c

What's Changed

  • Example Bugfix: Do not override state input on each update by @alexmorley in #310
  • Fix default replication_factor for topics created by Quix apps by @daniil-quix in #317

New Contributors

Full Changelog: v2.3.2...v2.3.3

v2.3.2

04 Mar 12:42
8242858

What's Changed

Features

  • Support GZIP compression of messages formatted in Quix formats by @peter-quix in #305

Fixes

  • Fix type hints and code completion by @tim-quix in #292
  • Allow processing of non-dict values in StreamingDataFrame by @harisbotic in #307
  • Skip processing of messages with None keys in windowed aggregations by @harisbotic in #306

Enhancements

  • Set the default value of consumer group to 'quixstreams-default' in Application by @harisbotic in #299
  • Set a default value for Quix__Portal__Api env variable in Quix apps by @harisbotic in #294
  • Raise exception on __bool__ checks in StreamingDataFrame by @tim-quix in #295
  • Don't set ssl.endpoint.identification.algorithm anymore for Quix apps by @harisbotic in #293
  • Update docs by @stereosky in #300

Full Changelog: v2.3.1...v2.3.2

v2.3.1

15 Feb 17:27
f9167c5

What's Changed

  • Added support for Changelog topics

    • Changelog topics provide fault tolerance capabilities to state stores.
      Each state store now has a corresponding changelog topic to keep track of the state updates in Kafka.
    • Changelog topics are enabled by default and can be disabled.
    • See more about changelog topics in the docs
  • Application.run() now verifies that topics exist before starting the application.
    If topics don't exist, the Application instance will try to create them automatically if auto_create_topics is set to True (the default); see the sketch after this list.
    The topic parameters can also be specified, see more in the docs

  • This is the first non-alpha release of Quix Streams v2. It can now be installed from pip without the --pre flag.
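A sketch of tuning these behaviors when creating the Application (the flag names auto_create_topics and use_changelog_topics follow the description above but should be treated as assumptions; check the docs for the exact parameters):

from quixstreams import Application

app = Application(
    broker_address="localhost:9092",
    auto_create_topics=True,       # create missing topics on startup (default)
    use_changelog_topics=False,    # opt out of changelog topics for state stores
)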

Breaking changes

  • The partition assignment strategy is now always set to cooperative-sticky and cannot be configured anymore because the consumer relies on the incremental_assign() API for recovery.
    Previously, the assignment strategy was set to range by default, and range is a non-cooperative strategy.
    Since cooperative and non-cooperative (eager) strategies must not be mixed, all consumers in the group must first leave the group, and then join it again after upgrading the application to this version.

Full Changelog: v2.2.1a...v2.3.1