Releases: quixio/quix-streams
v3.9.0
What's Changed
💎 Table-style printing of StreamingDataFrames
You can now examine incoming data streams in a table-like format using the new `StreamingDataFrame.print_table()` feature.
For interactive terminals, it can print new rows row-by-row in a live mode with an artificial delay, allowing you to glance at the data stream easily.
For non-interactive environments (stdout, file, etc.) or if `live=False`, it will print rows in batches as soon as the data is available to the application.
This is an experimental feature, so feel free to submit an issue with your feedback 👍
See the docs to learn more about `StreamingDataFrame.print_table()`.
```python
sdf = app.dataframe(...)
# some SDF transformations happening here ...

# Print the last 5 records with metadata columns in live mode
sdf.print_table(size=5, title="My Stream", live=True)

# For wide datasets, limit columns to improve readability
sdf.print_table(
    size=5,
    title="My Stream",
    columns=["id", "name", "value"],
    column_widths={"name": 20},
)
```
Live output:

```
My Stream
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ _key       ┃ _timestamp ┃ id     ┃ name                 ┃ value   ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ b'53fe8e4' │ 1738685136 │ 876    │ Charlie              │ 42.5    │
│ b'91bde51' │ 1738685137 │ 11     │ Alice                │ 18.3    │
│ b'6617dfe' │ 1738685138 │ 133    │ Bob                  │ 73.1    │
│ b'f47ac93' │ 1738685139 │ 244    │ David                │ 55.7    │
│ b'038e524' │ 1738685140 │ 567    │ Eve                  │ 31.9    │
└────────────┴────────────┴────────┴──────────────────────┴─────────┘
```
By @gwaramadze in #740, #760
Bugfixes
- ⚠️ Fix default state dir for Quix Cloud apps by @gwaramadze in #759. Please note that the state may be recovered to a different directory when updating existing deployments in Quix Cloud if `state_dir` is not set; see the sketch after this list for pinning it explicitly.
- [Issue #440] Ignore errors in rmtree by @ulisesojeda in #753
- Fix `QuixPortalApiService` failing in multiprocessing environments by @daniil-quix in #755
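To make the state location stable regardless of the environment, you can set `state_dir` explicitly when creating the `Application`. A minimal sketch (the broker address and path below are illustrative):

```python
from quixstreams import Application

# Pin the state directory explicitly so the state location does not
# change across deployment updates (values below are illustrative).
app = Application(
    broker_address="localhost:9092",
    consumer_group="my-consumer-group",
    state_dir="state",
)
```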
Docs
- Add missing "how to install" section for `PandasDataFrameSource` by @daniil-quix in #751
New Contributors
- @ulisesojeda made their first contribution in #753
Full Changelog: v3.8.1...v3.9.0
v3.8.1
What's Changed
- New `PandasDataFrameSource` connector to stream data from pandas.DataFrames during development and debugging by @JotaBlanco and @daniil-quix in #748; a usage sketch follows this list
- Made logging of common Kafka ACL issues more helpful by providing potentially missing ACLs and topic names by @tim-quix in #742
- Fix docs for MongoDBSink by @tim-quix in #746
- Bump mypy from 1.13.0 to 1.15.0 by @dependabot in #744
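A minimal usage sketch for the new connector. Note that the import path and constructor arguments below are assumptions based on the connector's description; check the connector docs for the exact API:

```python
import pandas as pd

from quixstreams import Application
# Assumed import path - verify against the connector docs
from quixstreams.sources.community.pandas import PandasDataFrameSource

app = Application(broker_address="localhost:9092")

# A small in-memory frame to replay as a stream during development
df = pd.DataFrame({"id": [1, 2, 3], "value": [10.5, 20.1, 30.7]})

# Constructor arguments are assumed for illustration
source = PandasDataFrameSource(df=df, name="pandas-source")

sdf = app.dataframe(source=source)
sdf.print()

if __name__ == "__main__":
    app.run()
```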
Full Changelog: v3.8.0...v3.8.1
v3.8.0
What's Changed
💎 Count-based windows
Count-based windows allow aggregating events based on their number instead of time.
They can be helpful when time is not relevant to the particular aggregation or when a large number of out-of-order events are expected in the data stream.
Count-based windows support the same aggregations as time-based windows, including `.reduce()` and `.collect()`.
Supported window types:
- `tumbling_count_window()` - slices the incoming stream into fixed-size batches
- `hopping_count_window()` - slices the incoming stream into overlapping batches of a fixed size with a fixed step
- `sliding_count_window()` - same as a count-based hopping window with a step of 1 (e.g., the last 10 events in the stream)
Example:

```python
from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)

sdf = (
    # Define a count-based tumbling window of size 3
    sdf.tumbling_count_window(count=3)
    # Specify the "collect" aggregate function
    .collect()
    # Emit updates once the window is closed
    .final()
)

# Expected output:
# {
#     "value": [<event1>, <event2>, <event3>],
#     "start": <min timestamp in the batch>,
#     "end": <max timestamp in the batch>
# }
```
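The other count-based window types follow the same pattern; a brief sketch (the `step` parameter name is an assumption based on the descriptions above):

```python
# Overlapping batches of 5 events, starting a new batch every 2 events
# (the "step" parameter name is assumed from the description above)
sdf = sdf.hopping_count_window(count=5, step=2).collect().final()

# Or: re-aggregate the last 10 events on every new message
# sdf = sdf.sliding_count_window(count=10).collect().final()
```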
See the "Windowed Aggregations" docs page for more info.
By @quentin-quix in #736 #739
💎 New Connectors
💎 A callback to react to late messages in Windows
Time-based windows can now accept `on_late` callbacks to react to late messages in the windows.
You can use this callback to customize the logging of such messages or to send them to some dead-letter queue, for example.
Example:

```python
from typing import Any
from datetime import timedelta

from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)


def on_late(
    value: Any,         # Record value
    key: Any,           # Record key
    timestamp_ms: int,  # Record timestamp
    late_by_ms: int,    # How late the record is in milliseconds
    start: int,         # Start of the target window
    end: int,           # End of the target window
    name: str,          # Name of the window state store
    topic: str,         # Topic name
    partition: int,     # Topic partition
    offset: int,        # Message offset
) -> bool:
    """
    Define a callback to react to late records coming into windowed aggregations.
    Return `False` to suppress the default logging behavior.
    """
    print(f"Late message detected at the window {(start, end)}")
    return False


# Define a 1-hour tumbling window and provide the "on_late" callback to it
sdf.tumbling_window(timedelta(hours=1), on_late=on_late)

# Start the application
if __name__ == '__main__':
    app.run()
```
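You could, for instance, route late records to a dead-letter topic instead of only logging them. A sketch continuing the example above (the topic name and JSON serialization are illustrative, and the record value is assumed to be JSON-serializable):

```python
import json
from datetime import timedelta

# Reuse the application's producer to forward late records
producer = app.get_producer()

def on_late_to_dlq(
    value, key, timestamp_ms, late_by_ms, start, end, name, topic, partition, offset
) -> bool:
    # Send the late record to an illustrative dead-letter topic
    producer.produce(
        topic="late-events-dlq",
        key=key,
        value=json.dumps(value),
    )
    # Suppress the default logging
    return False

sdf.tumbling_window(timedelta(hours=1), on_late=on_late_to_dlq)
```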
See more in the docs.
By @daniil-quix in #701 #732
🦠 Bugfixes
- Do not process late messages in sliding windows by @gwaramadze in #728
Other Changes
- StreamingDataFrame.merge(): prep work by @daniil-quix in #725
- windows: extract base class for windows and window definitions by @quentin-quix in #730
- state: refactor collection store to not rely on timestamp by @quentin-quix in #734
Full Changelog: v3.7.0...v3.8.0
v3.7.0
What's Changed
[NEW] 💎 Collection-based windowed aggregations
A new window operation, `collect()`, was added to gather all events in a window into a batch.
You can use it to perform aggregations that require whole collections and cannot be expressed via the `reduce()` approach, such as calculating medians.
This operation is optimized for collecting values and performs significantly better than using `reduce()` to accumulate batches of data.
Example:

```python
### Collect all events over a 10-minute tumbling window into a list ###

from datetime import timedelta
from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)

sdf = (
    # Define a tumbling window of 10 minutes
    sdf.tumbling_window(timedelta(minutes=10))
    # Collect events in the window into a list
    .collect()
    # Emit results only for closed windows
    .final()
)

# Output:
# {
#     'start': <window start>,
#     'end': <window end>,
#     'value': [event1, event2, event3, ...]  - list of all events in the window
# }
```
Docs - https://quix.io/docs/quix-streams/windowing.html#collect
By @gwaramadze in #688
Full Changelog: v3.6.1...v3.7.0
v3.6.1
What's Changed
⚠️ Fix a bug where creating a changelog topic set the `cleanup.policy` of the source topic to `compact`
Only topics created on the fly and repartition topics were affected; the configuration of existing topics is intact.
Please check the `cleanup.policy` of the topics used in your applications and adjust it if necessary (see the sketch below).
Introduced in `v3.4.0`.
Fixed by @quentin-quix in #716
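One way to check is via the Admin API of confluent-kafka, which Quix Streams builds on. A minimal sketch, assuming a local broker and an illustrative topic name:

```python
from confluent_kafka.admin import AdminClient, ConfigResource

# Inspect the current cleanup.policy of a topic
admin = AdminClient({"bootstrap.servers": "localhost:9092"})
resource = ConfigResource(ConfigResource.Type.TOPIC, "my-topic")

for res, future in admin.describe_configs([resource]).items():
    configs = future.result()
    print(res, configs["cleanup.policy"].value)
```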
Other changes
- Influxdb3 Sink: add some functionality and QoL improvements by @tim-quix in #689
- Bump types-protobuf from 5.28.3.20241030 to 5.29.1.20241207 by @dependabot in #683
Full Changelog: v3.6.0...v3.6.1
v3.6.0
What's Changed
Main Changes
⚠️ Switch to the "range" assignor strategy from "cooperative-sticky"
Due to discovered issues with the "cooperative-sticky" assignment strategy, commits made during the rebalancing phase were failing.
To avoid that, we changed the partition assignor to "range", which doesn't have such issues.
Note that the "range" assignor is enforced for consumers used by `Application`, but it can be overridden for consumers created via the `app.get_consumer()` API.
❗How to update:
Since the "cooperative-sticky" and "range" strategies must not be mixed, all consumers in the group must first leave the group and then rejoin it after upgrading the application to Quix Streams v3.6.0.
For more details, see #705 and #712
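At the librdkafka level this is the `partition.assignment.strategy` setting. A sketch of a standalone consumer created with confluent-kafka directly, choosing the strategy explicitly (broker address and group name are illustrative):

```python
from confluent_kafka import Consumer

# A consumer outside of Application can pick its own assignment strategy.
# Do not mix "range" and "cooperative-sticky" within the same group.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-consumer-group",
    "partition.assignment.strategy": "range",
})
```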
Other Changes
- Source: background file downloads for FileSource by @tim-quix in #670
- Fix lateness warnings in Windows by @daniil-quix in #700
- mypy: make quixstreams.core.* pass type checks by @quentin-quix in #685
- mypy: ensure default are set in overloaded methods by @quentin-quix in #698
- mypy: make quixstreams.dataframe.* pass type checks by @quentin-quix in #695
Docs
- Update mkdocs.yml by @gwaramadze in #703
- Update Documentation by @github-actions in #696
- Update Documentation by @github-actions in #699
- Bump version to 3.6.0 by @daniil-quix in #711
Full Changelog: v3.5.0...v3.6.0
v3.5.0
What's Changed
Features
- Added Azure File Source and Azure File Sink by @tim-quix in #669 and #671
- Pydantic ImportString for oauth_cb in ConnectionConfig by @mkp-jansen in #680
Fixes
- Re-raise the exceptions from the platform API by @daniil-quix in #686
- mypy: make quixstreams.platforms.* pass type checks by @quentin-quix in #678
- BigQuery Sink: fix bug around dataset and table ids by @tim-quix in #691
Docs
- Cleanup Examples and Tutorials by @tim-quix in #675
- Rename docs files by @daniil-quix in #674
- mypy: make quixstreams.models.* pass type checks by @quentin-quix in #673
- fix broken doc refs by @tim-quix in #677
New Contributors
- @mkp-jansen made their first contribution in #680
Full Changelog: v3.4.0...v3.5.0
v3.4.0
What's Changed
Breaking changes💥
Prefix topic names with `source__` for auto-generated source topics
By default, each Source provides a default topic by implementing the `default_topic()` method.
The names of these auto-generated topics are now prefixed with `"source__"` for better visibility across other topics in the cluster.
This doesn't apply when the topic is passed explicitly via `app.dataframe(source, topic)` or `app.add_source(source, topic)`.
After upgrading to 3.4.0, existing Sources using default topics will look for the topic with the new name on restart and create it if it doesn't exist.
To keep using the existing topics, pass a pre-configured `Topic` instance with the existing name and serialization config:
```python
from quixstreams import Application

app = Application(...)

# Configure the topic instance to use it together with the Source
topic = app.topic(
    "<existing topic name>",
    value_serializer=...,
    value_deserializer=...,
    key_serializer=...,
    key_deserializer=...,
)
source = SomeSource(...)

# To run Sources together with a StreamingDataFrame:
sdf = app.dataframe(source=source, topic=topic)

# Or to run Sources stand-alone:
app.add_source(source=source, topic=topic)
```
by @daniil-quix in #651 #662
Features 🌱
- Amazon Kinesis Sink by @gwaramadze in #642 #649
- Amazon Kinesis Source by @tim-quix in #646
- Amazon S3 Sink by @gwaramadze in #654
- Amazon S3 Source by @tim-quix in #653
- PostgreSQL Sink by @tomas-quix in #641
- Redis Sink by @daniil-quix in #655
- Stateful sources API implementation by @quentin-quix in #615 #631
Improvements 💎
- On `app.stop()`, commit checkpoint before closing the consumer by @daniil-quix in #638
- Trigger `AdminClient.poll` on initialization by @daniil-quix in #661
Docs 📄
- Remove the list of supported connectors from the Connectors docs. by @daniil-quix in #664
Other
- CI: Implement mypy pre-commit check by @quentin-quix in #643
- Update pydantic requirement from <2.10,>=2.7 to >=2.7,<2.11 by @dependabot in #652
- mypy: make quixstreams.state.* pass type checks by @quentin-quix in #657
Full Changelog: v3.3.0...v3.4.0
v3.3.0
What's Changed
New Connectors for Google Cloud
In this release, 3 new connectors have been added:
- Google Cloud Pub/Sub Source by @tim-quix in #622
- Google Cloud Pub/Sub Sink by @gwaramadze in #616, #626
- Google Cloud BigQuery Sink by @daniil-quix in #621, #627
To learn more about them, see the respective docs pages.
Other updates
- Conda drop Python 3.8 support by @gwaramadze in #629
- Remove connectors docs from the nav by @daniil-quix in #630
- Update Documentation by @github-actions in #617
- Update connectors docs by @daniil-quix in #625
Full Changelog: v3.2.1...v3.3.0
v3.2.1
What's Changed
This is a bugfix release downgrading `confluent-kafka` to 2.4.0 because of an authentication issue introduced in 2.6.0.
Full Changelog: v3.2.0...v3.2.1