Releases: quixio/quix-streams
v2.7.0
Release 2.7.0
What's changed
[New] Support for Exactly-once processing guarantees using Kafka transactions.
With exactly-once processing guarantees enabled, each Kafka message is processed exactly once, without duplicated outputs.
It is especially helpful when consistency of data in the output topics is crucial and the downstream consumers don't handle duplicated data gracefully.
To learn more about exactly-once processing and its configuration, see the "Processing Guarantees" section here.
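A minimal sketch of enabling it, assuming the `processing_guarantee` parameter on `Application` described in the "Processing Guarantees" docs:

```python
from quixstreams import Application

# Sketch: enable exactly-once processing.
# Assumption: the setting is the "processing_guarantee" parameter,
# and the default is "at-least-once".
app = Application(
    broker_address="localhost:9092",
    consumer_group="my-consumer-group",
    processing_guarantee="exactly-once",
)
```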
Other Improvements
- Removed the `column_name` parameter from the `Deserializer` class by @tim-quix in #392
- Update quickstart code and copy to match by @stereosky in #389
v2.6.0
Release 2.6.0
In this release we introduce new features as well as several breaking changes.
Please read the notes carefully before upgrading to 2.6.0.
What's changed
[BREAKING] The original Timestamps and Headers are now passed to the output when using StreamingDataFrame.to_topic()
Previously, `StreamingDataFrame.to_topic()` used the current epoch as the timestamp when producing output messages, and headers were omitted.
Since version 2.6.0, Quix Streams passes the original timestamps and headers of the messages to the output topics for more consistent data processing.
This change affects the data in the output topics, therefore it is marked as a breaking one.
If you want to keep the previous behavior, you may set the timestamp to the current epoch and drop message headers before producing the output message:
```python
import time

output_topic = app.topic(...)
sdf = app.dataframe(...)
# Do some processing here ...

# Set the timestamp to the current epoch
sdf = sdf.set_timestamp(lambda value, key, timestamp, headers: int(time.time() * 1000))
# Set empty message headers
sdf = sdf.set_headers(lambda value, key, timestamp, headers: [])
# Produce the message to the output topic
sdf = sdf.to_topic(output_topic)
```
[BREAKING] Window results timestamps are set to the window start by default
Since 2.6.0, the results of the windowed aggregations use the window start timestamp as a message timestamp.
You may adjust the timestamps using the new `StreamingDataFrame.set_timestamp()` API.
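For example, a sketch of stamping window results with the window end instead, assuming the windowed result value is a dict with `"start"`, `"end"`, and `"value"` keys as in the windowing docs:

```python
sdf = app.dataframe(...)

# A 1-hour tumbling window; the result value is assumed to look like
# {"start": ..., "end": ..., "value": ...} with timestamps in milliseconds.
sdf = sdf.tumbling_window(duration_ms=3_600_000).sum().final()

# Use the window end instead of the default window start as the message timestamp
sdf = sdf.set_timestamp(lambda value, key, timestamp, headers: value["end"])
```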
[BREAKING] Removed `key` and `timestamp` attributes from the `MessageContext` class
To access the message keys and timestamps, please use the new API described below.
[BREAKING] `final()` and `current()` methods of Windows don't have the `expand` parameter anymore
[NEW] New APIs to access and update message metadata in StreamingDataFrame
Accessing message metadata during processing
Docs:
Previously, the Kafka message metadata resided in a separate `MessageContext` instance.
For example, to access a message key or a timestamp, users needed to import a `quixstreams.message_context` function, which is not straightforward:
```python
from quixstreams import message_context

sdf = app.dataframe(...)
# Previous way of getting a message key in versions < 2.6.0
sdf['message_key'] = sdf.apply(lambda value: message_context().key)
```
Now, the `.apply()`, `.filter()`, and `.update()` methods of `StreamingDataFrame` accept a new parameter: `metadata=True`.
Passing `metadata=True` to any of the functions above will inform `StreamingDataFrame` to provide additional positional arguments with the message metadata to the callback.
Example:
```python
from quixstreams import Application

sdf = app.dataframe(...)  # a StreamingDataFrame instance
# Using a message key to filter incoming messages.
# Note that the callback now must accept four positional arguments instead of one.
sdf = sdf.filter(lambda value, key, timestamp, headers: key != b'BAD_KEY', metadata=True)
```
This way, you may access metadata without additional imports.
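The same pattern works with `.apply()`, for example to copy metadata into the value (a sketch; the added field names are illustrative):

```python
sdf = app.dataframe(...)

# Copy the message key and timestamp into the value for downstream use.
# The "message_key" and "message_ts" field names are arbitrary examples.
sdf = sdf.apply(
    lambda value, key, timestamp, headers: {**value, "message_key": key, "message_ts": timestamp},
    metadata=True,
)
```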
Updating timestamps and headers with `StreamingDataFrame.set_timestamp()` and `StreamingDataFrame.set_headers()`
Docs:
- https://quix.io/docs/quix-streams/processing.html#updating-kafka-timestamps
- https://quix.io/docs/quix-streams/processing.html#updating-kafka-headers
Since version 2.6.0, you can update timestamps and message headers during processing using the `StreamingDataFrame.set_timestamp()` and `StreamingDataFrame.set_headers()` methods.
These methods accept callbacks similar to other operations, so you can use the current value, key, timestamp, and headers to compute the new ones.
The new timestamps will be used in windowed aggregations and when producing messages to the output topics using the `StreamingDataFrame.to_topic()` method.
The new headers will be set for the output messages as well.
Examples:
```python
import time

sdf = app.dataframe(...)

# Update the timestamp to be the current epoch using the "set_timestamp" API.
# "set_timestamp()" requires a callback accepting four positional arguments:
# value, key, current timestamp, and headers.
# The callback must return a new timestamp as an integer in milliseconds.
sdf = sdf.set_timestamp(lambda value, key, timestamp, headers: int(time.time() * 1000))

# Add the value of APP_VERSION to the message headers for debugging purposes
# using the "set_headers()" API.
# "set_headers()" also requires a callback accepting four positional arguments:
# value, key, timestamp, and current headers.
# It must return a new set of headers as a list of (header, value) tuples.
# If the incoming message doesn't have headers attached, the "headers" parameter will be None.
APP_VERSION = "v0.1.1"
sdf = sdf.set_headers(
    lambda value, key, timestamp, headers: [('APP_VERSION', APP_VERSION.encode())]
)
```
[NEW] New API to configure Kafka broker authentication
Docs:
In version 2.6.0, we introduced a new API to specify the Kafka broker credentials via the `ConnectionConfig` object.
Previously, when connecting to brokers with any kind of authentication, users needed to provide the same connection settings as dictionaries, separately for both the consumer and the producer.
Now, the Kafka connection can be configured once using the `ConnectionConfig` object.
With `ConnectionConfig` you may specify additional authentication settings like SASL parameters and more, and pass it as the `broker_address` parameter to the `Application`.
Example:
Configure the application to connect to the Kafka broker with SASL authentication.
```python
from quixstreams import Application
from quixstreams.kafka.configuration import ConnectionConfig

connection = ConnectionConfig(
    bootstrap_servers="my_url",
    security_protocol="sasl_plaintext",
    sasl_mechanism="PLAIN",
    sasl_username="my_user",
    sasl_password="my_pass"
)

app = Application(broker_address=connection)
```
Other Improvements
- Update requests and confluent_kafka dependencies by @daniil-quix in #379
- Bug: configuring broker_address take priority over quix config by @quentin-quix in #384
- Check partitions status after committing in checkpoint by @quentin-quix in #386
- Configure the producer flush timeout with the max.poll.interval.ms by @quentin-quix in #382
- Reduce the checkpointing log level by @daniil-quix in #377
- Fix flaky tests that rely on kafka in docker by @daniil-quix in #378
- Default logging configuration improvements by @quentin-quix in #383
New Contributors
- @quentin-quix made their first contribution in #383
Full Changelog: v2.5.1...v2.6.0
v2.5.1
What's Changed
Fixes:
- Correcting the Topic parameter definition by @shrutimantri in #364
- Don't set default topics params in Quix apps by @daniil-quix in #365
- Fix admin timeouts by @tim-quix in #342
- Correcting the error message to be more explicit by @shrutimantri in #348
- Fix Quix changelog topics failing first validation by @tim-quix in #366
- Update exception messages for TopicConfigurationMismatch by @tim-quix in #370
Docs updates
- fix tutorial bullets for correct rendering in website version by @tim-quix in #362
- Update README Community section by @stereosky in #368
- update docs to include a local pattern, adjust SDK token language by @tim-quix in #369
New Contributors
- @shrutimantri made their first contribution in #364
Full Changelog: v2.5.0...v2.5.1
v2.5.0
What's Changed
Features
Checkpointing
Checkpointing is an overhaul of the previous commit structure. It is meant to better synchronize processing progress (i.e., committing topic offsets) and state updates to ensure consistency of the state.
It should also increase processing speed anywhere from 1.3x-2.5x due to its new batched commit approach.
To adjust this new commit frequency, users can set the new `commit_interval` setting (default: 5 seconds):

```python
app = Application(commit_interval=5)
```
For more details, see the Checkpoint docs.
GroupBy
`GroupBy` enables users to "group" or "re-key" their messages based on the message value, typically to perform (stateful) aggregations on them (much like SQL).
With the new `StreamingDataFrame.group_by()`, you can do this while including other `StreamingDataFrame` operations before or after (so only one `Application` is needed):
```python
# data: {"user_id": "abc", "int_field": 5}
app = Application()
sdf = app.dataframe()
sdf["new_col"] = sdf["int_field"] + 1
sdf = sdf.group_by("user_id")
sdf = sdf.apply(lambda r: r["new_col"])
sdf = sdf.tumbling_window(duration_ms=3600).sum().final()
# ...etc...
```
Users can group by a column name, or provide a custom grouping function.
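For example, a sketch of a custom grouping function (assuming that, when a callable is passed, `group_by()` also requires a `name` for the underlying repartition topic):

```python
# Re-key messages by a derived value rather than a plain column.
# Assumption: a "name" argument is required when grouping with a callable.
sdf = sdf.group_by(lambda value: value["user_id"].upper(), name="user_id_upper")
```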
For more details, see the GroupBy docs.
Enhancements
- Docs updates by @stereosky in #344, #352
- add default error cb to Admin by @tim-quix in #343
Full Changelog: v2.4.2...v2.5.0
v2.4.2
What's Changed
- Fix handling of topics created outside of Quix Cloud by @tim-quix in #338
- Add clearer error messages for invalid SDF column name references by @tim-quix in #322
- Better handling of topic creation errors in Quix Cloud by @tim-quix in #337
- Use pyproject.toml instead of setup.cfg by @tim-quix in #339
- Update README by @stereosky in #332, #335
Full Changelog: v2.4.1...v2.4.2
v2.4.1
What's Changed
- Fix 404s in README by @daniil-quix in #328
- hotfix bug around undefined workspace id for Quix API class by @tim-quix in #329
Full Changelog: v2.4.0...v2.4.1
v2.4.0
What's Changed
Features
Unified `Application` and `Application.Quix()` by @tim-quix in #313
In previous versions, to connect to Quix Cloud brokers you needed to use a separate factory method to create an Application: `Application.Quix()`.
Due to that, developing apps with local Kafka and deploying them to Quix Cloud required additional code changes.
In this release, we improved that: you can now use a single `Application()` class with different settings to connect to both Quix Cloud and standalone Kafka brokers.
See more about working with Quix Cloud in the docs - Connecting to Quix Cloud
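A sketch of the unified usage, assuming the `quix_sdk_token` parameter and `Quix__Sdk__Token` environment variable described in the linked docs:

```python
from quixstreams import Application

# Local development against a standalone Kafka broker
app = Application(broker_address="localhost:9092")

# The same class connects to Quix Cloud when no broker address is set.
# Assumption: the SDK token is passed via "quix_sdk_token" or the
# Quix__Sdk__Token environment variable.
app = Application(quix_sdk_token="my-sdk-token")
```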
Enhancements
- Improved Quix API error handling by @tim-quix in #320
- Adjust producer flush docstring by @tim-quix in #321
- New docs by @daniil-quix in #315
- Update black formatter version by @daniil-quix in #316
Full Changelog: v2.3.3...v2.4.0
v2.3.3
What's Changed
- Example Bugfix: Do not override state input on each update by @alexmorley in #310
- Fix default replication_factor for topics created by Quix apps by @daniil-quix in #317
New Contributors
- @alexmorley made their first contribution in #310
Full Changelog: v2.3.2...v2.3.3
v2.3.2
What's Changed
Features
- Support GZIP compression of messages formatted in Quix formats by @peter-quix in #305
Fixes
- Fix type hints and code completion by @tim-quix in #292
- Allow processing of non-dict values in `StreamingDataFrame` by @harisbotic in #307
- Skip processing of messages with `None` keys in windowed aggregations by @harisbotic in #306
Enhancements
- Set the default value of the consumer group to `'quixstreams-default'` in `Application` by @harisbotic in #299
- Set a default value for the `Quix__Portal__Api` env variable in Quix apps by @harisbotic in #294
- Raise an exception on `__bool__` checks in `StreamingDataFrame` by @tim-quix in #295
- Don't set `ssl.endpoint.identification.algorithm` anymore for Quix apps by @harisbotic in #293
- Update docs by @stereosky in #300
Full Changelog: v2.3.1...v2.3.2
v2.3.1
What's Changed
- Added support for Changelog topics
  - Changelog topics provide fault tolerance capabilities to state stores. Each state store now has a corresponding changelog topic to keep track of the state updates in Kafka.
  - Changelog topics are enabled by default and can be disabled.
  - See more about changelog topics in the docs
- `Application.run()` now verifies that topics exist before starting the application. If topics don't exist, the `Application` instance will try to create them automatically if `auto_create_topics` is set to `True` (default). The topic parameters can also be specified; see the docs and the sketch after this list.
- This is the first non-alpha release of Quix Streams v2. It can now be installed from `pip` without the `--pre` flag.
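A sketch of specifying topic parameters at creation time, assuming `TopicConfig` is importable from `quixstreams.models` as in the topic docs:

```python
from quixstreams import Application
from quixstreams.models import TopicConfig

app = Application(broker_address="localhost:9092", auto_create_topics=True)

# Assumption: TopicConfig accepts num_partitions and replication_factor,
# per the topic management docs.
input_topic = app.topic(
    "input",
    config=TopicConfig(num_partitions=3, replication_factor=1),
)
```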
Breaking changes
- The partition assignment strategy is now always set to `cooperative-sticky` and cannot be configured anymore, because the consumer relies on the `incremental_assign()` API for recovery.
  Previously, the assignment strategy was set to `range` by default, and `range` is a non-cooperative strategy.
  Since cooperative and non-cooperative (eager) strategies must not be mixed, all consumers in the group must first leave the group and then rejoin it after upgrading the application to this version.
Full Changelog: v2.2.1a...v2.3.1