Releases: quixio/quix-streams
v3.9.0
What's Changed
💎 Table-style printing of StreamingDataFrames
You can now examine incoming data streams in a table-like format using the new `StreamingDataFrame.print_table()` feature.
For interactive terminals, it can print new rows row-by-row in a live mode with an artificial delay, allowing you to glance at the data stream easily.
For non-interactive environments (stdout, file, etc.) or if `live=False`, it will print rows in batches as soon as the data is available to the application.
This is an experimental feature, so feel free to submit an issue with your feedback 👍
See the docs to learn more about `StreamingDataFrame.print_table()`.
```python
sdf = app.dataframe(...)
# some SDF transformations happening here ...

# Print the last 5 records with metadata columns in live mode
sdf.print_table(size=5, title="My Stream", live=True)

# For wide datasets, limit columns to improve readability
sdf.print_table(
    size=5,
    title="My Stream",
    columns=["id", "name", "value"],
    column_widths={"name": 20},
)
```
Live output:

```
My Stream
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ _key       ┃ _timestamp ┃ id     ┃ name                 ┃ value   ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ b'53fe8e4' │ 1738685136 │ 876    │ Charlie              │ 42.5    │
│ b'91bde51' │ 1738685137 │ 11     │ Alice                │ 18.3    │
│ b'6617dfe' │ 1738685138 │ 133    │ Bob                  │ 73.1    │
│ b'f47ac93' │ 1738685139 │ 244    │ David                │ 55.7    │
│ b'038e524' │ 1738685140 │ 567    │ Eve                  │ 31.9    │
└────────────┴────────────┴────────┴──────────────────────┴─────────┘
```
By @gwaramadze in #740, #760
Bugfixes
- ⚠️ Fix default state dir for Quix Cloud apps by @gwaramadze in #759. Please note that the state may be recovered to a different directory when updating existing deployments in Quix Cloud if `state_dir` is not set; see the sketch after this list for pinning it explicitly.
- [Issue #440] Ignore errors in rmtree by @ulisesojeda in #753
- Fix `QuixPortalApiService` failing in multiprocessing environments by @daniil-quix in #755
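To make the state location stable regardless of the environment, you can set `state_dir` explicitly when creating the `Application`. A minimal sketch (the broker address and path below are illustrative):

```python
from quixstreams import Application

# Pin the state directory explicitly so the state location does not
# change across deployment updates (values below are illustrative).
app = Application(
    broker_address="localhost:9092",
    consumer_group="my-consumer-group",
    state_dir="state",
)
```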
Docs
- Add missing "how to install" section for `PandasDataFrameSource` by @daniil-quix in #751
New Contributors
- @ulisesojeda made their first contribution in #753
Full Changelog: v3.8.1...v3.9.0
v3.8.1
What's Changed
- New `PandasDataFrameSource` connector to stream data from pandas.DataFrames during development and debugging by @JotaBlanco and @daniil-quix in #748; a usage sketch follows this list
- Made logging of common Kafka ACL issues more helpful by providing potentially missing ACLs and topic names by @tim-quix in #742
- Fix docs for MongoDBSink by @tim-quix in #746
- Bump mypy from 1.13.0 to 1.15.0 by @dependabot in #744
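A minimal usage sketch for the new connector. Note that the import path and constructor arguments below are assumptions based on the connector's description; check the connector docs for the exact API:

```python
import pandas as pd

from quixstreams import Application
# Assumed import path - verify against the connector docs
from quixstreams.sources.community.pandas import PandasDataFrameSource

app = Application(broker_address="localhost:9092")

# A small in-memory frame to replay as a stream during development
df = pd.DataFrame({"id": [1, 2, 3], "value": [10.5, 20.1, 30.7]})

# Constructor arguments are assumed for illustration
source = PandasDataFrameSource(df=df, name="pandas-source")

sdf = app.dataframe(source=source)
sdf.print()

if __name__ == "__main__":
    app.run()
```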
Full Changelog: v3.8.0...v3.8.1
v3.8.0
What's Changed
💎 Count-based windows
Count-based windows allow aggregating events based on their number instead of time.
They can be helpful when time is not relevant to the particular aggregation or when a large number of out-of-order events are expected in the data stream.
Count-based windows support the same aggregations as time-based windows, including `.reduce()` and `.collect()`.
Supported window types:
- `tumbling_count_window()` - slices the incoming stream into fixed-size batches
- `hopping_count_window()` - slices the incoming stream into overlapping batches of a fixed size with a fixed step
- `sliding_count_window()` - same as a count-based hopping window with a step of 1 (e.g., the last 10 events in the stream)
Example:

```python
from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)

sdf = (
    # Define a count-based tumbling window of size 3
    sdf.tumbling_count_window(count=3)
    # Specify the "collect" aggregate function
    .collect()
    # Emit updates once the window is closed
    .final()
)

# Expected output:
# {
#     "value": [<event1>, <event2>, <event3>],
#     "start": <min timestamp in the batch>,
#     "end": <max timestamp in the batch>
# }
```
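The other count-based window types follow the same pattern; a brief sketch (the `step` parameter name is an assumption based on the descriptions above):

```python
# Overlapping batches of 5 events, starting a new batch every 2 events
# (the "step" parameter name is assumed from the description above)
sdf = sdf.hopping_count_window(count=5, step=2).collect().final()

# Or: re-aggregate the last 10 events on every new message
# sdf = sdf.sliding_count_window(count=10).collect().final()
```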
See the "Windowed Aggregations" docs page for more info.
By @quentin-quix in #736 #739
💎 New Connectors
💎 A callback to react to late messages in Windows
Time-based windows can now accept `on_late` callbacks to react to late messages in the windows.
You can use this callback to customize the logging of such messages or to send them to some dead-letter queue, for example.
Example:

```python
from typing import Any
from datetime import timedelta

from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)


def on_late(
    value: Any,         # Record value
    key: Any,           # Record key
    timestamp_ms: int,  # Record timestamp
    late_by_ms: int,    # How late the record is in milliseconds
    start: int,         # Start of the target window
    end: int,           # End of the target window
    name: str,          # Name of the window state store
    topic: str,         # Topic name
    partition: int,     # Topic partition
    offset: int,        # Message offset
) -> bool:
    """
    Define a callback to react to late records coming into windowed aggregations.
    Return `False` to suppress the default logging behavior.
    """
    print(f"Late message detected at the window {(start, end)}")
    return False


# Define a 1-hour tumbling window and provide the "on_late" callback to it
sdf.tumbling_window(timedelta(hours=1), on_late=on_late)

# Start the application
if __name__ == '__main__':
    app.run()
```
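You could, for instance, route late records to a dead-letter topic instead of only logging them. A sketch continuing the example above (the topic name and JSON serialization are illustrative, and the record value is assumed to be JSON-serializable):

```python
import json
from datetime import timedelta

# Reuse the application's producer to forward late records
producer = app.get_producer()

def on_late_to_dlq(
    value, key, timestamp_ms, late_by_ms, start, end, name, topic, partition, offset
) -> bool:
    # Send the late record to an illustrative dead-letter topic
    producer.produce(
        topic="late-events-dlq",
        key=key,
        value=json.dumps(value),
    )
    # Suppress the default logging
    return False

sdf.tumbling_window(timedelta(hours=1), on_late=on_late_to_dlq)
```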
See more in the docs.
By @daniil-quix in #701 #732
🦠 Bugfixes
- Do not process late messages in sliding windows by @gwaramadze in #728
Other Changes
- StreamingDataFrame.merge(): prep work by @daniil-quix in #725
- windows: extract base class for windows and window definitions by @quentin-quix in #730
- state: refactor collection store to not rely on timestamp by @quentin-quix in #734
Full Changelog: v3.7.0...v3.8.0
v3.7.0
What's Changed
[NEW] 💎 Collection-based windowed aggregations
A new window operation, `collect()`, was added to gather all events in a window into a batch.
You can use it to perform aggregations that require whole collections and cannot be expressed via the `reduce()` approach, such as calculating medians.
This operation is optimized for collecting values and performs significantly better than using `reduce()` to accumulate batches of data.
Example:

```python
### Collect all events over a 10-minute tumbling window into a list ###

from datetime import timedelta
from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)

sdf = (
    # Define a tumbling window of 10 minutes
    sdf.tumbling_window(timedelta(minutes=10))
    # Collect events in the window into a list
    .collect()
    # Emit results only for closed windows
    .final()
)

# Output:
# {
#     'start': <window start>,
#     'end': <window end>,
#     'value': [event1, event2, event3, ...]  - list of all events in the window
# }
```
Docs - https://quix.io/docs/quix-streams/windowing.html#collect
By @gwaramadze in #688
Full Changelog: v3.6.1...v3.7.0
v3.6.1
What's Changed
⚠️ Fix a bug where creating a changelog topic set the `cleanup.policy` of the source topic to `compact`
Only topics created on the fly and repartition topics were affected; the configuration of existing topics is intact.
Please check the `cleanup.policy` of the topics used in your applications and adjust it if necessary (see the sketch below).
Introduced in `v3.4.0`.
Fixed by @quentin-quix in #716
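One way to check is via the Admin API of confluent-kafka, which Quix Streams builds on. A minimal sketch, assuming a local broker and an illustrative topic name:

```python
from confluent_kafka.admin import AdminClient, ConfigResource

# Inspect the current cleanup.policy of a topic
admin = AdminClient({"bootstrap.servers": "localhost:9092"})
resource = ConfigResource(ConfigResource.Type.TOPIC, "my-topic")

for res, future in admin.describe_configs([resource]).items():
    configs = future.result()
    print(res, configs["cleanup.policy"].value)
```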
Other changes
- Influxdb3 Sink: add some functionality and QoL improvements by @tim-quix in #689
- Bump types-protobuf from 5.28.3.20241030 to 5.29.1.20241207 by @dependabot in #683
Full Changelog: v3.6.0...v3.6.1
v3.6.0
What's Changed
Main Changes
⚠️ Switch to the "range" assignor strategy from "cooperative-sticky"
Due to discovered issues with the "cooperative-sticky" assignment strategy, commits made during the rebalancing phase were failing.
To avoid that, we changed the partition assignor to "range", which doesn't have such issues.
Note that the "range" assignor is enforced for consumers used by `Application`, but it can be overridden for consumers created via the `app.get_consumer()` API.
❗How to update:
Since the "cooperative-sticky" and "range" strategies must not be mixed, all consumers in the group must first leave the group and then rejoin it after upgrading the application to Quix Streams v3.6.0.
For more details, see #705 and #712
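At the librdkafka level this is the `partition.assignment.strategy` setting. A sketch of a standalone consumer created with confluent-kafka directly, choosing the strategy explicitly (broker address and group name are illustrative):

```python
from confluent_kafka import Consumer

# A consumer outside of Application can pick its own assignment strategy.
# Do not mix "range" and "cooperative-sticky" within the same group.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-consumer-group",
    "partition.assignment.strategy": "range",
})
```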
Other Changes
- Source: background file downloads for FileSource by @tim-quix in #670
- Fix lateness warnings in Windows by @daniil-quix in #700
- mypy: make quixstreams.core.* pass type checks by @quentin-quix in #685
- mypy: ensure default are set in overloaded methods by @quentin-quix in #698
- mypy: make quixstreams.dataframe.* pass type checks by @quentin-quix in #695
Docs
- Update mkdocs.yml by @gwaramadze in #703
- Update Documentation by @github-actions in #696
- Update Documentation by @github-actions in #699
- Bump version to 3.6.0 by @daniil-quix in #711
Full Changelog: v3.5.0...v3.6.0
v3.5.0
What's Changed
Features
- Added Azure File Source and Azure File Sink by @tim-quix in #669 and #671
- Pydantic ImportString for oauth_cb in ConnectionConfig by @mkp-jansen in #680
Fixes
- Re-raise the exceptions from the platform API by @daniil-quix in #686
- mypy: make quixstreams.platforms.* pass type checks by @quentin-quix in #678
- BigQuery Sink: fix bug around dataset and table ids by @tim-quix in #691
Docs
- Cleanup Examples and Tutorials by @tim-quix in #675
- Rename docs files by @daniil-quix in #674
- mypy: make quixstreams.models.* pass type checks by @quentin-quix in #673
- fix broken doc refs by @tim-quix in #677
New Contributors
- @mkp-jansen made their first contribution in #680
Full Changelog: v3.4.0...v3.5.0
v3.4.0
What's Changed
Breaking changes💥
Prefix topic names with `source__` for auto-generated source topics
By default, each Source provides a default topic by implementing the `default_topic()` method.
The names of these auto-generated topics are now prefixed with `"source__"` for better visibility across other topics in the cluster.
This doesn't apply when the topic is passed explicitly via `app.dataframe(source, topic)` or `app.add_source(source, topic)`.
After upgrading to 3.4.0, existing Sources using default topics will look for the topic with the new name on restart and create it if it doesn't exist.
To keep using the existing topics, pass a pre-configured `Topic` instance with the existing name and serialization config:
```python
from quixstreams import Application

app = Application(...)

# Configure the topic instance to use it together with the Source
topic = app.topic(
    "<existing topic name>",
    value_serializer=...,
    value_deserializer=...,
    key_serializer=...,
    key_deserializer=...,
)
source = SomeSource(...)

# To run Sources together with a StreamingDataFrame:
sdf = app.dataframe(source=source, topic=topic)

# Or to run Sources stand-alone:
app.add_source(source=source, topic=topic)
```
by @daniil-quix in #651 #662
Features 🌱
- Amazon Kinesis Sink by @gwaramadze in #642 #649
- Amazon Kinesis Source by @tim-quix in #646
- Amazon S3 Sink by @gwaramadze in #654
- Amazon S3 Source by @tim-quix in #653
- PostgreSQL Sink by @tomas-quix in #641
- Redis Sink by @daniil-quix in #655
- Stateful sources API implementation by @quentin-quix in #615 #631
Improvements 💎
- On `app.stop()`, commit checkpoint before closing the consumer by @daniil-quix in #638
- Trigger `AdminClient.poll` on initialization by @daniil-quix in #661
Docs 📄
- Remove the list of supported connectors from the Connectors docs. by @daniil-quix in #664
Other
- CI: Implement mypy pre-commit check by @quentin-quix in #643
- Update pydantic requirement from <2.10,>=2.7 to >=2.7,<2.11 by @dependabot in #652
- mypy: make quixstreams.state.* pass type checks by @quentin-quix in #657
Full Changelog: v3.3.0...v3.4.0
v3.3.0
What's Changed
New Connectors for Google Cloud
In this release, 3 new connectors have been added:
- Google Cloud Pub/Sub Source by @tim-quix in #622
- Google Cloud Pub/Sub Sink by @gwaramadze in #616, #626
- Google Cloud BigQuery Sink by @daniil-quix in #621, #627
To learn more about them, see the respective docs pages.
Other updates
- Conda drop Python 3.8 support by @gwaramadze in #629
- Remove connectors docs from the nav by @daniil-quix in #630
- Update Documentation by @github-actions in #617
- Update connectors docs by @daniil-quix in #625
Full Changelog: v3.2.1...v3.3.0
v3.2.1
What's Changed
This is a bugfix release downgrading `confluent-kafka` to 2.4.0 because of an authentication issue introduced in 2.6.0.
Full Changelog: v3.2.0...v3.2.1