PolarStreams compared to NATS JetStream #104
-
@pavelnikolov nice to meet you! Thanks for posting the question here. I'm familiar with NATS core and I think it's a great lightweight solution for at-most-once delivery use cases. I have no experience with NATS JetStream, so my response comes from what I was able to test locally today and read in their docs. These are the main differences I can see between PolarStreams and NATS JetStream:

**Ordering**

In JetStream, messages are ordered per 'publisher' (producer instance); with multiple producers there's no way to guarantee ordering. Having multiple producer instances is fairly common, i.e. multiple service instances producing messages. The following use case comes to mind: Account A is created and the event is produced by service instance 1; after some time (seconds, minutes) Account A is deleted and the event is produced by service instance 2. Is there any guarantee that the deletion event will be received by consumers after (and only after) the creation event? Similar to Kafka, PolarStreams guarantees strict ordering of events within the same partition key.

**Performance & Resource Usage**

NATS Core has nice performance for non-durable events. On the other hand, JetStream throughput for durable events (replication=3) doesn't seem very good, even with more hardware resources than PolarStreams. Additionally, the NATS server uses resources in proportion to the load, so you would have to model the hardware requirements (memory assigned to each pod) based on the load you expect.
PolarStreams uses a bounded amount of memory per topic and consumer group, allocating buffers in advance and reusing them. This simplifies capacity planning greatly: no matter the number of producers and consumers, or the load, we will get the same memory consumption. Furthermore, NATS JetStream uses regular buffered I/O, which makes it a "bad Kubernetes neighbour" as it will pollute the page cache. The Linux page cache is a shared resource in K8s at the node level. Using the page cache extensively also makes resource/capacity planning very hard, as the page data is included in the Working Set Size but we can't control it. PolarStreams uses Direct I/O and a series of techniques that make it lightweight and fast.

**API**

JetStream was created as a persistence layer on top of NATS, which can be a good way to leverage the existing NATS ecosystem, but it's not realistic to say that things that work on NATS (with at-most-once guarantees) will continue working with JetStream (at-least-once), especially considering multiple producers and consumers. As far as I can tell, there are several knobs and settings we would have to touch in different parts to make it work as we want (e.g. "use pull if you want to scale consumers"), making it very easy for new users to shoot themselves in the foot by applying a different pattern (and thinking they are getting a guarantee that in production they are not). In contrast, PolarStreams provides a REST API where producing is a simple call (the user can be sure the message is durably stored) and consuming only requires setting the "group" the consumer belongs to. I personally found some NATS concepts hard to grasp ("source, subjects, streams, consumers, publishers, ...") as opposed to Kafka/PolarStreams' "topics, producers and consumers", but maybe that's just me :)
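To illustrate the kind of API surface described above, here is a minimal Go sketch of producing and polling over HTTP. The host, ports, endpoint paths and query parameters are hypothetical placeholders, not the actual PolarStreams routes (those should be taken from its documentation); the point is only that a produce is a single HTTP call acknowledged after durable storage, and that a consumer only needs to state its group.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical producing endpoint: one HTTP call per event, and the
	// response arrives once the message has been durably stored. The path,
	// port and "partitionKey" parameter are placeholders.
	body := bytes.NewBufferString(`{"accountId": "A", "event": "created"}`)
	resp, err := http.Post(
		"http://polar.example:9251/v1/topic/accounts/messages?partitionKey=A",
		"application/json", body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("produce status:", resp.Status)

	// Hypothetical consuming endpoint: the only required piece of state is
	// the consumer group name; polling returns the next batch for that group.
	resp, err = http.Get("http://polar.example:9252/v1/consumer/poll?group=billing")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("poll status:", resp.Status)
}
```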
-
Thanks @pavelnikolov for posing the question. @jorgebay I am looking forward to learning more about this project, but wanted to address a few of the points you made about JetStream. As a general preface, when people compare a Kafka-like system to NATS it is very common that some of the concepts don't translate. The two have fundamentally different origin stories, so that is to be expected, and the design decisions within JetStream are not attempting to copy those of Kafka.

**Ordering**
As you noted, when multiple clients publish messages that are received and written to the same stream, the publishes are concurrent and, by default, the order is dictated by how the server serializes them. This is true for any concurrent-writer situation unless you declare an expectation of ordering. In NATS, when a stream is created, it logically acts as a service that binds one or more subjects. At publish time, there are opt-in optimistic-concurrency control options you can provide in the form of headers on the published message.
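As far as I know, the relevant headers are `Nats-Msg-Id`, `Nats-Expected-Stream`, `Nats-Expected-Last-Sequence`, `Nats-Expected-Last-Subject-Sequence` and `Nats-Expected-Last-Msg-Id`. Below is a minimal sketch, assuming the nats.go client, of creating a stream bound to a set of subjects and publishing with a per-subject sequence expectation; the stream name and subjects are made up for the example.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// A stream logically binds one or more subjects; name and subjects here
	// are illustrative.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "ACCOUNTS",
		Subjects: []string{"accounts.>"},
		Replicas: 3,
	}); err != nil {
		log.Fatal(err)
	}

	// Plain publish: the returned ack carries the sequence the server assigned.
	ack, err := js.Publish("accounts.A", []byte(`{"event":"created"}`))
	if err != nil {
		log.Fatal(err)
	}

	// Opt-in OCC: reject the publish unless the last sequence seen for this
	// subject is still the one we expect (this sets the
	// Nats-Expected-Last-Subject-Sequence header under the hood).
	_, err = js.Publish("accounts.A", []byte(`{"event":"deleted"}`),
		nats.ExpectLastSequencePerSubject(ack.Sequence))
	if err != nil {
		log.Fatal(err) // fails if another publisher got in between
	}
}
```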
This means that a client publishing to that stream can include one of those headers to ensure the stream or subject sequence has not changed in the meantime. In general, the subject-level sequence check is preferred, since it provides more granular OCC control. But this is how you can solve the concurrent publish-time ordering concern.

One additional contrasting point is that with a NATS stream there are no partitions by default; everything is multiplexed onto the same stream. Any number of consumers (single workers or queue groups) can be created without any constraints based on the number of partitions (since there aren't any). There is a pattern for introducing deterministic partitioning using subject mapping, which, again, is opt-in for those that need it.

**Performance & Resource Usage**

These are good call-outs and favorable design decisions that Barco has optimized for up front. With JetStream, the extreme performance/scale that Kafka/Redpanda/Barco can achieve today, given the use cases they focus on, has not been as high a priority. NATS is used for a spectrum of use cases which the Kafka-like systems are not well suited for. NATS is much more focused on being a connective technology, which manifests in properties like being able to be deployed on edge devices and connected into a supercluster that can span the globe with full location transparency, etc.

That said, we do often get compared to Kafka and/or asked if we can replace a Kafka setup, since NATS was one of the early projects that provided a self-contained, zero-dependency binary that can be deployed anywhere. For use cases that need persistence but not extreme scale-out (today), we are a great solution, and it drops a huge dependency and the cost of additional infrastructure, especially for teams already using core NATS for messaging. Likewise, we have a key-value layer built on top of the stream layer which folks are adopting as a lightweight alternative to basic Redis KV.

Recent NATS versions are capable of reaching around 300k messages per second of throughput for a single stream with a replication factor of 3. This would be comparable to a single Kafka partition, since both are totally ordered. I don't want to get into the weeds of benchmarks since that is a nuanced topic, but I'm calling it out since the SO post is quite old in terms of how quickly JetStream has been evolving.

Regarding k8s, NATS was not initially optimized/focused on k8s, since the server was written over a decade ago. That said, we have many users deploying NATS into k8s environments, and quite a bit of work has gone into tuning and documenting the necessary resource limits and Go runtime hints (e.g. GOMEMLIMIT) to optimize performance in this environment. But the points you call out are useful for further optimization. The bottom line is that each system makes different trade-offs, and it boils down to whether a technology fits the needs of the user/use case. NATS has a very feature-rich CLI that makes it straightforward to test virtually every feature of the server as well as benchmark it.

**API**
I would summarize this comment as "the API may not be as intuitive as it could be", which is a fair criticism. The NATS team is aware that some things can be simplified, the defaults can be improved, and less client-side magic can occur. However, this learning curve is fairly short-lived, and once people start building for production, the set of knobs can be useful. Virtually no configuration needs to be set by default for streams or consumers, so it is all opt-in. The feature set has been driven largely by user requests to satisfy certain use cases.
Calling out a REST API vs the NATS protocol is not a useful comparison here, and guarantees around the durability of a stored message are completely orthogonal to the protocol. A client publishing a message to a stream receives an acknowledgement from the server once the message is persisted.
Without looking at the Barco consumer API in detail, creating consumers with options to control replicas, inactivity thresholds, the persistence medium, etc., is very straightforward with the client SDKs. As noted above, we are aware of further simplifications for pure beginners, but it is not a huge hurdle once you get the concepts.
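As a rough sketch of what that looks like with the nats.go client: the stream name, durable name and specific values below are illustrative, and the exact set of available fields depends on the server and client versions.

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Illustrative durable consumer on the ACCOUNTS stream. Everything beyond
	// the name is opt-in tuning.
	_, err = js.AddConsumer("ACCOUNTS", &nats.ConsumerConfig{
		Durable:           "billing",
		FilterSubject:     "accounts.>",           // only receive matching subjects
		AckPolicy:         nats.AckExplicitPolicy, // require an explicit ack per message
		Replicas:          3,                      // replication of consumer state
		InactiveThreshold: 24 * time.Hour,         // clean up if unused for a day
		MemoryStorage:     false,                  // keep consumer state on disk
	})
	if err != nil {
		log.Fatal(err)
	}
}
```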
A natural bias when you are familiar/comfortable with one technology vs. another 😄
-
This is not quite right. Messages can be published from any number of publishers and received by a stream in the server. By default, the messages are ordered as they are received, but as noted, OCC can be applied to ensure that the stream-level or subject-level (per key) sequence has the expected order.

On the consumption side, by default a consumer can be created on a stream and will receive all messages in order. This can be for all messages in a stream or for a subset using a subject filter. Given either one of these consumers, if you pull messages (fetching batches at a time), you effectively have control over the number of messages in flight for that consumer that you are responsible for consuming and acking. While processing the batch in order, you ack each message, and if an ack fails, you can either retry or bail out of consuming (unsubscribe). On resubscribe, the consumer will simply restart from the last unack'ed message and deliver in order from there (effectively a rewind). There is also a consumer setting called "max-ack-pending"; set it to one and only one message for that consumer will be in flight at any given time.

Often, when people get surprised by out-of-order messages, it is because they are using a push consumer (where the server proactively pushes messages into the client's buffer) and they forget to ack messages. But as I said above, with a pull consumer it is easier to control the flow and get the desired behavior.
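A minimal sketch of that pull flow, assuming the nats.go client and the illustrative stream and durable names from the earlier example; the subject filter, batch size and timeout are made up.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

// process stands in for the application's handling of one message body.
func process(data []byte) error {
	fmt.Printf("handled: %s\n", data)
	return nil
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Pull consumer bound to the durable; with pull, the client decides how
	// many messages to have in flight by choosing the batch size. To force
	// strictly one in-flight message, the consumer could instead be created
	// with nats.MaxAckPending(1).
	sub, err := js.PullSubscribe("accounts.>", "billing")
	if err != nil {
		log.Fatal(err)
	}

	for {
		// Fetch a batch; this times out if nothing is pending.
		msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
		if err != nil {
			continue
		}
		for _, msg := range msgs {
			if err := process(msg.Data); err != nil {
				// Leave the message unacked and stop: it will be redelivered,
				// so consumption effectively rewinds to this point.
				log.Println("processing failed, will retry:", err)
				break
			}
			msg.Ack()
		}
	}
}
```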
Just to reinforce: there are no separate message sequences. All messages across the subjects bound to a stream are written in order in the persistence layer and sequenced together (1, 2, 3, etc.). So from the consumer side, it is just one stream where each message could have a different concrete subject, but they have a total order.
Thanks! Yes, it is a nice project. We are always learning and improving and looking forward to seeing what we can learn from Barco.
-
This is an awesome project. I heard about it during KCD Spain - thank you for the awesome presentation!
To me it seems that PolarStreams is designed to achieve the same goal as NATS JetStream. I've been using event-based microservices for a while and I'm really interested in the differences between the two projects. It is true that NATS was designed with pub/sub in mind, however JetStream is included in the same binary and enabled with the `-js` command line flag (e.g. `nats-server -js`). NATS JetStream allows storing event streams on disk. It is already distributed, requires no brokers, is extremely lightweight, easy to deploy to Kubernetes, multi-tenanted, and supports security using TLS and JWT tokens. I'm not familiar with PolarStreams, but NATS JetStream seems to cover all the event streaming needs. I would like to know why I would choose PolarStreams instead of NATS JetStream.

EDIT (jorge): edited to reflect the name change from Barco to PolarStreams.