Case: 100+ KafkaStreams threads on 3.5k+ topics/partitions with exactly_once_v2 guarantee #447
also, I will separately describe the current state.

configs

the logic that I mostly follow when selecting configs: https://stackoverflow.com/a/74518032

error handlers

errors randomly happen and the producer cannot recover. also, sometimes partition leaders change on the broker side.
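For reference, a minimal sketch of the EOS-related settings discussed in this thread; the application id, bootstrap address, and the concrete numbers (the 7 sec commit interval and a deliberately large transaction timeout mentioned later in the thread) are placeholders, not the exact configs of this deployment:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EosConfigSketch {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // Placeholder application id and bootstrap servers.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "splitter-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        // The guarantee discussed in this issue.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        // The "7 sec commit interval" mentioned below.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 7_000);
        // Producer configs pass through with the "producer." prefix; the
        // large transaction timeout here is an assumed placeholder value.
        props.put(StreamsConfig.producerPrefix("transaction.timeout.ms"), 900_000);
        return props;
    }
}
```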
It does sound like a large-scale deployment (and EOS will make it a little bit more tricky); however, given the simplicity of the application, I am quite confident that it's possible and would not call it a "bad idea".
About the "errors" -- for the first one, your exception handler should restart the StreamThread and thus you should get recovery? Given your 7 sec commit interval, it does not sound too bad with regard to retrying a bunch of data. Or is this overall behavior not good enough? -- Btw: what versions are you using (both broker / KS)? Newer versions got significant fixes to make EOS more robust. For the warn logs: I am not a broker expert, so not sure why the leader moved, but in general it's a "normal" thing and it seems the client handled it well, refreshing its metadata. It's not an error but a warn log, so I'm wondering why it bothers you?
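For context, a hedged sketch of the kind of handler being referred to here, assuming the standard StreamsUncaughtExceptionHandler API; `streams` is an already-built KafkaStreams instance (topology and config omitted):

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

public class ReplaceThreadSketch {
    static void installHandler(KafkaStreams streams) {
        streams.setUncaughtExceptionHandler(exception -> {
            // REPLACE_THREAD starts a fresh StreamThread (and with EOSv2 a
            // fresh transactional producer), which is the recovery path
            // described above.
            return StreamThreadExceptionResponse.REPLACE_THREAD;
        });
    }
}
```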
@mjsax thank you for your reply!
I expect yes, because of
it performs well until an error occurs, and after the error the producer is not able to continue producing. the error:
we use Kafka 3.5 brokers (still on ZooKeeper) and Kafka Streams
yeah, I also understand that it should be a normal thing, but there were a lot of errors and long retries. but I believe that's because of the long
what specific information would you like to know to be able to advise something?
Thanks. -- When a thread is replaced, we literally start a new one, which creates a new producer and calls

Guess we would need detailed Kafka Streams, producer, and broker logs to figure out what is happening. Not sure if the high TX-timeout you configured has anything to do with it 🤔. On the other hand, even with the large timeout, calling
hey @mjsax @jolshan. here are application logs from all instances and also some metrics. app_logs_hosts1,2.zip |
Hey sorry for the delay. Catching up on the situation. I will try to answer the questions posed to me, but let me know if I missed something.
As for writing, there are no blocking writes on the same partition. We do block read_committed reads. Each partition has something called the LSO (last stable offset). This is the first offset in the log that contains an uncommitted transaction. Read_committed consumers cannot read past the LSO, so if you have transaction A and transaction B in the log like this:

(A message 1) (B message 1) (B message 2) (B message 3) (B commit)

the LSO would be at A message 1, and even though B is committed, you would not be able to read it until A is also committed.

(A message 1) (B message 1) (B message 2) (B message 3) (B commit) (C message 1) (A commit)

At this point the LSO is at C message 1, and the A and B transactions are readable.
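To make the LSO behavior concrete, here is a minimal consumer sketch; the bootstrap address and group id are hypothetical:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedSketch {
    public static KafkaConsumer<String, String> consumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "read-committed-demo");    // hypothetical
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // This is the setting that makes the LSO matter: the consumer will
        // not read past the first offset of any still-open transaction.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        return new KafkaConsumer<>(props);
    }
}
```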
As for some of the followups I saw: regarding the timeouts, I don't think it is good for your transaction timeout to be less than your delivery timeout.
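A small sketch of that constraint, using Kafka Streams' producer-prefixed pass-through configs; the numbers are placeholders chosen only to satisfy transaction.timeout.ms >= delivery.timeout.ms, not recommendations:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class TimeoutSketch {
    public static Properties timeouts() {
        Properties props = new Properties();
        // Keep delivery.timeout.ms at or below transaction.timeout.ms so the
        // coordinator cannot abort a transaction while the producer is still
        // retrying deliveries within it.
        props.put(StreamsConfig.producerPrefix("delivery.timeout.ms"), 120_000);
        props.put(StreamsConfig.producerPrefix("transaction.timeout.ms"), 300_000);
        return props;
    }
}
```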
Thanks @jolshan
With EOSv1 (deprecated) there would be a producer per task, and a StreamThread might have multiple assigned tasks. For EOSv2, there is one producer per StreamThread.
thank you for your replies @jolshan @mjsax! @jolshan as far as I understand,
hi folks!
currently I'm working on a KafkaStreams based app. the idea is to stream from one source topic to 3.5k+ topics using `processing.guarantee: exactly_once_v2` mode.

the source topic is an endpoint and a single source of truth for all incoming messages. messages come with different content, and the goal is to split those incoming messages by content and write each to a specific topic. for instance, `type1` messages go to `type1-topic`, `type2` to `type2-topic`, and so forth (see the sketch below).
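A minimal sketch of such a topology, assuming the type can be derived from the record value; the topic names and the `typeOf` helper are hypothetical, and default serdes are assumed to be configured:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class SplitterTopologySketch {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("source-topic"); // hypothetical topic
        // Route each record dynamically to a per-type topic,
        // e.g. type1 -> type1-topic, via a TopicNameExtractor lambda.
        source.to((key, value, recordContext) -> typeOf(value) + "-topic");
        return builder.build();
    }

    private static String typeOf(String value) {
        // Hypothetical: extract the message type from the payload.
        return value.split(":", 2)[0];
    }
}
```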
details

`2.5-3 million` messages per second, average message size is `860b` (max size is up to `250kb`), which gives `2-2.4gb` per second;

questions
is this even a good idea?
I mean, I like the idea because of its simplicity and elegance. but I also feel that the implementation cannot be reliable and stable.
how does Kafka handle multiple ongoing transactions at a time on one specific partition?
I found a lot of articles explaining how Kafka transactions work, but all of them are about a one-thread stream with one ongoing transaction at a time. I have no idea how the `TransactionCoordinator` works when a bunch of threads run transactions on the same set of topics/partitions. for instance, `thread-1` performs a transaction on `type1_topic-0, type2_topic-0, type3_topic-0`, and at the same time a bunch of other threads perform transactions on the same topics `type1_topic-0, type2_topic-0, type3_topic-0`. are the partitions blocked for writing until a transaction ends? are other transactions waiting in line?
is it even possible to make this solution solid and durable?