Is Reader.consume() supposed to throw on underflow? #311

Closed
bittrance opened this issue Dec 1, 2024 · 4 comments · Fixed by #313

Comments

@bittrance
Contributor

Reading the code around common.Throw(k.vu.Runtime(), err), it seems that when consume({limit: 10}) reads fewer than 10 messages, it throws an error, meaning we cannot inspect the messages that were actually received. There are scenarios where I don't know the exact number of messages that will arrive. This is particularly true when verifying/debugging messages over several partitions, since the client tends to return batches of messages from one partition at a time; to be sure to receive a representative sample, I need to read a largish number of messages.
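
To make the problem concrete, here is a minimal sketch of that pattern, assuming the consume loop is built on kafka-go's Reader.FetchMessage (an illustration, not the actual xk6-kafka source): any error, including the context deadline, aborts the loop, and the partially fetched batch is lost.

```go
// Minimal sketch of the behavior described above; not the actual xk6-kafka code.
package consumer

import (
	"context"
	"time"

	kafka "github.com/segmentio/kafka-go"
)

// consumeUpTo tries to read up to `limit` messages within `maxWait`. With the
// current behavior, the first error (including context.DeadlineExceeded) is
// returned and the messages fetched so far are discarded, because the caller
// turns the error into a thrown JS exception.
func consumeUpTo(reader *kafka.Reader, limit int, maxWait time.Duration) ([]kafka.Message, error) {
	ctx, cancel := context.WithTimeout(context.Background(), maxWait)
	defer cancel()

	messages := make([]kafka.Message, 0, limit)
	for len(messages) < limit {
		msg, err := reader.FetchMessage(ctx)
		if err != nil {
			return nil, err // the partial batch is dropped here
		}
		messages = append(messages, msg)
	}
	return messages, nil
}
```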

However, the parameter is named limit, which makes it sound like reading fewer messages than the limit would be normal. So perhaps the intention is that the context deadline should not be treated as an error case, and the current behavior is in fact a bug?

If erroring on underflow is the intended behavior, I would like to contribute PR(s) that:

  1. Rename limit to e.g. at_least or another name that reflects the behavior.
  2. Add a behavior that returns whatever has been read when the deadline arrives (see the sketch below).

If it is a bug, I'll PR a fix. How does that sound?
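
To illustrate point 2, here is a minimal sketch of what returning the partial batch could look like, again assuming kafka-go's Reader.FetchMessage; the function name is hypothetical:

```go
// Hypothetical sketch of point 2: treat context.DeadlineExceeded (and
// cancellation) as the normal end of a batch and return whatever has been
// read, propagating only other errors.
package consumer

import (
	"context"
	"errors"

	kafka "github.com/segmentio/kafka-go"
)

func consumeAtLeast(ctx context.Context, reader *kafka.Reader, limit int) ([]kafka.Message, error) {
	messages := make([]kafka.Message, 0, limit)
	for len(messages) < limit {
		msg, err := reader.FetchMessage(ctx)
		if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) {
			break // deadline reached: hand back the partial batch instead of throwing
		}
		if err != nil {
			return messages, err // real errors are still surfaced
		}
		messages = append(messages, msg)
	}
	return messages, nil
}
```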

@bittrance changed the title from "Is Reader.consume() really supposed to throw on underflow?" to "Is Reader.consume() supposed to throw on underflow?" on Dec 1, 2024
@mostafa
Owner

mostafa commented Dec 1, 2024

Hey @bittrance,

Please read #287 and the associated issue to see if that explains why I added the throw. Then we can discuss whether we want to add support for your specific use case or just fix the existing implementation.

@bittrance
Contributor Author

To give a little more detail, I'm trying to use xk6-kafka to evaluate Strimzi-operated Kafka. The killer feature is that k6 with xk6-kafka gives me both a reasonably performant producer/consumer pair and Prometheus metrics (via xk6-prometheus). I'm using chaostoolkit to construct scenarios that kill brokers and inject network partitions while monitoring the Prometheus metrics generated by k6, to ensure that the producer and consumer can continue to operate. In such scenarios, interaction with Kafka can legitimately pause for tens of seconds and then resume. I don't want to set maxWait to large values, because I do want to inspect the messages, e.g. to check timestamps and message ordering. In these scenarios, the deadline can be exceeded without cause for concern.

@bittrance
Contributor Author

I had a look and I think the deadline case should be treated separately. TBH, I find the reasoning in #287 a little strange. The deadline is technically an error condition, in that FetchMessage() does return Message{}, ctx.Err(), but that is just a Go convention. If you care about what happened, you can check whether the error is in fact a context.DeadlineExceeded and act accordingly.

As an aside, one indication that the current approach is problematic is that we will report stats saying we received messages that the caller of consume() will never see.

I created a PR at bittrance#1 to show what it would look like to change the current behavior. (Because it depends on #312, I cannot create it against upstream without including those commits; once we have decided what to do with #312, I can create a proper PR against this repo.) I understand if you do not want to change the current default behavior; we could introduce a fail_on_timeout: false option or something similar to enable the alternative behavior.
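
For illustration only, a sketch of how such a flag could guard the behavior on the Go side; fail_on_timeout is just the name suggested above, not an existing xk6-kafka option, and the function name is hypothetical:

```go
// Hypothetical sketch of a backward-compatible flag: when failOnTimeout is
// true, keep today's behavior and return the deadline error (which the
// binding then throws); when it is false, return the partial batch.
package consumer

import (
	"context"
	"errors"

	kafka "github.com/segmentio/kafka-go"
)

func consumeWithTimeoutPolicy(ctx context.Context, reader *kafka.Reader, limit int, failOnTimeout bool) ([]kafka.Message, error) {
	messages := make([]kafka.Message, 0, limit)
	for len(messages) < limit {
		msg, err := reader.FetchMessage(ctx)
		if err != nil {
			if !failOnTimeout && errors.Is(err, context.DeadlineExceeded) {
				return messages, nil // tolerate the timeout and return what we have
			}
			return nil, err // existing behavior: propagate the error
		}
		messages = append(messages, msg)
	}
	return messages, nil
}
```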

@mostafa
Owner

mostafa commented Dec 3, 2024

@bittrance Makes sense. Using a flag to keep backward compatibility will avoid breaking changes in other people's scripts.
