-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add partial param v1 #25126
base: main
Are you sure you want to change the base?
feat: add partial param v1 #25126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not quite right. I have tried out the OSS version to see the behaviour of the partial
parameter, and here is what I get:
curl -G 'http://localhost:8086/query?db=mydb&chunked=true&chunk_size=2' --data-urlencode 'q=SELECT * FROM "mymeas"'
{"results":[{"statement_id":0,"series":[{"name":"mymeas","columns":["time","myfield","mytag"],"values":[["2016-05-19T18:37:55Z",90,"1"],["2016-05-19T18:37:56Z",90,"1"]],"partial":true}],"partial":true}]}
{"results":[{"statement_id":0,"series":[{"name":"mymeas","columns":["time","myfield","mytag"],"values":[["2016-05-19T18:37:57Z",90,"1"],["2016-05-19T18:37:58Z",90,"1"]]}]}]}
You can see "partial": true
only appears on the first line. It indicates whether there are more data points for that series, or statement, respectively (but since we only will support single statements, I think we would only ever see it appear in both places, or neither).
It does not appear in the second line because there are no more data points after that line.
In some of your tests, it is working as intended, but in others, you can see that it is appearing on lines where it should not, i.e., lines after which there are no more data points for that series. This might be because you are using the can_flush
function to determine the value. I believe that function just indicates if it is ready to flush data to the output stream, based on data in the buffer, the chunk size specified, etc.
This requirement may be difficult to satisfy, the closest I can think of is by using a subset of the check that can_flush
does, by just checking if the ChunkBuffer
's series
only has a single entry: that would indicate that it is still working on the current series, but I am not sure that would produce the desired result in all cases.
@hiltontj Thanks for your feedback. Do you have an example in the tests where the behavior is not as expected? From what I see, the partial is set to true for the first line but not for the second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeanArhancet - should have done this in my original review, but here are comments in line for each test, based on my understanding of how the partial
response parameter should work.
], | ||
"partial": true | ||
} | ||
], | ||
"statement_id": 0 | ||
"statement_id": 0, | ||
"partial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test case, it worked correctly ✅
influxdb3/tests/server/query.rs
Outdated
], | ||
"partial": true | ||
} | ||
], | ||
"statement_id": 0 | ||
"statement_id": 0, | ||
"partial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test case it did not work correctly, because there are no more results for the cpu
measurement. I think in this case, the second partial: true
should be there, i.e., for the statement, but not the first, i.e., for the measurement.
], | ||
"partial": true | ||
} | ||
], | ||
"statement_id": 0 | ||
"statement_id": 0, | ||
"partial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is correct ✅
influxdb3/tests/server/query.rs
Outdated
], | ||
"partial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not correct, as there are no more records for cpu
.
"statement_id": 0, | ||
"partial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is correct, as there are more records for the statement.
], | ||
"partial": true | ||
} | ||
], | ||
"statement_id": 0 | ||
"statement_id": 0, | ||
"partial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these are incorrect, as there are no more records for the measurement or statement.
@hiltontj I believe I've fixed the issue. Could you please review the changes and let me know your thoughts? |
081166a
to
7e610e7
Compare
@JeanArhancet - it looks like you've made it so the existing test exhibits the correct behaviour, but I am not convinced that this will work properly for all cases. The reason for my concern is that in this PR, the I wonder if using a |
Thanks for the clarification. I'm not entirely sure I understand correctly. Are both |
I think they are correct with respect to the data that has been buffered, but because they don't factor in what is still in the The current tests are at the integration level and don't really give you a lot of control over the underlying
Yes, what I was hinting at would involve doing this. I think trying to set up a unit test where you can control the |
Re: the unit test suggestion, the Edit: then you could adapt that into a |
@hiltontj I've added a unit test in the latest commit. The test fails because the latest response contains the partial flag set to true, but it should be set to none since it's the last response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeanArhancet - this is a great start, I left a comment in-line.
influxdb3_server/src/http/v1.rs
Outdated
async fn test_partial_flag() { | ||
let batch = create_test_record_batch(); | ||
let schema = batch.schema(); | ||
let input_stream = stream::iter(vec![Ok(batch.clone())]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your input stream should have more than one record batch, as that will test the behaviour of the partial parameter when there are batches remaining in the stream, yet to be seen with poll_next
.
If you can augment your create_test_record_batch
method to generate batches for different measurements and with different sizes, then you could come up with and test different kinds of input streams, e.g.,
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ RecordBatch0 │ │ RecordBatch1 │ │ RecordBatch2 │
│ meas: 'cpu' │ │ meas: 'cpu' │ │ meas: 'mem' │
│ rows: 2 │ │ rows: 3 │ │ rows: 2 │
└──────────────┘, └──────────────┘, └──────────────┘
With the chunk size of 2, this should give the following chunks:
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ Chunk0 │ │ Chunk1 │ │ Chunk2 │ │ Chunk3 │
│ meas: 'cpu' │ │ meas: 'cpu' │ │ meas: 'cpu' │ │ meas: 'mem' │
│ rows: 2 │ │ rows: 2 │ │ rows: 1 │ │ rows: 2 │
│ partial(meas): true │ │ partial(meas): true │ │ partial(meas): false │ │ partial(meas): false │
│ partial(stmt): true │ │ partial(stmt): true │ │ partial(stmt): true │ │ partial(stmt): false │
└──────────────────────┘, └──────────────────────┘, └──────────────────────┘, └──────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion. I have updated the test to align with that. With this new test, we can see that my implementation is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that in order to make this work, the streaming/buffering logic needs to be refactored/rethought a bit. It would be great if you are willing to take a stab at it!
I updated the original issue description to make a note of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, perhaps peekable
is not the way to go. I wonder if the streaming logic, i.e., in the impl Stream for QueryResponseStream
just needs to be re-ordered.
Right now, it always checks if it can flush before polling the input stream, which means that it could flush the entire buffer before polling the input stream again, and which seems to be the cause of the problem we are up against here.
If, instead, we just polled the input before flushing, and then, when flushing, always leave at most one chunk in the buffer. That way, there is always enough data in the buffer, when streaming chunks, to decide what partial
should be.
The exception would be, when the input stream is done, and there are no more record batches, then you would want to flush the entire buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hiltontj Thanks for your feedback and description, but I'm a bit lost concerning the correct implementation and what I need to refactor. I don't think I will find the right approach 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hiltontj I've just pushed another unit test to verify the existence of the partial flag when the stream size is 1
c4f30e8
to
6ad9462
Compare
c30d160
to
ab971ce
Compare
ab971ce
to
53a1bb6
Compare
Closes #25014
This pull request includes the following changes:
Check-list: