-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Cursors through API Query Params #14110
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14110 +/- ##
============================================
+ Coverage 61.75% 63.90% +2.14%
- Complexity 207 1608 +1401
============================================
Files 2436 2713 +277
Lines 133233 149658 +16425
Branches 20636 22908 +2272
============================================
+ Hits 82274 95634 +13360
- Misses 44911 47012 +2101
- Partials 6048 7012 +964
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
12a6250
to
d46760b
Compare
pinot-broker/src/main/java/org/apache/pinot/broker/api/resources/PinotClientRequest.java
Outdated
Show resolved
Hide resolved
@@ -100,6 +103,9 @@ | |||
public class PinotClientRequest { | |||
private static final Logger LOGGER = LoggerFactory.getLogger(PinotClientRequest.class); | |||
|
|||
@Inject | |||
PinotConfiguration _brokerConf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to inject the broker config here as it's already in the BaseBrokerRequestHandler
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no getter right now to get the config. I can add that. However is that a better option than injecting the config ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However is that a better option than injecting the config ?
Probably not. It doesn't make sense to add that into the interface and we don't want to depend on the base class.
private static final Logger LOGGER = LoggerFactory.getLogger(ResponseStoreResource.class); | ||
|
||
@Inject | ||
private PinotConfiguration _brokerConf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, you might not need to inject the broker config here.
} | ||
|
||
response.setResultTable( | ||
new ResultTable(resultTable.getDataSchema(), resultTable.getRows().subList(offset, sliceEnd))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean every time a read request will fetch the whole result table into memory first and then pick the page from the memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation does read the whole result table into memory. Does LinkedIn implementation support seeking within the result table ? Do you have suggestions on how can the interface be changed to also support seek ?
pinot-common/src/main/java/org/apache/pinot/common/response/CursorResponse.java
Show resolved
Hide resolved
x -> new InstanceInfo(x.getInstanceName(), x.getHostName(), Integer.parseInt(x.getPort())))); | ||
|
||
try { | ||
Map<String, List<CursorResponseNative>> brokerResponses = getAllQueryResults(brokers, Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it seems every controller will try to fetch all the results from all the brokers, is it kind of a waste of resources to do that? Could we distribute the cleanup workloads to controllers instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that only the lead controller will run the periodic tasks. Right now, the controller does not have any information about ResponseStores. Only the broker has information.
The major point to discuss - is it a requirement that a ResponseStore should be accessible from all nodes ? Then the default implementation of using the broker local filesystem has to be changed.
The current perioidc task and cleaner assumes that a only a specific broker knows about the responses it has stored.
pinot-broker/src/main/java/org/apache/pinot/broker/api/resources/PinotClientRequest.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/api/resources/PinotClientRequest.java
Outdated
Show resolved
Hide resolved
@ManualAuthorization | ||
public BrokerResponse getSqlQueryMetadata( | ||
@ApiParam(value = "Request ID of the query", required = true) @PathParam("requestId") String requestId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this @ManualAuthorization
means that anyone can get the metadata of any known request id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've messed up my git branches. I have the code in a branch. I'll merge the code.
@ManualAuthorization | ||
public void getSqlQueryResult( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we check here that only the user that opened the cursor can read from it? IICU that is one of the requirements in the design document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've messed up my git branches. I have the code in a branch. I'll merge the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've merged support for authorization. My initial plan was to store the user who submitted the query. However that will require a lot of API changes in access control classes. Instead the response store contains the list of tables queried and that is used to authorize. So two users who have access to the same tables can read each other's cursors. I have not added tests yet as I havent found the infra to test these cases.
pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/broker/helix/BaseBrokerStarter.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/cursors/AbstractResponseStore.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Show resolved
Hide resolved
URI dataFile = combinePath(queryDir, String.format(RESULT_TABLE_FILE_NAME_FORMAT, _fileExtension)); | ||
try { | ||
_responseSerde.serialize(resultTable, Files.newOutputStream(tempResultTableFile)); | ||
pinotFS.copyFromLocalFile(tempResultTableFile.toFile(), dataFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: couldn't you just use pinotFS.move
? In this specific case, that could be implemented very cheap in LocalPinotFS (although I can see it is not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wont work. e.g. in S3PinotFs
, move
eventually calls copyObject
which assumes that the src is already in S3. https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! is this class used for all PinotFs? I was thinking it was used for local fs 🤦
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
Path tempResultTableFile = getTempPath(_localTempDir, "resultTable", requestId); | ||
URI dataFile = combinePath(queryDir, String.format(RESULT_TABLE_FILE_NAME_FORMAT, _fileExtension)); | ||
try { | ||
_responseSerde.serialize(resultTable, Files.newOutputStream(tempResultTableFile)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This output stream should also be closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug in deeper why this worked without a close
. Seems like Jackson may or may not decide to close the stream.
Method that can be used to serialize any Java value as JSON output, using output stream provided (using encoding JsonEncoding.UTF8.
Note: method does not close the underlying stream explicitly here; however, JsonFactor this mapper uses may choose to close the stream depending on its settings (by default, it will try to close it when JsonGenerator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a close.
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
doClean(cleanAtMs); | ||
} | ||
|
||
public void doClean(long currentTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this method declares brokerResponses
and responses
variables. The first is the list of cursors open on each broker, which the method iterates to then ask each broker to delete it if needed. responses
store the result of that second request to brokers.
It may be subjective, but the names are not very clear. Both are responses from browsers. I would appreciate to look for better names, like knownCursors
and cleanResponses
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
private static final String QUERY_RESULT_STORE = "%s://%s:%d/responseStore"; | ||
private static final String DELETE_QUERY_RESULT = "%s://%s:%d/responseStore/%s"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two constants are templates that are used to build the URI we are going to call to get/delete cursors. I don't think it is very useful to declare them as a constant 60-90 lines above the single place they are going to be used. Instead I would prefer to see the template directly where we use them, specially if they are going to be used only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to turn this into a constant within the fn. ? 60-90 is an arbitrary distance. I can move the fn. up. With editor shortcuts, it should not be hard to move between the definition and usage ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer this to be directly defined (as a literal or as a variable) in the method, but this is nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I searched for this regex in Intellij final String .* = ".*"
and I could not find instances where a literal is defined in a fn. There is one in a test. Do you have examples ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review
Co-authored-by: Yash Mayya <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I only have some minor, mostly nitpicky comments around the tests 🙂
Thanks for your patience on this one!
pinot-broker/src/main/java/org/apache/pinot/broker/cursors/FsResponseStore.java
Outdated
Show resolved
Hide resolved
...tion-tests/src/test/java/org/apache/pinot/integration/tests/cursors/MemoryResponseStore.java
Outdated
Show resolved
Hide resolved
...tion-tests/src/test/java/org/apache/pinot/integration/tests/cursors/MemoryResponseStore.java
Outdated
Show resolved
Hide resolved
...ntegration-tests/src/test/java/org/apache/pinot/integration/tests/CursorIntegrationTest.java
Outdated
Show resolved
Hide resolved
...ntegration-tests/src/test/java/org/apache/pinot/integration/tests/CursorIntegrationTest.java
Outdated
Show resolved
Hide resolved
...ntegration-tests/src/test/java/org/apache/pinot/integration/tests/CursorIntegrationTest.java
Show resolved
Hide resolved
protected Object[][] getPageSizesAndQueryEngine() { | ||
return new Object[][]{ | ||
{false, 2}, {false, 3}, {false, 10}, {false, 0}, //0 trigger default behaviour | ||
{true, 2}, {true, 3}, {true, 10}, {true, 0} //0 trigger default behaviour | ||
}; | ||
} | ||
|
||
@DataProvider(name = "pageSizeAndQueryEngineProvider") | ||
public Object[][] pageSizeAndQueryEngineProvider() { | ||
return getPageSizesAndQueryEngine(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be a single method. Also the method name and actual returned values has the order flipped (query engine / page size).
...ntegration-tests/src/test/java/org/apache/pinot/integration/tests/CursorIntegrationTest.java
Outdated
Show resolved
Hide resolved
JsonNode exception = pinotResponse.get("exceptions").get(0); | ||
Assert.assertTrue(exception.get("message").asText().startsWith("QueryValidationError:")); | ||
Assert.assertEquals(exception.get("errorCode").asInt(), 700); | ||
Assert.assertTrue(pinotResponse.get("brokerId").asText().startsWith("Broker_")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can also verify that the get all response stores API returns 0 results in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Assert.assertNull(pinotResponse.get("resultTable"));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I meant something like this:
List<CursorResponseNative> requestIds = JsonUtils.stringToObject(
ClusterTest.sendGetRequest(getBrokerGetAllResponseStoresApiUrl(getBrokerBaseApiUrl()), getHeaders()),
new TypeReference<>() {
});
Assert.assertEquals(requestIds.size(), 0);
But it was just an optional suggestion anyway.
// Used in tests to trigger the delete instead of waiting for the wall clock to move to an appropriate time. | ||
public static final String CLEAN_AT_TIME = "response.store.cleaner.clean.at.ms"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you were talking about CursorIntegrationTest::testResponseStoreCleaner
here - doesn't that test only verify that at least one of the two response stores is deleted? In that case, why do we need this response.store.cleaner.clean.at.ms
internal configuration? Wouldn't a short response store expiration value in the broker combined with a short frequency for the controller periodic job do the trick considering we're using a 100 second timeout for the wait condition anyway?
Cursor support will allow Pinot clients to consume query results in smaller chunks. This feature allows clients to work with lesser resources esp. memory. Application logic is simpler with cursors. For example an app UI paginates through results in a table or a graph. Cursor support has been implemented using APIs.
Design Doc
Implementation for #13185
API
POST /query/sql
A new broker API parameter has been added to trigger pagination.
The API accepts the following new optional query parameters:
The response contains the following extra fields:
GET /resultStore/{requestId}/results
This is a broker API that can be used to iterate over the result set of a query submitted using the above API.
The API accepts the following query parameters:
GET /resultStore/{requestId}/
Returns the BrokerResponse metadata of the query.
GET /resultStore
Lists all the requestIds of all the query results available in the response store.
DELETE /resultStore/{requestId}/
Delete the results of a query.
The API accepts the following query parameters:
SPI
The PR implements a FileSystem ResponseStore and a JSON ResponseSerde.
The feature provides two SPIs to extend the feature to support other implementations:
Both SPIs use Java SPI and the default ServiceLoader to find implementation of the SPIs. All implementation should be annotated with AutoService to help generate files for discovering the implementations.
Configuration
ResponseStore
File Response Store
Miscellaneous
tags: feature, multi-stage, release-notes