-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement basic FlightSQL Server #1386
Conversation
Change-Id: I108b2468b078470bb8b6f95c031035cc09227986
Thank you @wangfenjin -- I will try and give this a review over the next few days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wangfenjin --- I think this looks like a great start. 🏅
The only thing I think is really needed prior to initial merge is:
- An integration test demonstrating the end to end flow of sending / receiving messages
- Examples and Documentation (perhaps the tests would suffice initially?)
Follow on items (I can file follow on tickets for these)
- Create a flight sql client to simplify interacting with a flight sql server (something like this would likely be needed for the tests); This could follow the same model of wrapping the underlying tonic client.
- Fill out more of the implementations (I think it is ok to merge the initial APIs and then implement them in subsequent PRs -- maybe others in the community would be interested in helping too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope to have time for a more in depth review later in the week, but don't feel you need to wait on this.
I do feel that bringing in a second protobuf library for decoding the any payloads is something that could cause immense confusion down the line. I think if there isn't a particular reason not to use prost, it would be better to be consistent. We use prost in IOx for any payloads, if that could help as a reference...
Otherwise, nice work, glad to see this coming together 👍
Thank for @alamb and @tustvold for your review, I'll check all the comments and update the code later, response to some of your general comment:
|
This is awesome! Thank you =) |
I actually think both types of examples are useful, but for different purposes:
The example in arrow can be done as a follow on PR (maybe someone else will do it) -- I'll plan to file tickets for follow on work after your initial PR Thanks again @wangfenjin |
Change-Id: Ibb381e105041b38e6402850a2338403f802568ec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change-Id: I9485e510f1a960b6e094e559c3679434f8474ec1
Change-Id: I7ef4ade3acc81ccf5df088c866d41b538cf6f4f2
Codecov Report
@@ Coverage Diff @@
## master #1386 +/- ##
==========================================
- Coverage 83.17% 82.67% -0.51%
==========================================
Files 182 185 +3
Lines 53439 53764 +325
==========================================
Hits 44449 44449
- Misses 8990 9315 +325
Continue to review full report at Codecov.
|
Change-Id: I35d108ef43f2c2245444cfd5ea82da00b4f694f9
#[prost(int64, tag = "1")] | ||
pub record_count: i64, | ||
} | ||
/// Options for CommandGetSqlInfo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is awesome that all these comments (from the protobuf) got kept
Thanks @wangfenjin -- I will try and review this carefully tomorrow |
Change-Id: Ic159cea2c76b017e183d2946e2d24e6fd1f9b4c1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, a couple of nits along with what I think is causing the very bizzarre compiler errors. Nice work 🥇
fn as_any(&self) -> prost_types::Any; | ||
} | ||
|
||
macro_rules! prost_message_ext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👌
arrow-flight/src/sql/mod.rs
Outdated
M::type_url() == self.type_url | ||
} | ||
|
||
fn unpack<M: ProstMessageExt>(&self) -> ArrowResult<Option<M::Item>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could return tonic::Status
given the fact it will be predominantely used in gRPC handlers? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most public API in this lib will return ArrowResult, I think it's better to keep it consistent
arrow-flight/src/sql/mod.rs
Outdated
pub mod server; | ||
|
||
/// ProstMessageExt are useful utility methods for prost::Message types | ||
pub trait ProstMessageExt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should at least be pub(crate)
, it feels like something best kept as an implementation detail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to use it in service implementation like here https://github.com/wangfenjin/arrow-datafusion/pull/1/files#diff-d942c264020a5d47b87deaca1b1064b53f3819a8f90764fad8fa3c2b9ccf6225R81
|
||
/// ProstAnyExt are useful utility methods for prost_types::Any | ||
/// The API design is inspired by [rust-protobuf](https://github.com/stepancheg/rust-protobuf/blob/master/protobuf/src/well_known_types_util/any.rs) | ||
pub trait ProstAnyExt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should possibly also be crate local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as ProstMessageExt
Change-Id: I709c16613092fd42ccff827eed3e3ad3f28368e2
Change-Id: I03ed0f69ddb1203ecd75982815fa72eca4d81160
Change-Id: Ia35d697aaac3c72feba9c3aaf380ee3930484c48
Change-Id: I6006702d424ac6595f58c66057df267c4fd24476
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is looking great. Thank you so much @wangfenjin and @tustvold
I tried out the example locally (it compiled, though I can't quite figure out how to make a reasonable request via grpcurl
) and reviewed the code
I have a few suggestions for documentation improvements which I'll put up as a separate PR, but we can merge that as a follow on PR. ~I think this one is good to go 👍 ~ Update: after some more review and writing up the follow on tickets, I would like to consider a simpler interface
I will also file some follow on tickets and link them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I spent some more time with this code and tried to write up what type of client i would like to see (#1413) I was wondering if we could make this even higher level?
Specifically, to change the FlightSqlService
to hide the FlightInfo
and DoGetStream
if possible.
I also like the idea of putting this code behind a feature flag until the interface is stabilized more so we don't have to manage backwards compatibility.
async fn do_get_tables( | ||
&self, | ||
query: CommandGetTables, | ||
) -> Result<Response<<Self as FlightService>::DoGetStream>, Status>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be possible to change this to return something more ergonomic (though not support streaming):
async fn do_get_tables(
&self,
query: CommandGetTables,
) -> Result<Vec<String>, Status>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the logic again, we can do that but I think not necessary. If we want it be easy to construct a FlightData, we may provide some utility methods like we did in here.
Some system already return FlightData or Arrow Schema/RecordBatch, if we force to return Vec of String, user might need to do the converting and then we convert it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah -- this is part of the tension I would like to explore in future PRs. Making APIs that are easier to use may be less efficient than using the low level APIs directly (aka having to create a new RecordBatch) - figuring out where to draw those lines will be important
Maybe we can introduce an even higher level Flight SQL abstraction. We'll see
async fn do_get_statement( | ||
&self, | ||
ticket: TicketStatementQuery, | ||
) -> Result<Response<<Self as FlightService>::DoGetStream>, Status>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like
async fn do_get_statement(
&self,
ticket: TicketStatementQuery,
) -> Result<SendableRecordBatchStream, Status>;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't SendableRecordBatchStream a DataFusion thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. But your comments helps, I checked other API again and changed some of them to return simpler Response, which also makes them identical to the cpp api design I missed previously. 😂
Thanks @alamb for your kind review, address some of your comments:
|
Change-Id: I740d3d4e5aabbb56219291381e6a6db6506eca28
Change-Id: I223cf76be10ff379fcc9000c730d99c9773c7c3d
Change-Id: I50915b85b2f806bac5cd3207623e3f4e0e1974a1
Change-Id: I562efcfa89a606b8061d2715ca1b6775e2a952a9
Change-Id: I80bef8c2b0a713a87c43487708ae721f5f8f9da9
Change-Id: Ie664a5fca965759dbba59ad9e34fc6e33150ddbf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again @wangfenjin
Upon further reflection (and bouncing some ideas around with @tustvold ), I think the core of my challenge is that it is not clear what the best way to assist people implementing FlightSQL is (we are not even sure we fully understand FlightSQL), but we need more data to do so
That being said, here is my proposal on how to move this PR forward:
- Rename the feature flag to
flight-sql-experimental
to hint it is not part of the public API - Merge this PR into arrow-rs
Then subsequently, we work on some implementation (in DataFusion or elsewhere) using the low level apis. As part of those implementations, we'll figure out FlightSQL works in detail and what API would make it easier to implement. Then we can contribute our learnings back to arrow-rs and make the FlightSqlService
service public.
What do you think?
Thank you
100% agree
Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we rename the feature flag I think this PR is ready to go and we can keep iterating afterwards. Thank you @wangfenjin @viirya and @tustvold for your efforts and comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @alamb's suggestion. Thanks @wangfenjin for working on this.
Co-authored-by: Andrew Lamb <[email protected]>
Change-Id: I4de4fe3768b0316e69ba6798406310632933d25d
Merging 🚀 |
I have implemented a demo flight sql client (which works against the java FlightSqlExample similiar to FlightSqlClientDemoAp Here ->> https://github.com/timvw/arrow-flightsql-odbc/blob/main/src/bin/client.rs cargo run --bin client localhost 52358 Execute "SELECT * FROM INTTABLE" cargo run --bin client localhost 52358 GetPrimaryKeys INTTABLE -- --schema APP +--------------+----------------+------------+-------------+--------------+--------------------+ |
Done... #1616 |
This is an attempt to support flight-sql in arrow-rs. Currently I only implement the server mod, but I'd like to send out the PR so I can get the code review to make sure I'm in the right direction.
example: wangfenjin/arrow-datafusion#1
TODOs:
Question:
flight-sql use protobuf Any a lot, but prost doesn't support that very well, specifically the UnpackTo/PackFrom method in cpp. I asked the question in fdeantoni/prost-wkt#14 but got no response yet. So currently I need to use protoc-rust to generate the pb and in the example I can do the marshal/unmarshal. Not sure if there is a better way? Or do we need to stick to prost?Update: switched to prost
Address #1323