
Implement basic FlightSQL Server #1386

Merged
merged 19 commits into from
Mar 11, 2022

Conversation

wangfenjin
Contributor

@wangfenjin wangfenjin commented Mar 3, 2022

This is an attempt to support flight-sql in arrow-rs. So far I have only implemented the server module, but I'd like to send out the PR now to get a code review and make sure I'm heading in the right direction.

example: wangfenjin/arrow-datafusion#1

TODOs:

  • Impl client
  • Make flight-sql an optional feature?

Question:
flight-sql uses protobuf Any a lot, but prost doesn't support it very well, specifically the equivalent of the UnpackTo/PackFrom methods in cpp. I asked about this in fdeantoni/prost-wkt#14 but have had no response yet. So currently I need to use protoc-rust to generate the pb code, and in the example I can do the marshal/unmarshal. Not sure if there is a better way? Or do we need to stick to prost?

Update: switched to prost

Address #1323

Change-Id: I108b2468b078470bb8b6f95c031035cc09227986
@github-actions github-actions bot added the arrow-flight Changes to the arrow-flight crate label Mar 3, 2022
@alamb
Contributor

alamb commented Mar 3, 2022

Thank you @wangfenjin -- I will try and give this a review over the next few days

Contributor

@alamb alamb left a comment


Thank you @wangfenjin --- I think this looks like a great start. 🏅

The only thing I think is really needed prior to initial merge is:

  1. An integration test demonstrating the end to end flow of sending / receiving messages
  2. Examples and Documentation (perhaps the tests would suffice initially?)

Follow on items (I can file follow on tickets for these)

  1. Create a flight sql client to simplify interacting with a flight sql server (something like this would likely be needed for the tests); This could follow the same model of wrapping the underlying tonic client.
  2. Fill out more of the implementations (I think it is ok to merge the initial APIs and then implement them in subsequent PRs -- maybe others in the community would be interested in helping too)

@alamb alamb changed the title init impl flight sql Implement basic FlightSQL Server Mar 6, 2022
Contributor

@tustvold tustvold left a comment


I hope to have time for a more in depth review later in the week, but don't feel you need to wait on this.

I do feel that bringing in a second protobuf library for decoding the any payloads is something that could cause immense confusion down the line. I think if there isn't a particular reason not to use prost, it would be better to be consistent. We use prost in IOx for any payloads, if that could help as a reference...

Otherwise, nice work, glad to see this coming together 👍

@wangfenjin
Contributor Author

Thanks @alamb and @tustvold for your reviews. I'll check all the comments and update the code later. Responses to some of your general comments:

  1. About the protobuf dependency, I'll check the IOx codebase to see if I can remove rust_protobuf.
  2. As you suggested, I'll focus on the server side in this PR and leave the client to a follow up. Judging by the cpp implementation, the client library logic might be more complex than the server's.
  3. As you may have noted, I implemented an example in wangfenjin/arrow-datafusion#1 (we need to publish a new version before I can create an MR to the arrow-datafusion repo). For the integration test, I'm not sure we need to implement it in this repo. Implementing one like this seems useless, and if we want a useful example, we'd better depend on arrow-datafusion, as we need a SQL server.

@GavinRay97

This is awesome! Thank you =)

@alamb
Contributor

alamb commented Mar 7, 2022

As you may have noted, I implement an example in https://github.com/wangfenjin/arrow-datafusion/pull/1 (we need to publish a new version then I can create MR to the arrow-datafusion repo). For the integration test, not sure if we need to implement it in this repo? Because if we implement one like this seems useless, and if we want to implement a useful example, we'd better depends on arrow-datafusion as we need a SQL server.

I actually think both types of examples are useful, but for different purposes:

  1. Example just in arrow-flight (no actual SQL implementation) such as this: gives users of arrow-flight who will not be using datafusion something to start with that compiles, so they can plug in their own implementation without having to cut out datafusion specific stuff

  2. Example in datafusion such as this: Shows a real end to end use of flight sql and how one system connects it together

The example in arrow can be done as a follow on PR (maybe someone else will do it) -- I'll plan to file tickets for follow on work after your initial PR

Thanks again @wangfenjin

Change-Id: Ibb381e105041b38e6402850a2338403f802568ec
Contributor Author

@wangfenjin wangfenjin left a comment


Hi @alamb @tustvold, I should have fixed most of the issues raised in the comments. Please kindly review again, thanks!

Change-Id: I9485e510f1a960b6e094e559c3679434f8474ec1
Change-Id: I7ef4ade3acc81ccf5df088c866d41b538cf6f4f2
@codecov-commenter

codecov-commenter commented Mar 7, 2022

Codecov Report

Merging #1386 (8387bcf) into master (4bcc7a6) will decrease coverage by 0.50%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #1386      +/-   ##
==========================================
- Coverage   83.17%   82.67%   -0.51%     
==========================================
  Files         182      185       +3     
  Lines       53439    53764     +325     
==========================================
  Hits        44449    44449              
- Misses       8990     9315     +325     
Impacted Files Coverage Δ
arrow-flight/examples/flight_sql_server.rs 0.00% <0.00%> (ø)
arrow-flight/src/lib.rs 18.54% <ø> (ø)
arrow-flight/src/sql/mod.rs 0.00% <0.00%> (ø)
arrow-flight/src/sql/server.rs 0.00% <0.00%> (ø)
arrow/src/array/transform/mod.rs 86.31% <0.00%> (-0.12%) ⬇️
parquet_derive/src/parquet_field.rs 66.21% <0.00%> (+0.22%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Change-Id: I35d108ef43f2c2245444cfd5ea82da00b4f694f9
#[prost(int64, tag = "1")]
pub record_count: i64,
}
/// Options for CommandGetSqlInfo.
Contributor


it is awesome that all these comments (from the protobuf) got kept

@alamb
Contributor

alamb commented Mar 7, 2022

Thanks @wangfenjin -- I will try and review this carefully tomorrow

Change-Id: Ic159cea2c76b017e183d2946e2d24e6fd1f9b4c1
Contributor

@tustvold tustvold left a comment


This looks good to me, a couple of nits along with what I think is causing the very bizarre compiler errors. Nice work 🥇

fn as_any(&self) -> prost_types::Any;
}

macro_rules! prost_message_ext {
Contributor


Nice 👌

M::type_url() == self.type_url
}

fn unpack<M: ProstMessageExt>(&self) -> ArrowResult<Option<M::Item>> {
Contributor


Perhaps this could return tonic::Status given the fact it will be predominantly used in gRPC handlers? 🤔

Contributor Author

@wangfenjin wangfenjin Mar 9, 2022


Most public APIs in this lib return ArrowResult, so I think it's better to stay consistent.
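
For readers following this thread, the `is`/`unpack` pattern under discussion can be sketched without any of the real crates. This is a simplified, std-only model and not the code in this PR: `AnyMsg` stands in for `prost_types::Any`, `MessageExt` for `ProstMessageExt`, the error type is a plain `String` rather than `ArrowResult` or `tonic::Status`, and the payload encoding is raw UTF-8 instead of protobuf. All of these names are assumptions made for illustration.

```rust
/// Stand-in for `prost_types::Any`: a type URL plus opaque payload bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct AnyMsg {
    pub type_url: String,
    pub value: Vec<u8>,
}

/// Hypothetical counterpart of `ProstMessageExt` (not the real trait).
pub trait MessageExt: Sized {
    fn type_url() -> &'static str;
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Result<Self, String>;

    /// Wrap the message together with its type URL.
    fn as_any(&self) -> AnyMsg {
        AnyMsg {
            type_url: Self::type_url().to_string(),
            value: self.encode(),
        }
    }
}

impl AnyMsg {
    /// True when the payload claims to be an `M`.
    pub fn is<M: MessageExt>(&self) -> bool {
        M::type_url() == self.type_url
    }

    /// `Ok(None)` for a different message type,
    /// `Err(..)` when the type matches but decoding fails.
    pub fn unpack<M: MessageExt>(&self) -> Result<Option<M>, String> {
        if !self.is::<M>() {
            return Ok(None);
        }
        M::decode(&self.value).map(Some)
    }
}

/// Toy message; the real type is generated from the Flight SQL proto file.
#[derive(Debug, PartialEq)]
pub struct CommandStatementQuery {
    pub query: String,
}

impl MessageExt for CommandStatementQuery {
    fn type_url() -> &'static str {
        "type.googleapis.com/arrow.flight.protocol.sql.CommandStatementQuery"
    }
    // Assumption: UTF-8 bytes replace the protobuf wire format here.
    fn encode(&self) -> Vec<u8> {
        self.query.as_bytes().to_vec()
    }
    fn decode(bytes: &[u8]) -> Result<Self, String> {
        String::from_utf8(bytes.to_vec())
            .map(|query| CommandStatementQuery { query })
            .map_err(|e| format!("invalid payload: {}", e))
    }
}

/// Second toy message with an empty payload.
#[derive(Debug, PartialEq)]
pub struct CommandGetTables;

impl MessageExt for CommandGetTables {
    fn type_url() -> &'static str {
        "type.googleapis.com/arrow.flight.protocol.sql.CommandGetTables"
    }
    fn encode(&self) -> Vec<u8> {
        Vec::new()
    }
    fn decode(_bytes: &[u8]) -> Result<Self, String> {
        Ok(CommandGetTables)
    }
}
```

With this shape, a handler can try each command type in turn and treat `Ok(None)` as "not this one". Whichever error type the helper returns, a gRPC handler still has to convert it into a status at the boundary, which is the consistency trade-off raised above.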

pub mod server;

/// ProstMessageExt are useful utility methods for prost::Message types
pub trait ProstMessageExt {
Contributor


I think this should at least be pub(crate), it feels like something best kept as an implementation detail

Contributor Author



/// ProstAnyExt are useful utility methods for prost_types::Any
/// The API design is inspired by [rust-protobuf](https://github.com/stepancheg/rust-protobuf/blob/master/protobuf/src/well_known_types_util/any.rs)
pub trait ProstAnyExt {
Contributor


This should possibly also be crate local

Contributor Author


Same as ProstMessageExt

Change-Id: I709c16613092fd42ccff827eed3e3ad3f28368e2
Change-Id: I03ed0f69ddb1203ecd75982815fa72eca4d81160
Change-Id: Ia35d697aaac3c72feba9c3aaf380ee3930484c48
Change-Id: I6006702d424ac6595f58c66057df267c4fd24476
Contributor

@alamb alamb left a comment


I think this PR is looking great. Thank you so much @wangfenjin and @tustvold

I tried out the example locally (it compiled, though I can't quite figure out how to make a reasonable request via grpcurl) and reviewed the code

I have a few suggestions for documentation improvements which I'll put up as a separate PR, but we can merge that as a follow on PR. ~~I think this one is good to go 👍~~ Update: after some more review and writing up the follow on tickets, I would like to consider a simpler interface

I will also file some follow on tickets and link them here.

@alamb
Contributor

alamb commented Mar 9, 2022

cc @nevi-me @e-dard and @seddonm1 given your comments on #1323

alamb
alamb previously approved these changes Mar 9, 2022
Contributor

@alamb alamb left a comment


As I spent some more time with this code and tried to write up what type of client I would like to see (#1413), I was wondering if we could make this even higher level?

Specifically, to change the FlightSqlService to hide the FlightInfo and DoGetStream if possible.

I also like the idea of putting this code behind a feature flag until the interface is stabilized more so we don't have to manage backwards compatibility.

async fn do_get_tables(
&self,
query: CommandGetTables,
) -> Result<Response<<Self as FlightService>::DoGetStream>, Status>;
Contributor


I wonder if it would be possible to change this to return something more ergonomic (though not support streaming):

  async fn do_get_tables(
        &self,
        query: CommandGetTables,
    ) -> Result<Vec<String>, Status>;

Contributor Author

@wangfenjin wangfenjin Mar 10, 2022


I checked the logic again. We can do that, but I don't think it's necessary. If we want to make it easy to construct FlightData, we could provide some utility methods like we did here.

Some systems already return FlightData or an Arrow Schema/RecordBatch. If we force them to return a Vec of String, users might need to convert their data and then we would convert it again.

Contributor


Yeah -- this is part of the tension I would like to explore in future PRs. Making APIs that are easier to use may be less efficient than using the low level APIs directly (aka having to create a new RecordBatch) - figuring out where to draw those lines will be important

Maybe we can introduce an even higher level Flight SQL abstraction. We'll see

async fn do_get_statement(
&self,
ticket: TicketStatementQuery,
) -> Result<Response<<Self as FlightService>::DoGetStream>, Status>;
Contributor


What about something like

async fn do_get_statement(
        &self,
        ticket: TicketStatementQuery,
    ) -> Result<SendableRecordBatchStream, Status>;

?

Contributor


Isn't SendableRecordBatchStream a DataFusion thing?

Contributor Author


Same as above. But your comments helped: I checked the other APIs again and changed some of them to return a simpler Response, which also makes them match the cpp API design that I had missed previously. 😂

@alamb alamb self-requested a review March 9, 2022 21:24
@alamb alamb dismissed their stale review March 9, 2022 21:55

I think it could be simplified a bit more

@wangfenjin
Contributor Author

wangfenjin commented Mar 10, 2022

Thanks @alamb for your kind review. Addressing some of your comments:

  1. I'll add a flight-sql feature flag for this.
  2. I agree that we can probably make the API more ergonomic, but I need to do more experiments first (I'm trying to build a more practical flight-sql-server using this); then it will be clearer what we need. My suggestion is to keep the API as it is for now. Once we have a better, more simplified design, we can add it to the trait and make the low level API a default implementation in the trait, so users still have the chance to override it if they want. It's very important that we leave this flexibility to the user.
  3. grpcurl may not work for our testing. As I commented in wangfenjin/arrow-datafusion#1, I use the arrow-cpp-cli to connect to this server. It helps while we don't have a client implementation, and it also makes sure our implementation is compatible with the cpp one. We may also need to think about maintaining this compatibility in the long term.
  4. For the documentation, I copied some useful comments from the cpp implementation. For more detailed documentation, such as the protocols, I think it's a joint effort with the cpp community; in this repo we can focus on the Rust API documentation.
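
The layering proposed in point 2 can be sketched roughly as follows. This is a synchronous, std-only illustration under stated assumptions, not the `FlightSqlService` trait from this PR: `TableService`, `RawResponse`, `table_names`, and both server types are hypothetical names, and `RawResponse` merely stands in for a low-level response such as a stream of FlightData.

```rust
/// Hypothetical stand-in for a low-level response (e.g. encoded FlightData).
#[derive(Debug, PartialEq)]
pub struct RawResponse {
    pub rows: Vec<Vec<u8>>,
}

pub trait TableService {
    /// High-level hook: just list table names.
    fn table_names(&self, catalog: Option<&str>) -> Result<Vec<String>, String>;

    /// Low-level hook with a default implementation built on `table_names`.
    /// Implementors that already hold encoded data can override it and
    /// skip the extra conversion.
    fn do_get_tables(&self, catalog: Option<&str>) -> Result<RawResponse, String> {
        let names = self.table_names(catalog)?;
        Ok(RawResponse {
            rows: names.into_iter().map(String::into_bytes).collect(),
        })
    }
}

/// Implements only the ergonomic method and inherits the low-level default.
pub struct SimpleServer;

impl TableService for SimpleServer {
    fn table_names(&self, _catalog: Option<&str>) -> Result<Vec<String>, String> {
        Ok(vec!["t1".to_string(), "t2".to_string()])
    }
}

/// Already holds encoded rows, so it overrides the low-level method.
pub struct PreEncodedServer {
    pub cached: Vec<Vec<u8>>,
}

impl TableService for PreEncodedServer {
    fn table_names(&self, _catalog: Option<&str>) -> Result<Vec<String>, String> {
        Err("not used: this server answers at the low level".to_string())
    }

    fn do_get_tables(&self, _catalog: Option<&str>) -> Result<RawResponse, String> {
        Ok(RawResponse {
            rows: self.cached.clone(),
        })
    }
}
```

The design choice is that simple servers pay one conversion for convenience, while servers that already produce the wire-level form bypass it, which matches the flexibility argued for in point 2.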

Change-Id: I740d3d4e5aabbb56219291381e6a6db6506eca28
Change-Id: I223cf76be10ff379fcc9000c730d99c9773c7c3d
Change-Id: I50915b85b2f806bac5cd3207623e3f4e0e1974a1
Change-Id: I562efcfa89a606b8061d2715ca1b6775e2a952a9
Change-Id: I80bef8c2b0a713a87c43487708ae721f5f8f9da9
Change-Id: Ie664a5fca965759dbba59ad9e34fc6e33150ddbf
Contributor

@alamb alamb left a comment


Thank you again @wangfenjin

Upon further reflection (and bouncing some ideas around with @tustvold), I think the core of my challenge is that it is not yet clear what the best way to assist people implementing FlightSQL is (we are not even sure we fully understand FlightSQL), and we need more data to figure that out

That being said, here is my proposal on how to move this PR forward:

  1. Rename the feature flag to flight-sql-experimental to hint it is not part of the public API
  2. Merge this PR into arrow-rs

Then subsequently, we work on some implementation (in DataFusion or elsewhere) using the low level APIs. As part of those implementations, we'll figure out how FlightSQL works in detail and what API would make it easier to implement. Then we can contribute our learnings back to arrow-rs and make the FlightSqlService service public.

What do you think?
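
As a rough illustration of what gating code behind `flight-sql-experimental` means on the Rust side, here is a minimal sketch. It assumes a matching entry for the feature exists under `[features]` in Cargo.toml (its exact contents are not shown in this thread), and the `sql::describe` and `flight_sql_enabled` functions are hypothetical placeholders rather than anything in the PR.

```rust
/// Experimental Flight SQL items only exist when the feature is enabled.
#[cfg(feature = "flight-sql-experimental")]
pub mod sql {
    /// Placeholder for the real server trait (hypothetical, for illustration).
    pub fn describe() -> &'static str {
        "Flight SQL support compiled in"
    }
}

/// Reports whether this build was compiled with the experimental feature.
/// `cfg!` is evaluated at compile time and expands to `true` or `false`.
pub fn flight_sql_enabled() -> bool {
    cfg!(feature = "flight-sql-experimental")
}
```

Because the module is compiled out by default, downstream crates cannot depend on it by accident, so the interface can change without a backwards compatibility burden until it stabilizes.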

@alamb
Contributor

alamb commented Mar 10, 2022

I'll add a flight-sql feature flag for this

Thank you

I agree that we can probably make the API more ergonomic, but I need to do more experiments first (I'm trying to build a more practical flight-sql-server using this); then it will be clearer what we need. My suggestion is to keep the API as it is for now. Once we have a better, more simplified design, we can add it to the trait and make the low level API a default implementation in the trait, so users still have the chance to override it if they want. It's very important that we leave this flexibility to the user.

100% agree

grpcurl may not work for our testing. As I commented in wangfenjin/arrow-datafusion#1, I use the arrow-cpp-cli to connect to this server. It helps while we don't have a client implementation, and it also makes sure our implementation is compatible with the cpp one. We may also need to think about maintaining this compatibility in the long term.

For the documentation, I copied some useful comments from the cpp implementation. For more detailed documentation, such as the protocols, I think it's a joint effort with the cpp community; in this repo we can focus on the Rust API documentation.

Thanks

Contributor

@alamb alamb left a comment


If we rename the feature flag I think this PR is ready to go and we can keep iterating afterwards. Thank you @wangfenjin @viirya and @tustvold for your efforts and comments

Member

@viirya viirya left a comment


Agree with @alamb's suggestion. Thanks @wangfenjin for working on this.

wangfenjin and others added 2 commits March 11, 2022 08:36
Co-authored-by: Andrew Lamb
Change-Id: I4de4fe3768b0316e69ba6798406310632933d25d
@alamb
Contributor

alamb commented Mar 11, 2022

Merging 🚀

@alamb alamb merged commit abc2d67 into apache:master Mar 11, 2022
@wangfenjin wangfenjin deleted the flight-sql branch March 11, 2022 11:27
@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Mar 17, 2022
@timvw

timvw commented Apr 22, 2022

I have implemented a demo flight sql client (which works against the java FlightSqlExample similar to FlightSqlClientDemoAp

Here ->> https://github.com/timvw/arrow-flightsql-odbc/blob/main/src/bin/client.rs

cargo run --bin client localhost 52358 Execute "SELECT * FROM INTTABLE"
+----+--------------+-------+-----------+
| ID | KEYNAME | VALUE | FOREIGNID |
+----+--------------+-------+-----------+
| 1 | one | 1 | 1 |
| 2 | zero | 0 | 1 |
| 3 | negative one | -1 | 1 |
+----+--------------+-------+-----------+
+----+---------+-------+-----------+
| ID | KEYNAME | VALUE | FOREIGNID |
+----+---------+-------+-----------+
+----+---------+-------+-----------+

cargo run --bin client localhost 52358 GetPrimaryKeys INTTABLE -- --schema APP

+--------------+----------------+------------+-------------+--------------+--------------------+
| catalog_name | db_schema_name | table_name | column_name | key_sequence | key_name |
+--------------+----------------+------------+-------------+--------------+--------------------+
| | APP | INTTABLE | ID | 1 | SQL220422165132740 |
+--------------+----------------+------------+-------------+--------------+--------------------+

@wangfenjin
Contributor Author

@timvw you could create a PR for #1413

I'm working on it but still haven't had time to finish it.

@timvw

timvw commented Apr 24, 2022

Done... #1616

@alamb alamb mentioned this pull request Dec 8, 2022
11 tasks