Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rust server and client for get_simple example #8

Merged
merged 8 commits into from
Mar 11, 2024

Conversation

mbrobbel
Copy link
Contributor

@mbrobbel mbrobbel commented Mar 7, 2024

A simple synchronous Rust (std-based) server and client for the get_simple example.

This skips over all http request/response parsing and error handling, but the examples can be updated later to use hyper (when there is async Arrow IPC support in arrow-rs).

❯ cargo r -p server --release
    Finished release [optimized] target(s) in 0.04s
     Running `target/release/server`
2024-03-07T20:14:47.266470Z  INFO data{TOTAL_RECORDS=100000000 RECORDS_PER_BATCH=4096}: server: Generating random data
2024-03-07T20:14:48.109722Z  INFO data{TOTAL_RECORDS=100000000 RECORDS_PER_BATCH=4096}: server: close time.busy=843ms time.idle=1.12µs
2024-03-07T20:14:48.109820Z  INFO server: Listening bind_addr=127.0.0.1:8000
2024-03-07T20:15:17.403848Z  INFO Writing Arrow IPC stream{remote_peer=127.0.0.1:50291}: server: Incoming connection
2024-03-07T20:15:18.006722Z  INFO Writing Arrow IPC stream{remote_peer=127.0.0.1:50291}: server: close time.busy=603ms time.idle=2.08µs
❯ cargo r -p client --release
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/client`
2024-03-07T20:15:17.403862Z  INFO Reading Arrow IPC stream{addr=127.0.0.1:8000}: client: Connected
2024-03-07T20:15:18.007890Z  INFO Reading Arrow IPC stream{addr=127.0.0.1:8000}: client: batches=24415 rows=100000000
2024-03-07T20:15:18.147404Z  INFO Reading Arrow IPC stream{addr=127.0.0.1:8000}: client: close time.busy=744ms time.idle=1.21µs

@ianmcook
Copy link
Member

ianmcook commented Mar 7, 2024

Thanks @mbrobbel !

Does the Cargo.lock file need to be committed here?

@mbrobbel
Copy link
Contributor Author

mbrobbel commented Mar 7, 2024

Does the Cargo.lock file need to be committed here?

It's the recommended approach:

@ianmcook
Copy link
Member

ianmcook commented Mar 7, 2024

It's the recommended approach

After reading the guidance on this from the Cargo team, I think I'm -1 on including it in this case. The project here is intended to serve as an example for other projects to build on, not as a standalone project where the top priority is for the binary to build without errors. It seems undesirable in this case to have 900+ extra lines just to ensure the latter. The spirit of this area of the repo is to keep everything very minimal and not have any unnecessary assets even if they are black boxes.

@ianmcook
Copy link
Member

ianmcook commented Mar 8, 2024

I tested this Rust server with the clients implemented in other languages. That worked fine.

But I ran into problems when I tested this Rust client with the servers implemented in other languages.

@ianmcook
Copy link
Member

ianmcook commented Mar 8, 2024

The client works with the Python server example now, but I see errors when I try to use it with the Java and Go server examples.

With the Java server, I see this error:

thread 'main' panicked at client/src/main.rs:52:77:
called `Result::unwrap()` on an `Err` value: ParseError("Unable to get root as message: Unaligned { position: 3795177799, unaligned_type: \"i32\", error_trace: ErrorTrace([]) }")

With the Go server, I see this error:

thread 'main' panicked at library/alloc/src/raw_vec.rs:571:5:
capacity overflow

cc @zeroshade

@mbrobbel
Copy link
Contributor Author

mbrobbel commented Mar 8, 2024

The client works with the Python server example now, but I see errors when I try to use it with the Java and Go server examples.

With the Java server, I see this error:

thread 'main' panicked at client/src/main.rs:52:77:
called `Result::unwrap()` on an `Err` value: ParseError("Unable to get root as message: Unaligned { position: 3795177799, unaligned_type: \"i32\", error_trace: ErrorTrace([]) }")

With the Go server, I see this error:

thread 'main' panicked at library/alloc/src/raw_vec.rs:571:5:
capacity overflow

cc @zeroshade

The problem is caused by the BufReader in the client. In the case of the errors it buffered some bytes from the beginning of the body. I'll fix it.

@mbrobbel mbrobbel marked this pull request as ready for review March 8, 2024 18:51
@mbrobbel
Copy link
Contributor Author

mbrobbel commented Mar 8, 2024

Do we want CI?

@ianmcook
Copy link
Member

ianmcook commented Mar 8, 2024

Do we want CI?

Not yet. We will want to add CI later, once the examples like these become part of an official conventions document on the Arrow docs site. I think in general we want to keep this repo free of CI.

Copy link
Member

@ianmcook ianmcook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested these with all the other client and server examples and they work as expected. Thanks very much @mbrobbel!

@wjones127 or anyone else watching here: is there anything else you'd like to see here before we merge this?

@ianmcook
Copy link
Member

@mbrobbel I'm curious how feasible it would be to make the client example handle HTTP/1.1 responses with chunked transfer encoding? I think this would just involve handling lines with \r\n a bit differently.

Copy link
Member

@ianmcook ianmcook left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbrobbel! I re-tested this new version that includes chunked encoding support against all the other server examples, toggling chunked encoding on and off on the servers, and it works as expected.

@ianmcook
Copy link
Member

I will merge later today if there are no other comments.

Copy link
Member

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems okay, though I would definitely prefer it written in something higher-level like hyper. As-is, I'm not sure how many developers would use this logic.

Comment on lines +47 to +61
let mut chunked = false;
loop {
let mut line = String::default();
reader.read_line(&mut line).unwrap();
if let Some(("transfer-encoding", "chunked")) = line
.to_lowercase()
.split_once(':')
.map(|(key, value)| (key.trim(), value.trim()))
{
chunked = true;
}
if line == "\r\n" {
break;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit sad we are manually parsing headers and such. Makes the example a bit harder to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agreed. I started with hyper but moved to std because of apache/arrow-rs#1207.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Of all the examples here, this is the only one that directly implements low-level HTTP details instead of depending on a library to do that.

But the main objective of these get_simple examples is only to ensure that the IPC implementations in the mainstream Arrow libraries can function and interoperate over HTTP. This achieves that objective.

@ianmcook ianmcook merged commit bbbfaf6 into apache:main Mar 11, 2024
@mbrobbel mbrobbel deleted the rust branch March 11, 2024 19:24
@tustvold
Copy link

FYI with apache/arrow-rs#5531 it should be possible to switch this to use hyper

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants