Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Replace Python gRPC with Rust #2328

Closed

Conversation

austin362667
Copy link
Collaborator

@austin362667 austin362667 commented Apr 4, 2024

Tracking issue

flyteorg/flyte#5344

Users continuously face weird grpcio bugs, such as version errors, or even being unable to install grpcio.

Why are the changes needed?

To remove the dependency on gRPC in Python and solely replace it with Rust.

What changes were proposed in this pull request?

For details, please check the design doc.

Architecture

FlyteKitRustClient

How was this patch tested?

Right now, only get_task() has been tested for the POC (proof of concept) purpose.
There is still a lot of work to be done afterwards.

Setup process

  • Use crate Tonic to compile proto files and generates client service code.
  • Use crate PyO3 to build Rust bindings for Python so we can offloading gRPC to compiled extension.

Screenshots

The performance of FlyteKit slightly benefits from Rust, which remains super amazing.

perf

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

#2307

Docs link

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>

[wip]refactor: add rust grpc client

Signed-off-by: Austin Liu <[email protected]>
@austin362667
Copy link
Collaborator Author

austin362667 commented Apr 4, 2024

I'm a rookie both in Rust and gRPC, so please do not hesitate to give me any kind of advice.

@austin362667 austin362667 marked this pull request as ready for review April 4, 2024 16:52
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 4, 2024
@austin362667 austin362667 changed the title [WIP] Replace gRPC with Rust [WIP] Replace Python gRPC with Rust Apr 4, 2024
@austin362667 austin362667 marked this pull request as draft April 4, 2024 16:54
@thomasjpfan
Copy link
Member

Thank you for looking into this!

Here are my high level thoughts for implementing a FlyteRemote in Rust:

  • Do not use flytekit.models. flytekit.models was required when flyte was originally designed, but it is no longer required. The Rust FlyteRemote should return the IDL objects.
    • There are many strategies to deprecating the current flytekit.remote.FlyteRemote. For example, the rust is in flytekit.remote_v2.FlyteRemote and returns IDL objects.
  • How to access the IDL objects from Python
    • My preferred goal is to also remove the dependency on grpc related packages such as googleapis-common-protos and protobuf. (Many issues come from protobuf)
    • This PR's solution is to encode the IDL in rust and decode in Python. I do not like the solution because it requires flyteidl (which requires protobuf) to decode the IDL.
      • On the other hand, I see the external traits + external struct issue. The only workaround I see is to have our own rust macro that creates a custom struct, that wraps the external struct and adds in the pyo3 traits we need. (I am not sure how well this works)
  • We use the the GRPC Python API to create intercepters. It'll likely have to migrate these intercepters to rust as well.
  • There is a plan to move the flytekit repo into flyte, so it'll be easier to manage all the protobufs.

Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
@austin362667
Copy link
Collaborator Author

austin362667 commented Apr 5, 2024

Thank you so much, Thomas @thomasjpfan . I will discuss with Kevin @pingsutw in two hours.

access the IDL objects from Python

That makes sense to me. Sounds like my original approach. Back then, I didn't know that deprecating the current flytekit.remote.FlyteRemote is in the roadmap. So, if removing Python dependencies not only on gRPC but also on protobuf is the goal, we should eventually achieve that. It's not as aggressive as it may sound because, as you mentioned, we can task a flytekit.remote_v2.FlyteRemote strategy.

external traits + external struct issue

For sure, this is the tricky one. I will definitely look into a customized macro approach. Perhaps It can overcome the Rust coherence rule, or it can't.
Also, it's worth mentioning that PyO3 uses get_all and set_all attribute macros for Python ends to access its every member variable nestedly all the way down to primitive types for all high-level abstracted structs. And we need a new() method for each class so that we can construct from Python, also there's the __repr__ stuff to consider.

manage all the protobufs

So, I should modify this buf/generate stuff to better generate Rust service code for future usage. The current version has no client code like AdminServiceClient. It's a more unified way to manage proto files in one single place.
Alternatively, should I compile FlyteIDL proto files to start over in generating more complete Rust service code within the FlyteRS repository?

@@ -0,0 +1,15 @@
[package]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use buf to generate the flyteidl code? Something like flyteorg/flyte#5187

Copy link
Collaborator Author

@austin362667 austin362667 Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eapolinario , I think this can help! I've already tried but I'm unable to generate Rust code correctly on my local machine.
Since you can compile, there shouldn't be many problems in CI.

@kumare3
Copy link
Contributor

kumare3 commented Apr 5, 2024

this is awesome but @vlad-ivanov-name seems to have a good grasp. Can you please help review and help us drive a best choices in designing.

@austin362667 also I have a PR #2307

@thomasjpfan
Copy link
Member

The current version has no client code like AdminServiceClient. It's a more unified way to manage proto files in one single place.
Alternatively, should I compile FlyteIDL proto files to start over in generating more complete Rust service code within the FlyteRS repository?

If there is a way to get buf to only generate service code, then I like that solution. Otherwise, I am okay with the current solution of using https://github.com/tokio-rs/prost.

For sure, this is the tricky one. I will definitely look into a customized macro approach. Perhaps It can overcome the Rust coherence rule, or it can't.

To answer this question, I would use this example protobuf and try different techniques for exposing it into Python. I'll start with making sure the Person works by itself and then work in the other objects:

message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;
}

I think this Rust<->Python data exchange is the unknown technical part of this work. I know removing our dependency on the Python protobuf library makes this task harder, but protobuf is what causes the most issues. For me, removing the dependency has more impact on the user experience than the speed gains.

@thomasjpfan
Copy link
Member

I think figuring out the Rust<->Python data exchange and writing a Rust's FlyteClient can be two separate task:

  1. The Rust FlyteClient is a pure rust library that implements and API required for FlyteRemote. This pure rust library does not depend on pyo3 and does not worry about the Python binding. I find that having a pure rust library that does not depend on pyo3 makes it easier to test in rust.
  2. A second rust library that depends on pyo3 and binds the Rust FlyteClient with Python. All this library cares about is binding.

I feel like 2. is the unknown part right now, so I am pushing to figuring that out first.

vlad-ivanov-name

This comment was marked as duplicate.

@vlad-ivanov-name
Copy link

I think something's up with github as it first duplicated my review, and after I hid one duplicate the second disappeared 🙈 sorry about that, you can expand the comment above

@austin362667
Copy link
Collaborator Author

austin362667 commented Apr 5, 2024

get buf to only generate service code

I think @eapolinario resolved this issue with PR flyteorg/flyte#5187.

removing the dependency has more impact on the user experience than the speed gains

I totally agree with this objective.

makes this task harder, but protobuf is what causes the most issues

Kafka PMC @chia7712 is interested in the purpose of these refactorings. In his experience, projects with multiple languages often struggle, making consistency maintenance difficult. For example, Integrating Java with C/C++ can cause memory leaks, and Java with Scala may lead to binary compatibility issues. This complexity extends to build logic, requiring various tools and additional manpower for maintenance.

Rust<->Python data exchange

From what I know, this Person class will function seamlessly whether using a strategy of serializing into bytes or through PyO3-supported type conversions of primitive types when exchanging data.
By adding #[pyclass(module="flyrs", get_all)] to every struct and enum via a sed command or something like that, you can expose identical classes and all its members to Python from Rust.

With respect to the Rust coherence rule, be careful when using external crates like prost-types for well-known types such as prost_types::Duration. Inside our crate, we cannot easily impl the external trait from crate pyo3 for external class from another crate prost-types.
(You can learn more details from rustc --explain E0117 and rustc --explain E0210.)

For my understanding, we now have two very different solutions:

  1. Maintain two identical IDL classes under the hood that are separate for languages, Rust and Python.
  2. Communication through bytes.

In the earlier sync with Kevin @pingsutw, he raised a concern: Can we really get rid of all IDL objects in Python? If we remove the dependency on protobuf, what would be the size of impact?

@austin362667
Copy link
Collaborator Author

austin362667 commented Apr 5, 2024

Thank you! @vlad-ivanov-name , these suggestions are super helpful. We also need some advice to help us make the best choices in designing that @thomasjpfan and I just discussed.
FYR design doc here

@thomasjpfan
Copy link
Member

thomasjpfan commented Apr 5, 2024

With respect to the Rust coherence rule, be careful when using external crates like prost-types for well-known types such as prost_types::Duration.

Thank you for the info! That is good to know.

  1. Maintain two identical IDL classes under the hood that are separate for languages, Rust and Python.
  2. Communication through bytes.

Yup, I agree that are the two paths.

Can we really get rid of all IDL objects in Python? If we remove the dependency on protobuf, what would be the size of impact?

The overall size is a minor benefit. For me, getting rid of the protobuf dependency has two major benefits:

  1. Less dependencies which makes flytekit easier to install and less likely to conflict with the user's libraries. Any conflicts with a user's domain libraries with flytekit will result a worst experience.
  2. If another library uses an older version of protobuf, they will get the following error. This adds leads to a worst user experience from using flytekit.
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your `protos`, some other possible workarounds are:
 1. Downgrade the `protobuf` package to 3.20.x or lower.
 2. Set `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` (but this will use pure-Python parsing and will be much slower).

On a technical level going through Bytes and using the Python IDL + protobuf works. Without removing the protobuf dependency, I do not see a major win from the developer experience side. Concretely, here is how I see the pros and cons of a Rust FlyteRemote:

Pros

  1. Slightly faster with Rust FlyteRemote
  2. Maybe more memory efficient with Rust FlyteRemote

Cons

  1. We need to maintain a rust library and all the complexities with it. For example, we'll have to package and compile binary wheels in flytekit.
  2. If we want to use Python async in the future, then the async story is not great with PyO3.

I can get behind a Rust FlyteRemote if we can point to user facing issues around using Python's grpc library. Personally, I have not come across issues with using grpc from Python.

@chia7712
Copy link

chia7712 commented Apr 5, 2024

Thanks for @austin362667 to share the issue with me (and ping me here)

I do enjoy learning the tech design so I talked with Austin for the issue. However, my comments may be produced without enough context and details. And so please feel free to correct my concerns as that is good for my education :)

@austin362667
Copy link
Collaborator Author

austin362667 commented Apr 5, 2024

size is a minor benefit

Sorry for the confusion, my fault for misleading you. By 'size of impact', I mean how much will be affect by removing Python IDL dependencies.

But yeah, I get your points. Thanks for the thoughtful elaborations.

two major benefits
pros and cons

Mind letting me add the above analysis to the design doc? I've invited you as an editor, too.

@austin362667
Copy link
Collaborator Author

@chia7712 It's really nice of you sharing your adept insights with us here.

@thomasjpfan
Copy link
Member

Mind letting me add the above analysis to the design doc? I've invited you as an editor, too.

Yup, you can add anything of my comments into your design doc.

@kumare3
Copy link
Contributor

kumare3 commented Apr 5, 2024

@thomasjpfan / @austin362667 / @eapolinario should we just make this a common FlyteRustClient? like in flyteidl and then create the binding layer only in flytekit? This way we will be pure rust in flyteidl that can be imported as a crate? I agree with @thomasjpfan I would love the entire grpc, protobuf dependencies to be rust only. and then we figure out how to make it work with python separately. This way the rust core client can be used in multiple places.

@vlad-ivanov-name
Copy link

@austin362667 I can't say I'm very familiar with flyte internals - only with some basics - but if you'd like to discuss any of more general rust-related technical questions I would be open to have a call

@thomasjpfan
Copy link
Member

like in flyteidl and then create the binding layer only in flytekit? This way we will be pure rust in flyteidl that can be imported as a crate?

I agree with having a rust crate with just a RustFlyteRemote and then having a seperate Python library do the binding. I like having a lightweight flytekit-core Python library that exposes the RustFlyteRemote through the Python IDL + protobuf (no flytekit.models). The key benefits are:

  • flytekit depends on flytekit-core for interacting with the remote. flytekit itself does not need to worry about packaging rust.
  • I've seen demand for a lightweight FlyteRemote. A flytekit-core with a RustFlyteRemote will meet that need.

XREF: Having a "rust core" for a Python library is a fairly common pattern: https://github.com/pydantic/pydantic-core

Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>

cleanup

Signed-off-by: Austin Liu <[email protected]>

cleanup

Signed-off-by: Austin Liu <[email protected]>

cleanup

Signed-off-by: Austin Liu <[email protected]>
@austin362667 austin362667 force-pushed the austin362667/flyrs/grpc branch from 63cfba0 to b132c89 Compare April 13, 2024 08:02
Signed-off-by: Austin Liu <[email protected]>
@austin362667
Copy link
Collaborator Author

Currently we have diverged code for existing python and rust from raw.py, friendly.py to remote.py. It will be hard to sync latest changes with Python side from Rust side.

Should we add redundant files from flytekit.remote here? Like, backfill.py, data.py, entities.py, executions.py, interface.py, lazy_entity.py, remote_callable.py, remote_fs.py?

Most of them require grpcio, but they are also used by flyrs/remote/remote.py, so I duplicated them here inside flyrs/remote/ for now.

The only way to check that if we get rid of grpcio or not is by ensuring that we can still use FlyteRemoteClient without installing the grpcio package. cc @pingsutw

Or somehow can we achieve this via lazy installation in Python?

Signed-off-by: Austin Liu <[email protected]>
@austin362667 austin362667 force-pushed the austin362667/flyrs/grpc branch from 23d2c71 to d704fcb Compare April 17, 2024 06:18
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>

refine testing

Signed-off-by: Austin Liu <[email protected]>
@austin362667 austin362667 force-pushed the austin362667/flyrs/grpc branch from a2a0cbd to 9f20d3f Compare April 19, 2024 09:30
Signed-off-by: Austin Liu <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants