-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Offloaded Raw Literals #5103
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5103 +/- ##
=======================================
Coverage 36.16% 36.17%
=======================================
Files 1302 1302
Lines 109484 109484
=======================================
+ Hits 39600 39609 +9
+ Misses 65746 65737 -9
Partials 4138 4138
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
rfc/system/5103-offloaded-literal.md
Outdated
## 3 Proposed Implementation | ||
|
||
### 3.1 Offloaded Literal IDL | ||
To the `Literal` [message](https://github.com/flyteorg/flyte/blob/cb6384ac6ea60f8b9421a71cfda4279f3579d3cb/flyteidl/protos/flyteidl/core/literals.proto#L95), add a new field called `starp` that will point to a location in the "metadata" bucket of the Flyte backend. The offloaded bytes should be deserialzable into a `Literal` object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does starp
stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*p
i'm bad with names. this will definitely change.
rfc/system/5103-offloaded-literal.md
Outdated
Questions: How will things like metadata be handled? Should they be merged? What should be in the `value` field of the main parent Literal? | ||
|
||
### 3.2 Flyte Propeller | ||
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the | |
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use. |
Incomplete?
rfc/system/5103-offloaded-literal.md
Outdated
|
||
### 3.2 Flyte Propeller | ||
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the | ||
* Also Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolved these offloaded literals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Also Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolved these offloaded literals. | |
* Also, Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolve these offloaded literals. |
rfc/system/5103-offloaded-literal.md
Outdated
For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be done transparently to the user. How will propeller know to fail though if propeller hasn't been updated? | ||
|
||
### 3.4 Other Implications | ||
Does console need to change at all? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user clicking on the inputs
tab of a node, I'd expect to see the string representation of the Literal including the blob storage uri the literal has been offloaded to.
It would be nice to see the type of the literal as is for example already the case when e.g. a pytorch module or pickled object is offloaded (with the only difference that the offloading happens in the respective type transformer in flytekit and not in propeller of course).
This goes into the direction of this question posed above:
Questions: How will things like metadata be handled? Should they be merged? What should be in the
value
field of the main parent Literal?
* For map tasks, change the type of the output to a Union of the current user defined List and a new Offloaded type. We felt this would be a bit awkward since it changes the user-facing type itself (like if you were to pull up the map task definition in the API endpoint). It's also not extensible to other types of literals (maps of large dataclasses for example). | ||
|
||
* Build off of the input wrapper construct that's still in PR. The idea was to have the wrapper contain in large cases, a reference to the data, and in small cases, the data itself. We didn't fully like this idea because the entire input set or output set needs to be offloaded. | ||
* If the task downstream of a map task takes both the output list, along with some other input, after creating and upload the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If the task downstream of a map task takes both the output list, along with some other input, after creating and upload the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use. | |
* If the task downstream of a map task takes both the output list, along with some other input, after creating and uploading the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use. |
|
||
## 8 Unresolved questions | ||
|
||
Should we create a new oneof that's offloaded? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate?
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
59192b7
to
0577fcf
Compare
Signed-off-by: Katrina Rogan <[email protected]>
09/26/2024 Contributors sync notes: @eapolinario et. al. to update this proposal with the changes from the Union internal discussions. |
Tracking issue
https://github.com/flyteorg/flyte/issues/
Why are the changes needed?
What changes were proposed in this pull request?
How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link