[RFC] Offloaded Raw Literals #5103

Merged
merged 3 commits into from
Sep 24, 2024

Conversation

wild-endeavor
Contributor

Tracking issue

https://github.com/flyteorg/flyte/issues/

Why are the changes needed?

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.17%. Comparing base (4f9227a) to head (2abc127).
Report is 208 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5103   +/-   ##
=======================================
  Coverage   36.16%   36.17%           
=======================================
  Files        1302     1302           
  Lines      109484   109484           
=======================================
+ Hits        39600    39609    +9     
+ Misses      65746    65737    -9     
  Partials     4138     4138           
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.33% <ø> (+0.03%) ⬆️
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl 62.32% <ø> (+0.04%) ⬆️
unittests-flyteidl 7.08% <ø> (ø)
unittests-flyteplugins 53.31% <ø> (ø)
unittests-flytepropeller 41.74% <ø> (ø)
unittests-flytestdlib 55.35% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eapolinario eapolinario changed the title rfc [RFC] Offloaded Raw Literals Mar 28, 2024
## 3 Proposed Implementation

### 3.1 Offloaded Literal IDL
To the `Literal` [message](https://github.com/flyteorg/flyte/blob/cb6384ac6ea60f8b9421a71cfda4279f3579d3cb/flyteidl/protos/flyteidl/core/literals.proto#L95), add a new field called `starp` that will point to a location in the "metadata" bucket of the Flyte backend. The offloaded bytes should be deserializable into a `Literal` object.
Member

What does starp stand for?

Contributor Author

*p i'm bad with names. this will definitely change.

Questions: How will things like metadata be handled? Should they be merged? What should be in the `value` field of the main parent Literal?
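
To make the Section 3.1 proposal concrete, here is a minimal Python sketch of a literal that carries either an inline value or a pointer to offloaded bytes. Everything in it is an assumption for illustration: `Literal` is a plain dataclass rather than the flyteidl message, `offloaded_uri` is a hypothetical stand-in for the `starp` field, `FAKE_BUCKET` replaces the metadata bucket, and JSON replaces protobuf serialization. The sketch leaves the parent's `value` empty, which is only one possible answer to the open question above.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Literal:
    """Stand-in for flyteidl's Literal; NOT the real generated class."""
    value: Optional[Any] = None          # inline value, when small enough
    offloaded_uri: Optional[str] = None  # hypothetical name for the RFC's `starp`


# Hypothetical in-memory stand-in for the backend "metadata" bucket.
FAKE_BUCKET: Dict[str, bytes] = {}


def offload(lit: Literal, uri: str) -> Literal:
    """Serialize the full literal to the bucket and return a pointer literal."""
    FAKE_BUCKET[uri] = json.dumps({"value": lit.value}).encode("utf-8")
    # Assumption: the parent keeps no inline value once it is offloaded.
    return Literal(value=None, offloaded_uri=uri)


def resolve(lit: Literal) -> Literal:
    """Return the literal itself, or fetch and deserialize the offloaded one."""
    if lit.offloaded_uri is None:
        return lit
    payload = json.loads(FAKE_BUCKET[lit.offloaded_uri].decode("utf-8"))
    return Literal(value=payload["value"])
```

A consumer that understands the new field would call something like `resolve` before reading `value`; an older consumer would see only the empty parent, which is the compatibility concern raised in Section 3.2.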

### 3.2 Flyte Propeller
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the
Member

Suggested change
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use.

Incomplete?


### 3.2 Flyte Propeller
* When writing map task outputs, depending on the size, Propeller will need to offload the LiteralCollection after constructing it, and create a new Literal for downstream tasks to use, with the
* Also Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolved these offloaded literals.
Member

@fg91 fg91 Jul 18, 2024

Suggested change
* Also Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolved these offloaded literals.
* Also, Propeller will need to check the flytekit version of the map task. If it's an older version (i.e. before the change proposed in this RFC), and it's large enough to need to be offloaded, it should fail the task. The assumption here is that if the map task is of the older version then downstream tasks will probably also be of those older versions which won't know how to resolve these offloaded literals.

For large outputs (like large maps of large dataclasses), Flytekit should also know how to offload the data. This should be transparent to the user. How will Propeller know to fail, though, if Propeller hasn't been updated?
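
The Propeller-side rule from Section 3.2 (offload large map task outputs, but fail the task when the flytekit version predates this RFC) can be sketched as a small decision function. This is a rough illustration only: the size threshold, the minimum flytekit version, and all names below are placeholders that the RFC does not specify.

```python
from enum import Enum
from typing import Tuple

# Placeholder values for illustration only; the RFC specifies neither.
OFFLOAD_THRESHOLD_BYTES = 10 * 1024 * 1024
MIN_FLYTEKIT_VERSION = (1, 14, 0)


class Action(Enum):
    INLINE = "inline"    # small output: pass the literal as today
    OFFLOAD = "offload"  # large output, flytekit can resolve offloaded literals
    FAIL = "fail"        # large output, flytekit too old to resolve them


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' string; pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def decide(output_size_bytes: int, flytekit_version: str) -> Action:
    """Choose what to do with a map task's output."""
    if output_size_bytes <= OFFLOAD_THRESHOLD_BYTES:
        return Action.INLINE
    if parse_version(flytekit_version) < MIN_FLYTEKIT_VERSION:
        # Downstream tasks are assumed to share the old version and would
        # not know how to resolve an offloaded literal.
        return Action.FAIL
    return Action.OFFLOAD
```

Note that this check can only run in a Propeller that already contains the change, which is exactly the gap the question above points at.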

### 3.4 Other Implications
Does console need to change at all?
Member

@fg91 fg91 Jul 18, 2024

As a user clicking on the inputs tab of a node, I'd expect to see the string representation of the Literal including the blob storage uri the literal has been offloaded to.

It would be nice to see the type of the literal, as is already the case when, for example, a PyTorch module or pickled object is offloaded (the only difference being that the offloading there happens in the respective type transformer in flytekit, not in Propeller).

This relates to the question posed above:

Questions: How will things like metadata be handled? Should they be merged? What should be in the value field of the main parent Literal?

* For map tasks, change the type of the output to a Union of the current user-defined List and a new Offloaded type. We felt this would be a bit awkward since it changes the user-facing type itself (for example, if you were to pull up the map task definition from the API endpoint). It's also not extensible to other types of literals (maps of large dataclasses, for example).

* Build off of the input wrapper construct that's still in PR. The idea was to have the wrapper contain, in large cases, a reference to the data and, in small cases, the data itself. We didn't fully like this idea because the entire input set or output set needs to be offloaded.
* If the task downstream of a map task takes both the output list, along with some other input, after creating and upload the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use.
Member

Suggested change
* If the task downstream of a map task takes both the output list, along with some other input, after creating and upload the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use.
* If the task downstream of a map task takes both the output list, along with some other input, after creating and uploading the large pb file for the map task's output, Propeller would need to re-upload the entire large list or map (one time for each downstream task). If the offloading is done per literal, Propeller can just upload once and use.
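
The upload-count argument in this bullet can be illustrated with a toy simulation. It is a sketch under stated assumptions: `CountingStore` and both strategy functions are hypothetical, and "upload" here means any write of the large list to blob storage.

```python
from typing import Any, Dict, List


class CountingStore:
    """Hypothetical blob store that only counts writes containing the large list."""

    def __init__(self) -> None:
        self.large_uploads = 0

    def put_large(self, uri: str, data: Any) -> str:
        self.large_uploads += 1
        return uri


def wrapper_strategy(store: CountingStore, big: List[int], extras: List[str]) -> None:
    """Whole-input-set offloading: each downstream task's input set embeds the
    large list next to its other input, so the list is uploaded again per task."""
    store.put_large("outputs.pb", big)
    for i, extra in enumerate(extras):
        store.put_large(f"inputs-{i}.pb", {"items": big, "extra": extra})


def per_literal_strategy(
    store: CountingStore, big: List[int], extras: List[str]
) -> List[Dict[str, str]]:
    """Per-literal offloading: upload the list once and hand every downstream
    task a small input set that only references it."""
    ref = store.put_large("literal.pb", big)
    return [{"items": ref, "extra": extra} for extra in extras]
```

With three downstream tasks, the wrapper strategy writes the large list four times, while the per-literal strategy writes it once and passes the same reference to all three.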


## 8 Unresolved questions

Should we create a new oneof that's offloaded?
Member

Could you please elaborate?

@katrogan katrogan mentioned this pull request Aug 12, 2024
3 tasks
@katrogan katrogan marked this pull request as ready for review August 13, 2024 16:30
@katrogan katrogan mentioned this pull request Aug 14, 2024
3 tasks
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
@katrogan katrogan force-pushed the rfc/offloaded-literal branch from 59192b7 to 0577fcf Compare August 14, 2024 15:03
Signed-off-by: Katrina Rogan <[email protected]>
@katrogan katrogan merged commit c0fc6d4 into master Sep 24, 2024
50 checks passed
@katrogan katrogan deleted the rfc/offloaded-literal branch September 24, 2024 00:19
@davidmirror-ops
Contributor

09/26/2024 Contributors sync notes: @eapolinario et al. to update this proposal with the changes from the Union internal discussions.

Labels
None yet
Projects
Status: Implementation in progress
5 participants