Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Allow passing additional metadata in inputs/outputs of tasks/workflows #4654

Open
5 of 6 tasks
EngHabu opened this issue Dec 29, 2023 · 1 comment
Open
5 of 6 tasks
Labels
enhancement New feature or request

Comments

@EngHabu
Copy link
Contributor

EngHabu commented Dec 29, 2023

Motivation: Why do you think this is important?

There are a few instances we faced in the past where we wanted to pass additional contextual information about the inputs/outputs of a task that are not themselves inputs/outputs (The most recent example is flyte decks). Unfortunately the way we had to pass that is by convention. The engine (i.e. Propeller) assumes a task writes decks, checks for the file existence and bails if it's not there. Conventions are not very strong :-) and limit the optimizations the system can make.
Another instance of this is executing, generating and exporting input validation report from Great Expectations or Pandera. This is not really an output but rather metadata about the input/output dataframes.

Goal: What should the final outcome look like, ideally?

This is a complex change and its complexity stems from a few aspects:

  1. The system assumes Inputs and outputs are always LiteralMaps... which is an assumption we are breaking
  2. Backward compatibility, at any point in time users expect to use an old flytekit with a new backend and maybe an even older frontend and some 4th version of flyte propeller. Making the system backward and forward compatible is crucial for a smooth rollout.

Here is a proposed breakdown of the work particularly around the component boundaries:

  • Make the change in flyteidl and all backend code. This code should default to reading the new InputData and OutputData and fallback to LiteralMap if unmarshaling fails, this will ensure old propeller can continue to work with a newer admin (but not vis versa)
    • When flytepropeller writes inputs.pb for a task, it should gate that behavior based on the version of flytekit. If it's not a flytekit runtime or it's an old version, it should default to writing the old format until other SDKs are updated.
  • Make the change in flytekit to do the same.
  • Make the change in flyteconsole to do the same.

The rollout plan is:

  1. Update flyteadmin to latest
  2. Update flyteconsole, copilot, propeller, flytekit to latest in any order

As long as flyteadmin is updated first, the system assumptions will hold.

Describe alternatives you've considered

Continue to use conventions to pass data... not good

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@EngHabu EngHabu added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Dec 29, 2023
Copy link

dosubot bot commented Dec 29, 2023

🤖

Hello @EngHabu!

I'm Dosu, a friendly bot here to assist you while we wait for a human maintainer to chime in. I'm here to help you solve bugs, answer questions, and navigate your way into becoming a contributor. Just remember, I'm a bot, but I'm here to make your interaction with this repository as smooth as possible.

Let's get cracking on this!

Thank you for your detailed request. I agree that passing additional metadata in inputs/outputs of tasks/workflows can be a valuable feature for Flyte.

As you've mentioned, this is a complex change due to the system's assumption that Inputs and Outputs are always LiteralMaps and the need for backward compatibility.

To implement this feature, we can extend the existing protobuf definitions to include additional fields for metadata. For example, in the TaskMetadata struct, we could add a new field for metadata:

type TaskMetadata struct {
    // existing fields...
    Metadata map[string]string `protobuf:"bytes,11,rep,name=metadata,proto3" json:"metadata,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
}

This would allow us to pass arbitrary key-value pairs as metadata. The same can be done for workflows.

To ensure backward compatibility, we would need to make sure that any code that reads these protobuf messages can handle the case where the new Metadata field is not present. This can be done by checking if the field is nil before trying to access it:

func (m *TaskMetadata) GetMetadata() map[string]string {
    if m != nil {
        return m.Metadata
    }
    return nil
}

For a smooth rollout, we could first deploy a version of your code that can handle the new Metadata field but does not yet produce it. Once all systems are running the new code, we can start producing messages with the new field.

Please note that this is a high-level overview of the changes that would be needed. The actual implementation might require additional changes depending on the specifics of your system.

I hope this helps! If you have any further questions or need clarification on any points, please don't hesitate to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants